Open ashildkummen opened 1 year ago
I'm seeing the same error. I tried using a UDF as well, e.g.
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType
binningUdf = udf(lambda z: int(z), returnType=IntegerType())
Same error. Maybe it has something to do with passing functions in general.
In any case, the workaround I'm going to use is to simply apply the UDF ahead of the Histogram method and apply the histogram to the dummy column.
df = df.withColumn("dummy", binningUdf(df['Column']))
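For anyone who wants to copy the workaround, here is a rough sketch of what I mean. The binning rule (low/mid/high), the function names and the star_rating_bin column name are all made up for illustration; only udf and withColumn are real PySpark calls. The idea is just to materialize the binned values as an ordinary column first, so the analyzer never has to receive a Python function.

```python
def star_rating_bin(value):
    """Hypothetical binning rule: map a 1-5 star rating to a coarse label."""
    if value is None:
        return None
    rating = int(value)
    if rating <= 2:
        return "low"
    if rating == 3:
        return "mid"
    return "high"


def add_binned_column(df, source_col="star_rating", target_col="star_rating_bin"):
    """Return df with the binned values added as a regular column.

    Both column names are assumptions for this sketch; adjust to your data.
    """
    # Imported lazily so the pure-Python binning rule can be checked without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    binning_udf = udf(star_rating_bin, StringType())
    # withColumn returns a new DataFrame, so the caller must keep the result.
    return df.withColumn(target_col, binning_udf(df[source_col]))
```

After df = add_binned_column(df), the histogram can be requested on the new column with .addAnalyzer(Histogram("star_rating_bin")), i.e. without passing binningUdf at all. I have not confirmed this avoids every variant of the error, so treat it as a stopgap until the bug itself is fixed.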
Looks like a bug - @ashildkummen does vishaalkapoor's workaround work for you?
I am not able to use the binningUdf parameter of the Histogram analyzer. It errors when performing this line, getting error message:
I have tried using a simple lambda function that actually does no binning but returns its input as output:
To Reproduce
Steps to reproduce the behavior:
.addAnalyzer(Histogram("star_rating", binningUdf=lambda x: x))
Expected behavior
I would expect it to work just as it does when I'm not using binningUdf, i.e. with just
.addAnalyzer(Histogram("star_rating"))
Some more documentation on how to use the binningUdf parameter would be great.