MOHACGCG closed this issue 3 years ago
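Code to generate the issue:
from pydeequ.profiles import ColumnProfilerRunner
from pyspark.sql.types import LongType, StructField, StructType

# `spark` is assumed to be an existing SparkSession with the Deequ jar on its classpath
schema = StructType().add(StructField("numeric_column", LongType()))
df = spark.createDataFrame([{"numeric_column": None}], schema=schema)
ColumnProfilerRunner(spark).onData(df).withKLLProfiling().run()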
Describe the bug
When KLL profiling is enabled to compute percentiles and a column contains only null values (completeness = 0), converting the percentiles from the Java list [''] to a Python list fails in the java_list_to_python_list function in scala_utils.
To Reproduce
Run the code above. The call fails with:
pydeequ/scala_utils.py", line 101, in <listcomp>
    vals = [datatype(i) for i in java_list[start+1:end].split(',')]
ValueError: could not convert string to float:
Expected behavior
The profile should still be calculated, and the percentiles should be None.
Proposed Change
Convert empty values to None instead of passing them to the datatype constructor, except when the target datatype is a string, where an empty value is kept as an empty string. https://github.com/awslabs/python-deequ/pull/11
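A minimal sketch of the proposed handling, for illustration only and not the code from the linked pull request. The function name parse_java_list_string is hypothetical, and it assumes the Java list arrives as a bracketed, comma-separated string, as the slicing in the traceback suggests:

```python
def parse_java_list_string(java_list, datatype):
    """Hypothetical helper: parse a string such as "[1.0,2.5]" into a Python list.

    Empty items become None unless the target datatype is str, in which
    case they are kept as empty strings.
    """
    start = java_list.index("[")
    end = java_list.rindex("]")
    vals = []
    for item in java_list[start + 1:end].split(","):
        if datatype is not str and item.strip() == "":
            # An all-null column yields an empty item; float("") would raise
            # ValueError, so map it to None instead.
            vals.append(None)
        else:
            vals.append(datatype(item))
    return vals


# An all-null column produces an empty list body.
empty_percentiles = parse_java_list_string("[]", float)
# Populated lists are converted as before.
percentiles = parse_java_list_string("[1.0,2.5]", float)
```

With this handling an empty list body yields [None] for numeric datatypes rather than raising ValueError, while string columns keep '' as a legitimate value.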