FRosner / drunken-data-quality

Spark package for checking data quality
Apache License 2.0
222 stars, 69 forks

Column instance with the python app #116

Closed. Lucindo06 closed this issue 7 years ago

Lucindo06 commented 7 years ago

Passing a Column instance does not work in the Python version. When I try to execute this code:

    %pyspark
    from pyddq.core import Check
    from pyddq.reporters import ZeppelinReporter

    df = sqlContext.createDataFrame([(1, "a"), (1, None), (3, "c")])
    check = Check(df)
    reporter = ZeppelinReporter(z)
    check.satisfies(df._1 > 0).run([reporter])

I get: TypeError: 'Column' object is not callable
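A possible explanation for this message, offered as an assumption rather than something confirmed in this thread: a PySpark Column treats unknown attribute names as nested field access, so a bridge that expects a JVM object handle can look up a method on the Column, get another Column back, and fail only when it calls the result. The sketch below reproduces that pattern in plain Python. `FakeColumn`, `fake_encode_argument`, and `try_encode` are hypothetical stand-ins, not pyspark, py4j, or pyddq code.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __init__(self, expr):
        self.expr = expr

    def __getattr__(self, name):
        # Only invoked for attributes that do not exist on the instance.
        # Assumed behaviour: unknown names are treated as nested field access.
        if name.startswith("__"):
            raise AttributeError(name)
        return FakeColumn(f"{self.expr}.{name}")


def fake_encode_argument(arg):
    """Hypothetical stand-in for a bridge that expects a JVM object handle."""
    # The attribute lookup succeeds (it yields another FakeColumn),
    # so the failure only surfaces when the result is called.
    return arg._get_object_id()


def try_encode(arg):
    """Return the encoded argument, or the error text if encoding fails."""
    try:
        return fake_encode_argument(arg)
    except TypeError as exc:
        return f"TypeError: {exc}"


message = try_encode(FakeColumn("_1 > 0"))
print(message)  # TypeError: 'FakeColumn' object is not callable
```

If this is indeed the cause, the error says nothing about the constraint itself; it only shows that the Python wrapper object was handed to the JVM bridge without being unwrapped first.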

FRosner commented 7 years ago

@Gerrrr any thoughts? What was the reason we didn't implement this?

Gerrrr commented 7 years ago

Originally, I could not figure out how to convert a pyspark.sql.Column into a Scala Column. It is now implemented in https://github.com/FRosner/drunken-data-quality/pull/118.
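For readers wondering what such a conversion can look like: a pyspark.sql.Column keeps its JVM counterpart in the `_jc` attribute, so one option is to unwrap that handle before calling into Scala. The sketch below is a minimal illustration under that assumption, and it also assumes the JVM side accepts either a string expression or a column handle. It is not taken from the linked pull request; `to_jvm_constraint` and `StubColumn` are hypothetical names.

```python
def to_jvm_constraint(constraint):
    """Hypothetical helper: turn a Python-side constraint into a JVM-ready value."""
    if isinstance(constraint, str):
        # String expressions such as "_1 > 0" need no conversion.
        return constraint
    # Read the instance dict directly so that a permissive __getattr__
    # cannot invent a "_jc" attribute that is not really there.
    jc = getattr(constraint, "__dict__", {}).get("_jc")
    if jc is None:
        raise TypeError(
            f"expected a str or a Column-like object, got {type(constraint).__name__}"
        )
    return jc


class StubColumn:
    """Hypothetical stand-in that only mimics the `_jc` attribute of a Column."""

    def __init__(self, jc):
        self._jc = jc


handle = object()  # stands in for the JVM column reference
assert to_jvm_constraint("_1 > 0") == "_1 > 0"
assert to_jvm_constraint(StubColumn(handle)) is handle
```

A wrapper method like `satisfies` could call such a helper on its argument and pass the result to the JVM check, which keeps string constraints working unchanged.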