Closed Ashokgoa closed 7 months ago
Related to https://github.com/awslabs/python-deequ/issues/55. Pending research
See #169 for workaround
This will be resolved in the next release, which will include https://github.com/awslabs/python-deequ/issues/169.
Describe the bug
When a lambda function is passed as the assertion for hasSize, hasMin, or hasMax, the check fails with a "Can't execute the assertion" error.
To Reproduce
I use the code below.

First, I download the Deequ jars on the EMR cluster and set:

    spark.jars.packages  com.amazon.deequ:deequ:2.0.4-spark-3.3
    spark.jars           /home/hadoop/deequ-2.0.4-spark-3.3.jar

Then I set the Spark version:

    import os
    os.environ["SPARK_VERSION"] = "3.3"

Then I run the following:

    from pydeequ.checks import *
    from pydeequ.verification import *

    check = Check(spark, CheckLevel.Warning, "Review Check")

    checkResult = VerificationSuite(spark) \
        .onData(df) \
        .addCheck(
            check.hasSize(lambda x: x >= 3000000) \
            .hasCompleteness("Department", lambda x: x >= 0.7) \
            .isComplete("SubID") \
            .isNonNegative("ListID")) \
        .run()

    checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
    checkResult_df.show(100, False)
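Before suspecting the bridge between Python and the JVM, it may help to confirm that the assertion lambdas themselves behave as expected in plain Python. The sketch below is only a local sanity check, not a fix for this issue. `check_assertion` and the sample metric values are hypothetical and are not part of PyDeequ.

```python
from typing import Callable, Iterable, List, Tuple


def check_assertion(assertion: Callable[[float], object],
                    samples: Iterable[float]) -> List[Tuple[float, object]]:
    """Hypothetical helper (not a PyDeequ API).

    Runs an assertion callable on sample metric values and returns the
    (value, result) pairs whose result is not a strict Python bool.
    """
    bad = []
    for value in samples:
        result = assertion(value)
        if not isinstance(result, bool):
            bad.append((value, result))
    return bad


# The two lambdas used in the reproduction.
size_assertion = lambda x: x >= 3000000
completeness_assertion = lambda x: x >= 0.7

# Sample metric values are made up for illustration.
size_problems = check_assertion(size_assertion, [0, 2999999, 3000000])
completeness_problems = check_assertion(completeness_assertion, [0.0, 0.69, 1.0])

print(size_problems, completeness_problems)
```

If both lists come back empty, the lambdas return plain booleans for those inputs, which would suggest the failure happens when the callback is invoked from the JVM side rather than in the lambdas themselves.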
Desktop (please complete the following information):
I am working in a Zeppelin notebook with the following versions: Spark 3.1.2, Python 3.7, PyDeequ 1.1.1. I also tried Spark 3.3, Python 3.7, PyDeequ 1.1.0. It does not work in either case.
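Since one of the combinations above is Spark 3.1.2 while the reproduction steps reference a jar built for Spark 3.3, a version mismatch may be worth ruling out. The following is a minimal sketch, assuming the Deequ Maven coordinate ends in a `spark-X.Y` suffix as in this report. `jar_matches_spark` is a hypothetical helper, not a PyDeequ function, and nothing here confirms that a mismatch causes the error.

```python
import re


def jar_matches_spark(deequ_coordinate: str, spark_version: str) -> bool:
    """Hypothetical helper (not a PyDeequ API).

    Compares the 'spark-X.Y' suffix of a Deequ Maven coordinate with the
    major.minor part of the running Spark version.
    """
    suffix = re.search(r"spark-(\d+\.\d+)$", deequ_coordinate)
    running = re.match(r"(\d+\.\d+)", spark_version)
    if suffix is None or running is None:
        # Unknown format: treat as a mismatch so it gets looked at.
        return False
    return suffix.group(1) == running.group(1)


coordinate = "com.amazon.deequ:deequ:2.0.4-spark-3.3"
print(jar_matches_spark(coordinate, "3.1.2"))  # False
print(jar_matches_spark(coordinate, "3.3"))    # True
```

On a live cluster, the second argument could come from `spark.version` on the active SparkSession.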
The error sometimes says "Can't execute the assertion: An exception was raised by the Python Proxy. Return Message: null!" and at other times "Can't execute the assertion: java.lang.String cannot be cast to java.lang.Boolean".