awslabs / python-deequ

Python API for Deequ
Apache License 2.0

not able to execute lambda functions in the checks #167

Closed Ashokgoa closed 5 months ago

Ashokgoa commented 11 months ago

Describe the bug

Passing a lambda function as the assertion for hasSize, hasMin, or hasMax results in a "Can't execute the assertion" error.

To Reproduce

I use the code below.

First, I downloaded the Deequ JARs onto the EMR cluster:

    spark.jars.packages com.amazon.deequ:deequ:2.0.4-spark-3.3
    spark.jars /home/hadoop/deequ-2.0.4-spark-3.3.jar

Then I set the Spark version:

    import os
    os.environ["SPARK_VERSION"] = "3.3"

Then I ran the following code:

    from pydeequ.checks import *
    from pydeequ.verification import *

    check = Check(spark, CheckLevel.Warning, "Review Check")

    checkResult = VerificationSuite(spark) \
        .onData(df) \
        .addCheck(
            check
            .hasSize(lambda x: x >= 3000000) \
            .hasCompleteness("Department", lambda x: x >= 0.7) \
            .isComplete("SubID") \
            .isNonNegative("ListID")) \
        .run()

    checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
    checkResult_df.show(100, False)

Expected behavior

[image]

Screenshots

[image]

Desktop (please complete the following information):

I am working with the following versions in a Zeppelin notebook: Spark 3.1.2, Python 3.7, PyDeequ 1.1.1. I also tried Spark 3.3, Python 3.7, PyDeequ 1.1.0. It does not work in either case.
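
Given the versions listed above, one thing that may be worth ruling out is a mismatch between the Spark runtime, the SPARK_VERSION environment variable, and the Spark suffix of the Deequ JAR. The following is only a minimal sketch of such a check: the helper names (major_minor, spark_suffix, versions_agree) are hypothetical and not part of PyDeequ, and the "3.3.0" runtime value is an illustrative assumption. Whether a mismatch actually causes the error in this report is not established here.

```python
import re


def major_minor(version):
    """Return the 'major.minor' prefix of a version string such as '3.1.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def spark_suffix(jar_name):
    """Extract the Spark version from a JAR name like 'deequ-2.0.4-spark-3.3.jar'."""
    match = re.search(r"spark-(\d+\.\d+)", jar_name)
    return match.group(1) if match else None


def versions_agree(runtime_version, env_version, jar_name):
    """True only if runtime, SPARK_VERSION, and JAR suffix share major.minor."""
    runtime = major_minor(runtime_version)
    return runtime == major_minor(env_version) and runtime == spark_suffix(jar_name)


jar = "deequ-2.0.4-spark-3.3.jar"
# First combination from the report: Spark 3.1.2 with SPARK_VERSION "3.3".
mismatch_case = versions_agree("3.1.2", "3.3", jar)
# Hypothetical Spark 3.3.x runtime (the patch number is an assumption).
matching_case = versions_agree("3.3.0", "3.3", jar)
print(mismatch_case, matching_case)
```

In a live session, the runtime value would presumably come from spark.version and the environment value from os.environ["SPARK_VERSION"].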

The error sometimes says "Can't execute the assertion: An exception was raised by the Python Proxy. Return Message: null!" and at other times it says "Can't execute the assertion: java.lang.String cannot be cast to java.lang.Boolean".
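
Since the second message concerns a value that could not be cast to a Boolean, a quick way to rule out the lambdas themselves is to call them in plain Python, outside Spark, and confirm that each one returns a bool. This is a minimal sketch: the lambdas are copied from the code above, while the sample metric values and the evaluate helper are made up for illustration. It does not exercise PyDeequ or the JVM callback path at all.

```python
# Assertions copied from the report, paired with made-up sample metric values.
assertions = {
    "hasSize": (lambda x: x >= 3000000, 3500000),
    "hasCompleteness": (lambda x: x >= 0.7, 0.65),
}


def evaluate(assertions):
    """Call each assertion in plain Python; record its result and result type."""
    results = {}
    for name, (fn, sample) in assertions.items():
        outcome = fn(sample)
        results[name] = (outcome, type(outcome).__name__)
    return results


results = evaluate(assertions)
for name, (outcome, type_name) in results.items():
    print(f"{name}: {outcome} ({type_name})")
```

If every entry reports bool, the lambdas behave as expected in Python, which would suggest the failure arises when they are invoked from the JVM side rather than in the lambdas themselves.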

chenliu0831 commented 11 months ago

Related to https://github.com/awslabs/python-deequ/issues/55. Pending research

chenliu0831 commented 11 months ago

See #169 for workaround

chenliu0831 commented 5 months ago

This will be resolved in the next release, which will include https://github.com/awslabs/python-deequ/issues/169.