awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Can't execute the assertion: An exception was raised by the Python Proxy. Return Message: Object ID unknown #94

Open ghrajesh opened 2 years ago

ghrajesh commented 2 years ago

Describe the bug

When using "check.hasSize", the constraint fails with the message "Can't execute the assertion: An exception was raised by the Python Proxy. Return Message: Object ID unknown!". The check should succeed.

deequ jar version: deequ-1.2.0-spark-2.4.jar
Spark version: 2.4
Python version: 3

checkResult = VerificationSuite(spark) \
    .onData(df) \
    .addCheck(
        check.hasSize(lambda x: x >= 3000000) \
        .hasMin("star_rating", lambda x: x == 1.0) \
        .hasMax("star_rating", lambda x: x == 5.0)  \
        .isComplete("review_id")  \
        .isUnique("review_id")  \
        .isComplete("marketplace")  \
        .isContainedIn("marketplace", ["US", "UK", "DE", "JP", "FR"]) \
        .isNonNegative("year")) \
    .run()

To Reproduce

Run the "Define and Run Tests for Data" section of this notebook: https://github.com/awslabs/python-deequ/blob/master/tutorials/test_data_quality_at_scale.ipynb

Expected behavior

The size constraint should succeed, because the dataset contains more than 3,000,000 records.
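
One way to narrow this down is to evaluate the same assertion lambdas in plain Python, outside Spark and Deequ. If they pass there, the lambdas themselves are fine, which would be consistent with the failure message pointing at the Python proxy layer rather than at the data. The sketch below does this. The helper name `evaluate_assertions` and the sample values are hypothetical; in practice the row count would come from something like `df.count()`.

```python
# Assertion lambdas copied from the check in the report.
has_size = lambda x: x >= 3000000
has_min = lambda x: x == 1.0
has_max = lambda x: x == 5.0


def evaluate_assertions(row_count, min_rating, max_rating):
    """Evaluate the three lambdas locally and return their boolean results.

    Hypothetical helper for illustration only; it does not touch Spark or Deequ.
    """
    return {
        "hasSize": has_size(row_count),
        "hasMin": has_min(min_rating),
        "hasMax": has_max(max_rating),
    }


# Hypothetical sample values, not taken from the real dataset.
results = evaluate_assertions(3_000_001, 1.0, 5.0)
print(results)
```

If every entry is `True` here but the Deequ run still reports "Object ID unknown", the problem most likely lies in how the Python callback is invoked from the JVM, not in the assertion logic.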

Screenshots

PyDeequ_Exception (attached).

Additional context

Tried multiple versions of the deequ jar, from deequ-1.0.3.jar through deequ-1.2.0-spark-2.4.jar.
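
When trying several jars, it can help to confirm that each one targets the running Spark version. The sketch below is a minimal version of that check. It assumes the naming pattern seen in this report, `deequ-<version>-spark-<major.minor>.jar`. The helper names are hypothetical, and in a live session the version string would come from `spark.version`. A mismatch is not confirmed as the cause of this error; the check only rules out one variable.

```python
import re


def spark_version_from_jar(jar_name):
    """Return the 'major.minor' Spark version encoded in a deequ jar name, or None.

    Assumes the '-spark-<major.minor>' suffix convention; hypothetical helper.
    """
    match = re.search(r"-spark-(\d+\.\d+)", jar_name)
    return match.group(1) if match else None


def jar_matches_spark(jar_name, spark_version):
    """Compare the jar's Spark suffix with the running Spark version.

    Returns True or False, or None when the jar name has no Spark suffix.
    """
    expected = spark_version_from_jar(jar_name)
    if expected is None:
        return None  # e.g. deequ-1.0.3.jar: nothing to compare against
    actual = ".".join(spark_version.split(".")[:2])
    return expected == actual


# Jar names taken from the report; the full Spark version string is hypothetical.
for jar in ["deequ-1.0.3.jar", "deequ-1.2.0-spark-2.4.jar"]:
    print(jar, jar_matches_spark(jar, "2.4.7"))
```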

Justrd0350 commented 2 years ago

I got the same error. Have you found a way to solve this?