awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Can't execute the assertion: An exception was raised by the Python Proxy. Return Message: Object ID unknown! #38

Open aroradhruv73 opened 3 years ago

aroradhruv73 commented 3 years ago

Hi, I am using pydeequ to do some simple data testing. Even the examples mentioned in the documentation report Failure. My code is below and always gives me Failure, for example on `hasSize`:

```python
checkResult = VerificationSuite(spark) \
    .onData(df_perc_per_rec) \
    .addCheck(
        check.hasSize(lambda x: x >= 32)
        .isComplete("report_mst_date")
        .satisfies('perc_of_count_total >=2 AND perc_of_count_total <= 3', 'b and c', 'None', 'None')
    ).run()
```

```python
checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()
```

Output:

```
Python Callback server started!
+----------------+-----------+------------+--------------------+-----------------+--------------------+
|           check|check_level|check_status|          constraint|constraint_status|  constraint_message|
+----------------+-----------+------------+--------------------+-----------------+--------------------+
|Integrity checks|      Error|       Error|SizeConstraint(Si...|          Failure|Can't execute the...|
|Integrity checks|      Error|       Error|CompletenessConst...|          Success|                    |
|Integrity checks|      Error|       Error|ComplianceConstra...|          Failure|Can't execute the...|
+----------------+-----------+------------+--------------------+-----------------+--------------------+
```

```python
for check_json in checkResult.checkResults:
    if check_json['constraint_status'] != "Success":
        print(f"\t{check_json['constraint']} failed because: {check_json['constraint_message']}")
```

```
SizeConstraint(Size(None)) failed because: Can't execute the assertion: An exception was raised by the Python Proxy. Return Message: Object ID unknown!
ComplianceConstraint(Compliance(b and c,perc_of_count_total >=2 AND perc_of_count_total <= 3,None)) failed because: Can't execute the assertion: An exception was raised by the Python Proxy. Return Message: null!
```

Can somebody help me resolve this? I am using Spark 2.4 and Python 3 (Glue version 1.0).
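One detail I noticed while re-reading my call, in case it is relevant (this is only my guess, not a confirmed diagnosis): I pass the literal strings `'None'` as the last two arguments of `satisfies`. If that assertion parameter expects a Python callable (or the value `None`), the string `'None'` would not be callable. A minimal pure-Python sketch of the difference, using a hypothetical `run_assertion` helper that is not part of pydeequ:

```python
# Hypothetical helper, NOT pydeequ code: illustrates callable vs. string assertions.
def run_assertion(metric_value, assertion=None):
    """Apply an assertion to a metric value; default to requiring full compliance."""
    if assertion is None:
        return metric_value == 1.0
    # Raises TypeError if `assertion` is a string rather than a callable.
    return assertion(metric_value)

print(run_assertion(0.025, lambda x: 0.02 <= x <= 0.03))  # real lambda -> True
print(run_assertion(1.0))                                  # None -> default check -> True

try:
    run_assertion(0.025, 'None')  # the string 'None' is not callable
except TypeError as e:
    print(f"string assertion fails: {e}")
```

That said, the `hasSize` constraint uses a plain lambda and still fails, so the string arguments alone cannot be the whole story.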

Thanks in advance