awslabs / python-deequ

Python API for Deequ
Apache License 2.0
691 stars 132 forks

Can't execute the assertion: java.lang.String cannot be cast to java.lang.Boolean! #22

Open MatthieuBlais opened 3 years ago

MatthieuBlais commented 3 years ago

Describe the issue: While running the tutorial code on AWS Glue, I can't get the checks to pass when there is an assertion. It seems to be a type issue.

"Can't execute the assertion: java.lang.String cannot be cast to java.lang.Boolean!"

Environment:
- Glue Development Endpoint
- Spark Version: 2.4
- Python Version: 3
- Pydeequ version: 0.1.5
- Deequ jar: 1.0.3

Code: Using the tutorial sample code:

# Imports used by the tutorial; `spark` is the session provided by the Glue endpoint.
from pyspark.sql import Row
from pydeequ.checks import *
from pydeequ.verification import *

df = spark.sparkContext.parallelize([
    Row(a="foo", b=1, c=5),
    Row(a="bar", b=2, c=6),
    Row(a="baz", b=3, c=None)]).toDF()

check = Check(spark, CheckLevel.Warning, "Review Check")

checkResult = VerificationSuite(spark) \
    .onData(df) \
    .addCheck(
        check.hasSize(lambda x: x >= 3) \
        .hasMin("b", lambda x: x == 0) \
        .isComplete("c") \
        .isUnique("a") \
        .isContainedIn("a", ["foo", "bar", "baz"]) \
        .isNonNegative("b")) \
    .run()

checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()

Output:

+------------+-----------+------------+--------------------+-----------------+--------------------+
|       check|check_level|check_status|          constraint|constraint_status|  constraint_message|
+------------+-----------+------------+--------------------+-----------------+--------------------+
|Review Check|    Warning|     Warning|SizeConstraint(Si...|          Failure|Can't execute the...|
|Review Check|    Warning|     Warning|MinimumConstraint...|          Failure|Can't execute the...|
|Review Check|    Warning|     Warning|CompletenessConst...|          Failure|Value: 0.66666666...|
|Review Check|    Warning|     Warning|UniquenessConstra...|          Success|                    |
|Review Check|    Warning|     Warning|ComplianceConstra...|          Success|                    |
|Review Check|    Warning|     Warning|ComplianceConstra...|          Success|                    |
+------------+-----------+------------+--------------------+-----------------+--------------------+

Expanding the first two failing rows:

Row(check='Review Check', check_level='Warning', check_status='Warning', constraint='SizeConstraint(Size(None))', constraint_status='Failure', constraint_message="Can't execute the assertion: java.lang.String cannot be cast to java.lang.Boolean!")

Row(check='Review Check', check_level='Warning', check_status='Warning', constraint='MinimumConstraint(Minimum(b,None))', constraint_status='Failure', constraint_message="Can't execute the assertion: java.lang.String cannot be cast to java.lang.Boolean!")
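
For comparison, here is a minimal plain-Python sketch of what the tutorial assertions would evaluate to on this sample data if they ran normally. It does not use Spark or Deequ and says nothing about the cause of the cast error; the variable names (`rows`, `size_ok`, `min_ok`, `complete_ok`) are illustrative only, and treating `isComplete` as "completeness equals 1.0" is an assumption that matches the `Value: 0.66666666...` message in the output.

```python
# Plain-Python sanity check of the tutorial assertions (no Spark/Deequ involved).
rows = [
    {"a": "foo", "b": 1, "c": 5},
    {"a": "bar", "b": 2, "c": 6},
    {"a": "baz", "b": 3, "c": None},
]

size = len(rows)                                   # metric behind hasSize
min_b = min(r["b"] for r in rows)                  # metric behind hasMin("b", ...)
completeness_c = sum(r["c"] is not None for r in rows) / size  # fraction of non-null "c"

size_ok = (lambda x: x >= 3)(size)                 # same lambda as in hasSize
min_ok = (lambda x: x == 0)(min_b)                 # same lambda as in hasMin
complete_ok = completeness_c == 1.0                # assumed meaning of isComplete

print(size, size_ok)                       # 3 True
print(min_b, min_ok)                       # 1 False
print(round(completeness_c, 4), complete_ok)  # 0.6667 False
```

On this data the size assertion would be expected to succeed and the minimum assertion to fail with a value message, so the two "Can't execute the assertion" rows point to the assertion not being evaluated at all rather than to the data.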

cghyzel commented 3 years ago

Which Glue version are you on?

saulo-s commented 3 years ago

Hi!

I am having exactly the same issue with satisfies and every other has* function. I am not running on Glue, however; I am using this test dockerized Zeppelin.

The error: [image]

Setup details: [image] [image]

@MatthieuBlais I am sorry for taking over this issue. I believe our problems are very likely the same, and the solution should be the same as well :)