Open Awes35 opened 7 months ago
@Awes35 Have you tried the addConstraints interface, which supports a list of constraints?
I have had the same problem as outlined above. It occurs with hasMin(), hasMax(), and hasNumberOfDistinctValues(), but not with isContainedIn(), when I add multiple constraints of the same type on different columns. I also get different results depending on the order in which I add constraints of the same type. There seem to be no problems when each constraint is of a different type.
addConstraints() did not solve this for me. For now I have been able to work around it by adding each constraint as a separate check to the run.
I'm using pydeequ 1.2.0 and pyspark 3.3.4.
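The workaround described above (one check per constraint, all added to the same run) might look roughly like the sketch below. The helper names at_most and run_with_one_check_per_column are hypothetical and not part of pydeequ; the sketch assumes an existing Spark session and DataFrame and has not been verified against pydeequ 1.2.0.

```python
def at_most(limit):
    """Hypothetical helper: return an assertion bound to this specific limit."""
    return lambda observed: observed <= limit


def run_with_one_check_per_column(spark, data_df, max_vals):
    """Hypothetical helper: build one Check per column and run them together."""
    # Imported lazily so the sketch can be loaded without a Spark session.
    from pydeequ.checks import Check, CheckLevel
    from pydeequ.verification import VerificationSuite, VerificationResult

    builder = VerificationSuite(spark).onData(data_df)
    for column, limit in max_vals.items():
        # A fresh Check per column instead of chaining onto one shared Check.
        check = Check(spark, CheckLevel.Warning, f"Max check: {column}")
        builder = builder.addCheck(check.hasMax(column.lower(), at_most(limit)))

    result = builder.run()
    return VerificationResult.checkResultsAsDataFrame(spark, result)
```

Building the assertion through a small factory keeps each limit bound to its own function, so the sketch does not depend on the loop variable at evaluation time.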
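Python does not capture the current value of mval in your lambda's closure. The lambda refers to the variable itself and only looks it up when it is called, which happens during the verification run, after the loop has finished. So every lambda compares against the last value in your dictionary (650 here), which matches your output: only the column whose maximum is below 650 passes. The None in Maximum(column,None) also appears in your passing single-constraint run, so it is probably unrelated to the failed comparisons.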
I had the same issue and found this helpful: https://stackoverflow.com/a/2295372
Basically use [warning, untested]:
for c, mval in max_vals_dict.items():
    check.addConstraint(check.hasMax(c.lower(), lambda x, mval=mval: x <= mval))
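The difference between the two loop styles can be reproduced in plain Python, without Spark or pydeequ. This is a minimal sketch that uses a shortened copy of the dictionary from the issue; the names late and bound are made up for illustration.

```python
max_vals_dict = {
    "TELCO98_SCORE": 999,
    "BANKRUPTCYNAVIGATOR_SCORE": 300,
    "AUTOFINANCEPREDICTOR_SCORE": 650,
}

# Late binding: each lambda looks up mval when it is called,
# by which time the loop has finished and mval holds the last value.
late = {}
for c, mval in max_vals_dict.items():
    late[c] = lambda x: x <= mval

# Default argument: mval=mval is evaluated once per iteration,
# so each lambda keeps the value from its own iteration.
bound = {}
for c, mval in max_vals_dict.items():
    bound[c] = lambda x, mval=mval: x <= mval

late_result = late["TELCO98_SCORE"](988.0)    # compared against 650
bound_result = bound["TELCO98_SCORE"](988.0)  # compared against 999
print(late_result, bound_result)
```

With late binding, 988.0 is checked against 650 and fails, just like the telco98_score row in the report; with the default argument it is checked against 999 and passes.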
Describe the bug
When repeatedly using addConstraint() to add constraints to a Check object, the constraint values seem to get mixed up during the verification run. The output indicates that other constraints fail, although each constraint succeeds in isolation.
To Reproduce
Steps to reproduce the behavior:
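from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

max_vals_dict = {"TELCO98_SCORE":999, "ADVANCEDENERGYRISK_SCORE":999, "BANKRUPTCYNAVIGATOR_SCORE":300, "EQUIFAXRISK_SCORE":999, "VANTAGE_SCORE":999, "WIRELESS2000_SCORE":999, "AUTOFINANCEPREDICTOR_SCORE":650}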
check = Check(spark, CheckLevel.Warning, "Review Check")
data_df = spark.read.table("mydb.mytablename")
for c, mval in max_vals_dict.items():
    check.addConstraint(check.hasMax(c.lower(), lambda x: x <= mval))
check.hasMax(c.lower(), lambda x: x <= int(mval))
checkResult = VerificationSuite(spark).onData(data_df).addCheck(check).run()
checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show(truncate=False)
+------------+-----------+------------+-----------------------------------------------------------+-----------------+------------------------------------------------------+
|check       |check_level|check_status|constraint                                                 |constraint_status|constraint_message                                    |
+------------+-----------+------------+-----------------------------------------------------------+-----------------+------------------------------------------------------+
|Review Check|Warning    |Warning     |MaximumConstraint(Maximum(telco98_score,None))             |Failure          |Value: 988.0 does not meet the constraint requirement!|
|Review Check|Warning    |Warning     |MaximumConstraint(Maximum(advancedenergyrisk_score,None))  |Failure          |Value: 979.0 does not meet the constraint requirement!|
|Review Check|Warning    |Warning     |MaximumConstraint(Maximum(bankruptcynavigator_score,None)) |Success          |                                                      |
|Review Check|Warning    |Warning     |MaximumConstraint(Maximum(equifaxrisk_score,None))         |Failure          |Value: 829.0 does not meet the constraint requirement!|
|Review Check|Warning    |Warning     |MaximumConstraint(Maximum(vantage_score,None))             |Failure          |Value: 844.0 does not meet the constraint requirement!|
|Review Check|Warning    |Warning     |MaximumConstraint(Maximum(wireless2000_score,None))        |Failure          |Value: 997.0 does not meet the constraint requirement!|
|Review Check|Warning    |Warning     |MaximumConstraint(Maximum(autofinancepredictor_score,None))|Failure          |Value: 702.0 does not meet the constraint requirement!|
+------------+-----------+------------+-----------------------------------------------------------+-----------------+------------------------------------------------------+
max_vals_dict = {"TELCO98_SCORE":999}
check = Check(spark, CheckLevel.Warning, "Review Check")
for c, mval in max_vals_dict.items():
    check.addConstraint(check.hasMax(c.lower(), lambda x: x <= mval))
checkResult = VerificationSuite(spark).onData(data_df).addCheck(check).run()
checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show(truncate=False)
+------------+-----------+------------+----------------------------------------------+-----------------+------------------+
|check       |check_level|check_status|constraint                                    |constraint_status|constraint_message|
+------------+-----------+------------+----------------------------------------------+-----------------+------------------+
|Review Check|Warning    |Success     |MaximumConstraint(Maximum(telco98_score,None))|Success          |                  |
+------------+-----------+------------+----------------------------------------------+-----------------+------------------+