awslabs / python-deequ

Python API for Deequ
Apache License 2.0

default suggestion check error in iscontained #161

Closed poudelankit closed 9 months ago

poudelankit commented 9 months ago

Code for the default ConstraintSuggestion:

    suggestions = ConstraintSuggestionRunner(spark).onData(df).addConstraintRule(DEFAULT()).run()
    for suggestion in suggestions['constraint_suggestions']:
        print(suggestion['code_for_constraint'])

Result of the default suggestion:

The default constraint suggestion returns the following value under 'code_for_constraint':

    .isContainedIn("Category #1", ["Dental Surgery", "Laboratory"], lambda x: x >= 0.98, "It should be above 0.98!")

Passing this to addCheck results in an error.

Check implementation:

    check = Check(spark, CheckLevel.Error, "Manual Check")
    verification_runner = VerificationSuite(spark).onData(df).addCheck(
        check.isContainedIn(
            "Category #1",
            ["Dental Surgery", "Laboratory"],
            lambda x: x >= 0.98,
            "It should be above 0.98!",
        )
    )
    verification_result = verification_runner.run()
    df_checked = VerificationResult.checkResultsAsDataFrame(spark, verification_result)
    df_checked.show(truncate=False)

Error:

    TypeError: isContainedIn() takes 3 positional arguments but 5 were given
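The argument count in this message can be reproduced without Spark. The sketch below uses two stand-in classes (`OldCheck` and `PatchedCheck` are hypothetical names, not the real pydeequ `Check`). It assumes the failing method accepts only a column and a list of allowed values, so that the suggested call passes `self` plus four arguments, five in total, to a method that takes three:

```python
# Stand-in classes for illustration only; these are NOT the real pydeequ Check.


class OldCheck:
    """Mimics a signature without assertion or hint parameters (assumed)."""

    def isContainedIn(self, column, allowed_values):
        return ("isContainedIn", column, tuple(allowed_values))


class PatchedCheck:
    """Mimics a signature with optional assertion and hint parameters."""

    def isContainedIn(self, column, allowed_values, assertion=None, hint=None):
        return ("isContainedIn", column, tuple(allowed_values), assertion, hint)


# The four arguments produced by the default constraint suggestion.
args = (
    "Category #1",
    ["Dental Surgery", "Laboratory"],
    lambda x: x >= 0.98,
    "It should be above 0.98!",
)

# self + 4 arguments = 5 given, but the old signature takes only 3.
try:
    OldCheck().isContainedIn(*args)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# With optional parameters, the same call is accepted.
result = PatchedCheck().isContainedIn(*args)
print(result[3](0.99), result[4])
```

This only illustrates the arity mismatch. How the real method forwards the assertion and hint to Deequ is not shown here.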


chenliu0831 commented 9 months ago

I think this PR should fix the error: https://github.com/awslabs/python-deequ/pull/157

poudelankit commented 9 months ago

Porting the Scala implementation to Python worked for me.

The change was made in checks.py:

def isContainedIn(self, column, allowed_values, assertion=None, hint=None):
    """