Closed ianclari closed 3 years ago
To keep the functional-style usage from before in Python, PyDeequ needs you to construct a Check object first and then reference that same object inside the list when you add your constraints as a list. Take a look at this and let me know if that works!
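```python
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite

# Build the Check first, then reference it for every constraint in the list.
check = Check(self.spark, CheckLevel.Warning, "test list constraints")
check.addConstraints([check.isComplete('c'),
                      check.isUnique('b')])
result = VerificationSuite(self.spark).onData(self.df) \
    .addCheck(check) \
    .run()
```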
thank you @gucciwang
EDIT: Disregard the notes below; I understand now that it still has to be applied. The logic presented above does make sense to me!
I am trying this code following your sample and encountering an error (`'Check' object has no attribute 'addConstraints'`). Is it a version issue?
I'm using the following:
```python
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

check = Check(spark, CheckLevel.Warning, "Review Check 2")
check.addConstraints([check.isComplete('gender')])
checkResult = VerificationSuite(spark) \
    .onData(data_df) \
    .addCheck(check) \
    .run()
checkResult_df2 = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df2.show()
```
Is your feature request related to a problem? Please describe.
When executing VerificationSuite, I need to list all the constraints I want to check one by one. There should be a way to register a list of constraints so it's easier to run VerificationSuite.
Describe the solution you'd like
I observed that running VerificationSuite means we need to add each constraint we want to evaluate by instantiating pydeequ.checks.Check (https://pydeequ.readthedocs.io/en/latest/pydeequ.html#module-pydeequ.checks).
(In the code below I wanted to implement column checks using `isComplete()`, `isUnique()` and `isNonNegative()` on certain columns.)
I saw that there is a placeholder parameter on pydeequ.checks.Check called `constraints`, as in `Check(spark_session=spark, level=CheckLevel.Warning, description="Review Check", constraints=[])`, which could be the way to register the list of constraints and make the call to VerificationSuite more generic and simple. I believe enabling this functionality will increase adoption of pydeequ among Python developers dabbling in data quality use cases.
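One possible way to get the "register a list of constraints" behaviour today is a small helper that applies a list of (method name, arguments) pairs to a check object. This is only a sketch, not part of PyDeequ: `apply_constraints`, `ConstraintSpec`, `RecordingCheck`, and the column names `id` and `age` are hypothetical, and it assumes each constraint method returns the check itself so calls can be chained. `RecordingCheck` is a stand-in so the example runs without Spark.

```python
from typing import Any, Iterable, Sequence, Tuple

# (method name, positional arguments); names must match methods on the check object.
ConstraintSpec = Tuple[str, Sequence[Any]]


def apply_constraints(check: Any, specs: Iterable[ConstraintSpec]) -> Any:
    """Call each named constraint method on `check` and return the resulting check."""
    for method_name, args in specs:
        method = getattr(check, method_name, None)
        if not callable(method):
            raise AttributeError(f"check has no constraint method {method_name!r}")
        # Assumption: every constraint method returns the check (fluent style).
        check = method(*args)
    return check


class RecordingCheck:
    """Hypothetical stand-in for pydeequ.checks.Check; it only records the calls."""

    def __init__(self) -> None:
        self.calls = []

    def isComplete(self, column: str) -> "RecordingCheck":
        self.calls.append(("isComplete", column))
        return self

    def isUnique(self, column: str) -> "RecordingCheck":
        self.calls.append(("isUnique", column))
        return self

    def isNonNegative(self, column: str) -> "RecordingCheck":
        self.calls.append(("isNonNegative", column))
        return self


# The constraint list lives in plain data, so it can come from a config file.
SPECS = [
    ("isComplete", ["gender"]),
    ("isUnique", ["id"]),
    ("isNonNegative", ["age"]),
]

demo = apply_constraints(RecordingCheck(), SPECS)
print(demo.calls)
```

With a real PyDeequ `Check` in place of `RecordingCheck`, the returned object would then be passed to `.addCheck(...)` on the VerificationSuite as in the snippets above, provided the chaining assumption holds for the installed version.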