Closed dilkushpatel closed 1 year ago
Change
checkResult = VerificationSuite(spark)
.onData(df)
.addCheck(
isComplete("month_id")
)
.run()
to
checkResult = VerificationSuite(spark)
.onData(df)
.addCheck(
check.isComplete("month_id")
)
.run()
See full code example here: https://github.com/awslabs/python-deequ#constraint-verification
Interesting! I was actually trying that, but I still get an error.
Error: Check.isComplete() missing 1 required positional argument: 'column'
Code:
checkResult = VerificationSuite(spark) \
    .onData(df) \
    .addCheck(Check.isComplete("month_id")) \
    .run()
Ignore...
I changed Check to check and that worked.
Thanks.
Thanks for confirming.
Since you have the following line:
check = Check(spark, CheckLevel.Error, "Data QC")
check.isComplete is correct, as opposed to Check.isComplete.
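To make the distinction concrete, here is a minimal sketch using a toy stand-in class (not the real pydeequ Check) that reproduces why the class-level call `Check.isComplete("month_id")` fails with "missing 1 required positional argument: 'column'": the string binds to `self`, leaving `column` unfilled.

```python
# Toy stand-in for pydeequ's Check class (NOT the real implementation),
# illustrating the bound- vs unbound-method distinction behind the error.
class Check:
    def __init__(self, level, description):
        self.level = level
        self.description = description
        self.constraints = []

    def isComplete(self, column):
        # Record the constraint and return self so calls can be chained.
        self.constraints.append(("isComplete", column))
        return self

check = Check("Error", "Data QC")

# Instance call: `check` is passed as self, "month_id" binds to `column`.
check.isComplete("month_id")

# Class call: "month_id" binds to self, leaving `column` unfilled.
try:
    Check.isComplete("month_id")
except TypeError as exc:
    print(exc)  # ... missing 1 required positional argument: 'column'
```

This is why the fix is to create an instance first (`check = Check(...)`) and call the method on it.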
I'm trying to implement basic checks on the columns of a table in Azure SQL DW.
Reading the data works fine.
I can also run ConstraintSuggestionRunner.
But when I run VerificationSuite with a single isComplete check, it gives an error.
Error: name 'isComplete' is not defined
Code:
import sagemaker_pyspark
import pydeequ
from pyspark.sql import SparkSession
from pydeequ.analyzers import *
from pydeequ.checks import *
from pydeequ.verification import *
from pydeequ.anomaly_detection import *
classpath = ":".join(sagemaker_pyspark.classpath_jars())
spark = (SparkSession
    .builder
    .config("spark.driver.extraClassPath", classpath)
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())
check = Check(spark, CheckLevel.Error, "Data QC")
checkResult = VerificationSuite(spark) \
    .onData(df) \
    .addCheck(isComplete("month_id")) \
    .run()
checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()
I tried googling but did not find anything relevant.
Same error with any other check as well.
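The original "name 'isComplete' is not defined" error can be sketched with a toy class (not the real pydeequ module): a wildcard import like `from pydeequ.checks import *` brings in module-level names such as the Check class, but methods defined inside a class never become free names, so the bare `isComplete` stays undefined.

```python
# Toy sketch (NOT the real pydeequ.checks module): isComplete exists only
# as an attribute of the Check class, never as a standalone module-level
# name, so calling bare isComplete(...) raises NameError.
class Check:
    def isComplete(self, column):
        return self

print(hasattr(Check, "isComplete"))  # True: the method lives on the class
print("isComplete" in globals())     # False: no free name to call
```

Hence both fixes discussed above: qualify the call with an instance (`check.isComplete(...)`) rather than expecting the name to exist on its own.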