Running Deequ checks on an empty DataFrame produces an error.
We sometimes pass empty DataFrames through Deequ to explicitly verify that the "Size" check passes for a row count of 0.
Below is an example that runs a Deequ size check for zero rows on an empty DataFrame, with a filter applied through .where. It fails, but I expect it to pass.
val dataFrame = getNumberDataFrame(13).filter(col("Number") === 100) // this returns an empty DataFrame
val result1 = VerificationSuite()
  .onData(dataFrame)
  .addCheck(Check(CheckLevel.Error, "")
    .hasSize(_ == 0)
    .where("Number=10"))
  .run()
After running the code above, I expect the result to be "success", but I get the error below. Is this by design?
I feel this should work, since an empty DataFrame is a fairly common scenario.
When I run the same code without .where("Number=10"), it reports success.
Error:
VerificationResult(Error,Map(Check(Error,,List(UniquenessConstraint(Uniqueness(List(Number),None)))) -> CheckResult(Check(Error,,List(UniquenessConstraint(Uniqueness(List(Number),None)))),Error,List(ConstraintResult(UniquenessConstraint(Uniqueness(List(Number),None)),Failure,Some(Empty state for analyzer Uniqueness(List(Number),None), all input values were NULL.),Some(DoubleMetric(Column,Uniqueness,Number,Failure(com.amazon.deequ.analyzers.runners.EmptyStateException: Empty state for analyzer Uniqueness(List(Number),None), all input values were NULL.))))))),Map(Uniqueness(List(Number),None) -> DoubleMetric(Column,Uniqueness,Number,Failure(com.amazon.deequ.analyzers.runners.EmptyStateException: Empty state for analyzer Uniqueness(List(Number),None), all input values were NULL.))))
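Since the check is reported to succeed once .where is removed, one possible workaround is to apply the row condition to the DataFrame itself before handing it to Deequ. The following is only a minimal sketch under assumptions, not a confirmed fix: getNumberDataFrame is not shown in this report, so spark.range(13).toDF("Number") is used as a hypothetical stand-in, and the local SparkSession settings and the check description are illustrative.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session purely for illustration.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("deequ-empty-size")
  .getOrCreate()

// Hypothetical stand-in for getNumberDataFrame(13): a single column named "Number".
val numbers = spark.range(13).toDF("Number")

// Same filter as in the report; no row has Number == 100, so the result is empty.
val emptyDf = numbers.filter(col("Number") === 100)

// Apply the row condition on the DataFrame instead of through .where(...) on the check.
val filtered = emptyDf.filter("Number = 10")

val result = VerificationSuite()
  .onData(filtered)
  .addCheck(Check(CheckLevel.Error, "size check on pre-filtered data")
    .hasSize(_ == 0))
  .run()

// Print the overall status of the verification run.
println(result.status)
```

This sidesteps the filtered size analyzer entirely, so it does not answer whether the behavior with .where on an empty DataFrame is intended.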