awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0

[BUG] Deequ 2.0.7 - Spark CodeGenerator ERROR - Expression is not an rvalue #592

Open pawelpinkos opened 1 week ago

pawelpinkos commented 1 week ago

Describe the bug Beginning with version 2.0.7 of Deequ (all Spark releases), there is a bug in the library that makes Catalyst code generation in Spark fail. The exception is caught, so this does not fail the run, but you can observe the issue in the logs (e.g. run MaximumTest from the Deequ tests and check the log).

I have investigated, and in my opinion the root cause of the issue is this change: https://github.com/awslabs/deequ/commit/34d8f3ae70df5a049129f423e2d296ea81a6a1b8

The error is thrown when AnalysisRunner calls dataframe.agg() here; whether it occurs depends on the provided parameters. For example, before Deequ 2.0.7 (for the example provided in the "To Reproduce" section) the parameters were:

[aggregation expressions shown as an image in the original issue]

and there was no error. With Deequ 2.0.7 the parameters are:

[aggregation expressions shown as an image in the original issue]

and the error is thrown.
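To make the difference concrete, here is a minimal sketch of the kind of expression that seems to be new in 2.0.7. This is inferred from the generated code below, not taken from Deequ's sources; the column name, the marker literal, and the exact nesting are assumptions. An aggregation over an element_at(array(...), 2) wrapper like this is what Catalyst codegen stumbles on:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Sketch only: 2.0.7 appears to route the analyzed column through
// element_at(array(<marker>, <column as string>), 2) and cast the result
// back to double before feeding it to the aggregate functions.
def sketchedAggregation(df: DataFrame): DataFrame = {
  val wrapped = element_at(
    array(lit("marker"), col("size").cast("string")),
    2
  ).cast("double")

  df.agg(
    min(wrapped),              // hashAgg_doAggregate_min_0 in the dump below
    max(wrapped),              // hashAgg_doAggregate_max_0
    sum(col("size")),          // hashAgg_doAggregate_sum_0
    count(col("size")),        // hashAgg_doAggregate_count_0
    stddev_pop(col("size"))    // hashAgg_doAggregate_stateful_stddev_pop_0
  )
}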

This causes a lot of errors in the logs of applications that use Deequ. I tried to bump Deequ to 2.0.7 in my project, but because of this I have had to postpone the upgrade.
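Until there is a fix, a possible stop-gap (my own suggestion, not an official workaround) is to keep the failure out of the logs; Spark already falls back to non-codegen execution after the compile error, so only the noise needs addressing:

// Option 1: skip whole-stage codegen for the session running Deequ,
// so the failing compilation is never attempted.
spark.conf.set("spark.sql.codegen.wholeStage", "false")

// Option 2: silence only the offending logger, e.g. in log4j2.properties:
//   logger.codegen.name  = org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator
//   logger.codegen.level = fatal

Here is the full error as it appears in the logs: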

24/10/22 10:13:20 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 91, Column 1: Expression "hashAgg_isNull_21" is not an rvalue
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 91, Column 1: Expression "hashAgg_isNull_21" is not an rvalue
    at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12021)
    at org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7575)
    at org.codehaus.janino.UnitCompiler.compileContext2(UnitCompiler.java:4377)
    at org.codehaus.janino.UnitCompiler.access$6700(UnitCompiler.java:226)
    at org.codehaus.janino.UnitCompiler$15$1.visitAmbiguousName(UnitCompiler.java:4326)
    at org.codehaus.janino.UnitCompiler$15$1.visitAmbiguousName(UnitCompiler.java:4323)
    at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4429)
    at org.codehaus.janino.UnitCompiler$15.visitLvalue(UnitCompiler.java:4323)
    at org.codehaus.janino.UnitCompiler$15.visitLvalue(UnitCompiler.java:4319)
    at org.codehaus.janino.Java$Lvalue.accept(Java.java:4353)
    at org.codehaus.janino.UnitCompiler.compileContext(UnitCompiler.java:4319)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3838)
    at org.codehaus.janino.UnitCompiler.access$6100(UnitCompiler.java:226)
    at org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3799)
    at org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3779)
    at org.codehaus.janino.Java$Assignment.accept(Java.java:4690)
    at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3779)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2366)
    at org.codehaus.janino.UnitCompiler.access$1800(UnitCompiler.java:226)
    at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1497)
    at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1490)
    at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:3064)
    at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
    at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1573)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1559)
    at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:226)
    at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1496)
    at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1490)
    at org.codehaus.janino.Java$Block.accept(Java.java:2969)
    at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2475)
    at org.codehaus.janino.UnitCompiler.access$1900(UnitCompiler.java:226)
    at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1498)
    at org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1490)
    at org.codehaus.janino.Java$IfStatement.accept(Java.java:3140)
    at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1490)
    at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1573)
    at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3420)
    at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1362)
    at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1335)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:807)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:975)
    at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:226)
    at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:392)
    at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:384)
    at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1445)
    at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:384)
    at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1312)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:833)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:410)
    at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:226)
    at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:389)
    at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:384)
    at org.codehaus.janino.Java$PackageMemberClassDeclaration.accept(Java.java:1594)
    at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:384)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:362)
    at org.codehaus.janino.UnitCompiler.access$000(UnitCompiler.java:226)
    at org.codehaus.janino.UnitCompiler$1.visitCompilationUnit(UnitCompiler.java:336)
    at org.codehaus.janino.UnitCompiler$1.visitCompilationUnit(UnitCompiler.java:333)
    at org.codehaus.janino.Java$CompilationUnit.accept(Java.java:363)
    at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:333)
    at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:235)
    at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:464)
    at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:314)
    at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:237)
    at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205)
    at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1490)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1587)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1584)
    at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
    at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
    at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
    at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
    at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
    at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1437)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:726)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:725)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:135)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:135)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture$lzycompute(ShuffleExchangeExec.scala:140)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture(ShuffleExchangeExec.scala:139)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$submitShuffleJob$1(ShuffleExchangeExec.scala:68)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.submitShuffleJob(ShuffleExchangeExec.scala:68)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.submitShuffleJob$(ShuffleExchangeExec.scala:67)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.submitShuffleJob(ShuffleExchangeExec.scala:115)
    at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.shuffleFuture$lzycompute(QueryStageExec.scala:174)
    at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.shuffleFuture(QueryStageExec.scala:174)
    at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.doMaterialize(QueryStageExec.scala:176)
    at org.apache.spark.sql.execution.adaptive.QueryStageExec.materialize(QueryStageExec.scala:82)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5(AdaptiveSparkPlanExec.scala:258)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5$adapted(AdaptiveSparkPlanExec.scala:256)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:256)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:228)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:367)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:340)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3868)
    at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:3120)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3858)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3856)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3856)
    at org.apache.spark.sql.Dataset.collect(Dataset.scala:3120)
    at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:327)
    at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:320)
    at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:169)
    at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
    at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:195)
    at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
    at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
    at DeequTest$.main(DeequTest.scala:25)
    at DeequTest.main(DeequTest.scala)
24/10/22 10:13:20 INFO CodeGenerator: 
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=1
/* 006 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */   private Object[] references;
/* 008 */   private scala.collection.Iterator[] inputs;
/* 009 */   private boolean hashAgg_initAgg_0;
/* 010 */   private boolean hashAgg_bufIsNull_0;
/* 011 */   private double hashAgg_bufValue_0;
/* 012 */   private boolean hashAgg_bufIsNull_1;
/* 013 */   private double hashAgg_bufValue_1;
/* 014 */   private boolean hashAgg_bufIsNull_2;
/* 015 */   private long hashAgg_bufValue_2;
/* 016 */   private boolean hashAgg_bufIsNull_3;
/* 017 */   private long hashAgg_bufValue_3;
/* 018 */   private boolean hashAgg_bufIsNull_4;
/* 019 */   private double hashAgg_bufValue_4;
/* 020 */   private boolean hashAgg_bufIsNull_5;
/* 021 */   private double hashAgg_bufValue_5;
/* 022 */   private boolean hashAgg_bufIsNull_6;
/* 023 */   private double hashAgg_bufValue_6;
/* 024 */   private scala.collection.Iterator localtablescan_input_0;
/* 025 */   private boolean hashAgg_subExprValue_0;
/* 026 */   private double hashAgg_subExprValue_1;
/* 027 */   private boolean hashAgg_subExprIsNull_0;
/* 028 */   private boolean hashAgg_hashAgg_isNull_27_0;
/* 029 */   private boolean hashAgg_hashAgg_isNull_29_0;
/* 030 */   private boolean hashAgg_hashAgg_isNull_32_0;
/* 031 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] hashAgg_mutableStateArray_0 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[1];
/* 032 */
/* 033 */   public GeneratedIteratorForCodegenStage1(Object[] references) {
/* 034 */     this.references = references;
/* 035 */   }
/* 036 */
/* 037 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 038 */     partitionIndex = index;
/* 039 */     this.inputs = inputs;
/* 040 */
/* 041 */     localtablescan_input_0 = inputs[0];
/* 042 */     hashAgg_mutableStateArray_0[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(7, 0);
/* 043 */
/* 044 */   }
/* 045 */
/* 046 */   private void hashAgg_doAggregateWithoutKey_0() throws java.io.IOException {
/* 047 */     // initialize aggregation buffer
/* 048 */     hashAgg_bufIsNull_0 = true;
/* 049 */     hashAgg_bufValue_0 = -1.0;
/* 050 */     hashAgg_bufIsNull_1 = true;
/* 051 */     hashAgg_bufValue_1 = -1.0;
/* 052 */     hashAgg_bufIsNull_2 = true;
/* 053 */     hashAgg_bufValue_2 = -1L;
/* 054 */     hashAgg_bufIsNull_3 = false;
/* 055 */     hashAgg_bufValue_3 = 0L;
/* 056 */     hashAgg_bufIsNull_4 = false;
/* 057 */     hashAgg_bufValue_4 = 0.0D;
/* 058 */     hashAgg_bufIsNull_5 = false;
/* 059 */     hashAgg_bufValue_5 = 0.0D;
/* 060 */     hashAgg_bufIsNull_6 = false;
/* 061 */     hashAgg_bufValue_6 = 0.0D;
/* 062 */
/* 063 */     while ( localtablescan_input_0.hasNext()) {
/* 064 */       InternalRow localtablescan_row_0 = (InternalRow) localtablescan_input_0.next();
/* 065 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[0] /* numOutputRows */).add(1);
/* 066 */       long localtablescan_value_0 = localtablescan_row_0.getLong(0);
/* 067 */
/* 068 */       hashAgg_doConsume_0(localtablescan_row_0, localtablescan_value_0);
/* 069 */       // shouldStop check is eliminated
/* 070 */     }
/* 071 */
/* 072 */   }
/* 073 */
/* 074 */   private void hashAgg_subExpr_1(long hashAgg_expr_0_0) {
/* 075 */     ArrayData hashAgg_arrayData_1 = ArrayData.allocateArrayData(
/* 076 */       -1, 2L, " createArray failed.");
/* 077 */
/* 078 */     hashAgg_arrayData_1.update(0, ((UTF8String) references[2] /* literal */));
/* 079 */
/* 080 */     boolean hashAgg_isNull_24 = false;
/* 081 */     UTF8String hashAgg_value_24 = null;
/* 082 */     if (!false) {
/* 083 */       hashAgg_value_24 = UTF8String.fromString(String.valueOf(hashAgg_expr_0_0));
/* 084 */     }
/* 085 */     hashAgg_arrayData_1.update(1, hashAgg_value_24);
/* 086 */
/* 087 */     UTF8String hashAgg_value_21 = null;
/* 088 */
/* 089 */     int hashAgg_elementAtIndex_1 = (int) 2;
/* 090 */     if (hashAgg_arrayData_1.numElements() < Math.abs(hashAgg_elementAtIndex_1)) {
/* 091 */       hashAgg_isNull_21 = true;
/* 092 */     } else {
/* 093 */       if (hashAgg_elementAtIndex_1 == 0) {
/* 094 */         throw QueryExecutionErrors.sqlArrayIndexNotStartAtOneError();
/* 095 */       } else if (hashAgg_elementAtIndex_1 > 0) {
/* 096 */         hashAgg_elementAtIndex_1--;
/* 097 */       } else {
/* 098 */         hashAgg_elementAtIndex_1 += hashAgg_arrayData_1.numElements();
/* 099 */       }
/* 100 */
/* 101 */       {
/* 102 */         hashAgg_value_21 = hashAgg_arrayData_1.getUTF8String(hashAgg_elementAtIndex_1);
/* 103 */       }
/* 104 */     }
/* 105 */     boolean hashAgg_isNull_20 = false;
/* 106 */     double hashAgg_value_20 = -1.0;
/* 107 */     if (!false) {
/* 108 */       final String hashAgg_doubleStr_1 = hashAgg_value_21.toString();
/* 109 */       try {
/* 110 */         hashAgg_value_20 = Double.valueOf(hashAgg_doubleStr_1);
/* 111 */       } catch (java.lang.NumberFormatException e) {
/* 112 */         final Double d = (Double) Cast.processFloatingPointSpecialLiterals(hashAgg_doubleStr_1, false);
/* 113 */         if (d == null) {
/* 114 */           hashAgg_isNull_20 = true;
/* 115 */         } else {
/* 116 */           hashAgg_value_20 = d.doubleValue();
/* 117 */         }
/* 118 */       }
/* 119 */     }
/* 120 */     hashAgg_subExprIsNull_0 = hashAgg_isNull_20;
/* 121 */     hashAgg_subExprValue_1 = hashAgg_value_20;
/* 122 */   }
/* 123 */
/* 124 */   private void hashAgg_doAggregate_max_0() throws java.io.IOException {
/* 125 */     hashAgg_hashAgg_isNull_29_0 = true;
/* 126 */     double hashAgg_value_29 = -1.0;
/* 127 */
/* 128 */     if (!hashAgg_bufIsNull_1 && (hashAgg_hashAgg_isNull_29_0 ||
/* 129 */         (org.apache.spark.sql.catalyst.util.SQLOrderingUtil.compareDoubles(hashAgg_bufValue_1, hashAgg_value_29)) > 0)) {
/* 130 */       hashAgg_hashAgg_isNull_29_0 = false;
/* 131 */       hashAgg_value_29 = hashAgg_bufValue_1;
/* 132 */     }
/* 133 */
/* 134 */     if (!hashAgg_subExprIsNull_0 && (hashAgg_hashAgg_isNull_29_0 ||
/* 135 */         (org.apache.spark.sql.catalyst.util.SQLOrderingUtil.compareDoubles(hashAgg_subExprValue_1, hashAgg_value_29)) > 0)) {
/* 136 */       hashAgg_hashAgg_isNull_29_0 = false;
/* 137 */       hashAgg_value_29 = hashAgg_subExprValue_1;
/* 138 */     }
/* 139 */
/* 140 */     hashAgg_bufIsNull_1 = hashAgg_hashAgg_isNull_29_0;
/* 141 */     hashAgg_bufValue_1 = hashAgg_value_29;
/* 142 */   }
/* 143 */
/* 144 */   private void hashAgg_doConsume_0(InternalRow localtablescan_row_0, long hashAgg_expr_0_0) throws java.io.IOException {
/* 145 */     // do aggregate
/* 146 */     // common sub-expressions
/* 147 */
/* 148 */     hashAgg_subExpr_1(hashAgg_expr_0_0);
/* 149 */
/* 150 */     hashAgg_subExpr_0(hashAgg_expr_0_0);
/* 151 */
/* 152 */     // evaluate aggregate functions and update aggregation buffers
/* 153 */     hashAgg_doAggregate_min_0();
/* 154 */     hashAgg_doAggregate_max_0();
/* 155 */     hashAgg_doAggregate_sum_0(hashAgg_expr_0_0);
/* 156 */     hashAgg_doAggregate_count_0();
/* 157 */     hashAgg_doAggregate_stateful_stddev_pop_0(hashAgg_expr_0_0);
/* 158 */
/* 159 */   }
/* 160 */
/* 161 */   private void hashAgg_doAggregate_stateful_stddev_pop_0(long hashAgg_expr_0_0) throws java.io.IOException {
/* 162 */     boolean hashAgg_isNull_39 = false;
/* 163 */     double hashAgg_value_39 = -1.0;
/* 164 */     if (!false && hashAgg_subExprValue_0) {
/* 165 */       hashAgg_isNull_39 = hashAgg_bufIsNull_4;
/* 166 */       hashAgg_value_39 = hashAgg_bufValue_4;
/* 167 */     } else {
/* 168 */       double hashAgg_value_41 = -1.0;
/* 169 */
/* 170 */       hashAgg_value_41 = hashAgg_bufValue_4 + 1.0D;
/* 171 */       hashAgg_isNull_39 = false;
/* 172 */       hashAgg_value_39 = hashAgg_value_41;
/* 173 */     }
/* 174 */     boolean hashAgg_isNull_44 = false;
/* 175 */     double hashAgg_value_44 = -1.0;
/* 176 */     if (!false && hashAgg_subExprValue_0) {
/* 177 */       hashAgg_isNull_44 = hashAgg_bufIsNull_5;
/* 178 */       hashAgg_value_44 = hashAgg_bufValue_5;
/* 179 */     } else {
/* 180 */       boolean hashAgg_isNull_46 = true;
/* 181 */       double hashAgg_value_46 = -1.0;
/* 182 */
/* 183 */       double hashAgg_value_53 = -1.0;
/* 184 */
/* 185 */       hashAgg_value_53 = hashAgg_bufValue_4 + 1.0D;
/* 186 */       boolean hashAgg_isNull_48 = false;
/* 187 */       double hashAgg_value_48 = -1.0;
/* 188 */       if (hashAgg_value_53 == 0) {
/* 189 */         hashAgg_isNull_48 = true;
/* 190 */       } else {
/* 191 */         boolean hashAgg_isNull_50 = false;
/* 192 */         double hashAgg_value_50 = -1.0;
/* 193 */         if (!false) {
/* 194 */           hashAgg_value_50 = (double) hashAgg_expr_0_0;
/* 195 */         }
/* 196 */
/* 197 */         double hashAgg_value_49 = -1.0;
/* 198 */
/* 199 */         hashAgg_value_49 = hashAgg_value_50 - hashAgg_bufValue_5;
/* 200 */
/* 201 */         hashAgg_value_48 = (double)(hashAgg_value_49 / hashAgg_value_53);
/* 202 */       }
/* 203 */       if (!hashAgg_isNull_48) {
/* 204 */         hashAgg_isNull_46 = false; // resultCode could change nullability.
/* 205 */
/* 206 */         hashAgg_value_46 = hashAgg_bufValue_5 + hashAgg_value_48;
/* 207 */
/* 208 */       }
/* 209 */       hashAgg_isNull_44 = hashAgg_isNull_46;
/* 210 */       hashAgg_value_44 = hashAgg_value_46;
/* 211 */     }
/* 212 */     boolean hashAgg_isNull_56 = false;
/* 213 */     double hashAgg_value_56 = -1.0;
/* 214 */     if (!false && hashAgg_subExprValue_0) {
/* 215 */       hashAgg_isNull_56 = hashAgg_bufIsNull_6;
/* 216 */       hashAgg_value_56 = hashAgg_bufValue_6;
/* 217 */     } else {
/* 218 */       boolean hashAgg_isNull_58 = true;
/* 219 */       double hashAgg_value_58 = -1.0;
/* 220 */
/* 221 */       boolean hashAgg_isNull_60 = true;
/* 222 */       double hashAgg_value_60 = -1.0;
/* 223 */       boolean hashAgg_isNull_62 = false;
/* 224 */       double hashAgg_value_62 = -1.0;
/* 225 */       if (!false) {
/* 226 */         hashAgg_value_62 = (double) hashAgg_expr_0_0;
/* 227 */       }
/* 228 */
/* 229 */       double hashAgg_value_61 = -1.0;
/* 230 */
/* 231 */       hashAgg_value_61 = hashAgg_value_62 - hashAgg_bufValue_5;
/* 232 */       boolean hashAgg_isNull_65 = true;
/* 233 */       double hashAgg_value_65 = -1.0;
/* 234 */       boolean hashAgg_isNull_67 = false;
/* 235 */       double hashAgg_value_67 = -1.0;
/* 236 */       if (!false) {
/* 237 */         hashAgg_value_67 = (double) hashAgg_expr_0_0;
/* 238 */       }
/* 239 */
/* 240 */       double hashAgg_value_66 = -1.0;
/* 241 */
/* 242 */       hashAgg_value_66 = hashAgg_value_67 - hashAgg_bufValue_5;
/* 243 */       double hashAgg_value_75 = -1.0;
/* 244 */
/* 245 */       hashAgg_value_75 = hashAgg_bufValue_4 + 1.0D;
/* 246 */       boolean hashAgg_isNull_70 = false;
/* 247 */       double hashAgg_value_70 = -1.0;
/* 248 */       if (hashAgg_value_75 == 0) {
/* 249 */         hashAgg_isNull_70 = true;
/* 250 */       } else {
/* 251 */         boolean hashAgg_isNull_72 = false;
/* 252 */         double hashAgg_value_72 = -1.0;
/* 253 */         if (!false) {
/* 254 */           hashAgg_value_72 = (double) hashAgg_expr_0_0;
/* 255 */         }
/* 256 */
/* 257 */         double hashAgg_value_71 = -1.0;
/* 258 */
/* 259 */         hashAgg_value_71 = hashAgg_value_72 - hashAgg_bufValue_5;
/* 260 */
/* 261 */         hashAgg_value_70 = (double)(hashAgg_value_71 / hashAgg_value_75);
/* 262 */       }
/* 263 */       if (!hashAgg_isNull_70) {
/* 264 */         hashAgg_isNull_65 = false; // resultCode could change nullability.
/* 265 */
/* 266 */         hashAgg_value_65 = hashAgg_value_66 - hashAgg_value_70;
/* 267 */
/* 268 */       }
/* 269 */       if (!hashAgg_isNull_65) {
/* 270 */         hashAgg_isNull_60 = false; // resultCode could change nullability.
/* 271 */
/* 272 */         hashAgg_value_60 = hashAgg_value_61 * hashAgg_value_65;
/* 273 */
/* 274 */       }
/* 275 */       if (!hashAgg_isNull_60) {
/* 276 */         hashAgg_isNull_58 = false; // resultCode could change nullability.
/* 277 */
/* 278 */         hashAgg_value_58 = hashAgg_bufValue_6 + hashAgg_value_60;
/* 279 */
/* 280 */       }
/* 281 */       hashAgg_isNull_56 = hashAgg_isNull_58;
/* 282 */       hashAgg_value_56 = hashAgg_value_58;
/* 283 */     }
/* 284 */
/* 285 */     hashAgg_bufIsNull_4 = hashAgg_isNull_39;
/* 286 */     hashAgg_bufValue_4 = hashAgg_value_39;
/* 287 */
/* 288 */     hashAgg_bufIsNull_5 = hashAgg_isNull_44;
/* 289 */     hashAgg_bufValue_5 = hashAgg_value_44;
/* 290 */
/* 291 */     hashAgg_bufIsNull_6 = hashAgg_isNull_56;
/* 292 */     hashAgg_bufValue_6 = hashAgg_value_56;
/* 293 */   }
/* 294 */
/* 295 */   private void hashAgg_subExpr_0(long hashAgg_expr_0_0) {
/* 296 */     boolean hashAgg_isNull_18 = false;
/* 297 */     double hashAgg_value_18 = -1.0;
/* 298 */     if (!false) {
/* 299 */       hashAgg_value_18 = (double) hashAgg_expr_0_0;
/* 300 */     }
/* 301 */
/* 302 */     hashAgg_subExprValue_0 = hashAgg_isNull_18;
/* 303 */   }
/* 304 */
/* 305 */   private void hashAgg_doAggregate_sum_0(long hashAgg_expr_0_0) throws java.io.IOException {
/* 306 */     hashAgg_hashAgg_isNull_32_0 = true;
/* 307 */     long hashAgg_value_32 = -1L;
/* 308 */     do {
/* 309 */       if (!hashAgg_bufIsNull_2) {
/* 310 */         hashAgg_hashAgg_isNull_32_0 = false;
/* 311 */         hashAgg_value_32 = hashAgg_bufValue_2;
/* 312 */         continue;
/* 313 */       }
/* 314 */
/* 315 */       if (!false) {
/* 316 */         hashAgg_hashAgg_isNull_32_0 = false;
/* 317 */         hashAgg_value_32 = 0L;
/* 318 */         continue;
/* 319 */       }
/* 320 */
/* 321 */     } while (false);
/* 322 */
/* 323 */     long hashAgg_value_31 = -1L;
/* 324 */
/* 325 */     hashAgg_value_31 = hashAgg_value_32 + hashAgg_expr_0_0;
/* 326 */
/* 327 */     hashAgg_bufIsNull_2 = false;
/* 328 */     hashAgg_bufValue_2 = hashAgg_value_31;
/* 329 */   }
/* 330 */
/* 331 */   private void hashAgg_doAggregate_count_0() throws java.io.IOException {
/* 332 */     long hashAgg_value_36 = -1L;
/* 333 */
/* 334 */     hashAgg_value_36 = hashAgg_bufValue_3 + 1L;
/* 335 */
/* 336 */     hashAgg_bufIsNull_3 = false;
/* 337 */     hashAgg_bufValue_3 = hashAgg_value_36;
/* 338 */   }
/* 339 */
/* 340 */   private void hashAgg_doAggregate_min_0() throws java.io.IOException {
/* 341 */     hashAgg_hashAgg_isNull_27_0 = true;
/* 342 */     double hashAgg_value_27 = -1.0;
/* 343 */
/* 344 */     if (!hashAgg_bufIsNull_0 && (hashAgg_hashAgg_isNull_27_0 ||
/* 345 */         (org.apache.spark.sql.catalyst.util.SQLOrderingUtil.compareDoubles(hashAgg_value_27, hashAgg_bufValue_0)) > 0)) {
/* 346 */       hashAgg_hashAgg_isNull_27_0 = false;
/* 347 */       hashAgg_value_27 = hashAgg_bufValue_0;
/* 348 */     }
/* 349 */
/* 350 */     if (!hashAgg_subExprIsNull_0 && (hashAgg_hashAgg_isNull_27_0 ||
/* 351 */         (org.apache.spark.sql.catalyst.util.SQLOrderingUtil.compareDoubles(hashAgg_value_27, hashAgg_subExprValue_1)) > 0)) {
/* 352 */       hashAgg_hashAgg_isNull_27_0 = false;
/* 353 */       hashAgg_value_27 = hashAgg_subExprValue_1;
/* 354 */     }
/* 355 */
/* 356 */     hashAgg_bufIsNull_0 = hashAgg_hashAgg_isNull_27_0;
/* 357 */     hashAgg_bufValue_0 = hashAgg_value_27;
/* 358 */   }
/* 359 */
/* 360 */   protected void processNext() throws java.io.IOException {
/* 361 */     while (!hashAgg_initAgg_0) {
/* 362 */       hashAgg_initAgg_0 = true;
/* 363 */
/* 364 */       long hashAgg_beforeAgg_0 = System.nanoTime();
/* 365 */       hashAgg_doAggregateWithoutKey_0();
/* 366 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[4] /* aggTime */).add((System.nanoTime() - hashAgg_beforeAgg_0) / 1000000);
/* 367 */
/* 368 */       // output the result
/* 369 */
/* 370 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[3] /* numOutputRows */).add(1);
/* 371 */       hashAgg_mutableStateArray_0[0].reset();
/* 372 */
/* 373 */       hashAgg_mutableStateArray_0[0].zeroOutNullBytes();
/* 374 */
/* 375 */       if (hashAgg_bufIsNull_0) {
/* 376 */         hashAgg_mutableStateArray_0[0].setNullAt(0);
/* 377 */       } else {
/* 378 */         hashAgg_mutableStateArray_0[0].write(0, hashAgg_bufValue_0);
/* 379 */       }
/* 380 */
/* 381 */       if (hashAgg_bufIsNull_1) {
/* 382 */         hashAgg_mutableStateArray_0[0].setNullAt(1);
/* 383 */       } else {
/* 384 */         hashAgg_mutableStateArray_0[0].write(1, hashAgg_bufValue_1);
/* 385 */       }
/* 386 */
/* 387 */       if (hashAgg_bufIsNull_2) {
/* 388 */         hashAgg_mutableStateArray_0[0].setNullAt(2);
/* 389 */       } else {
/* 390 */         hashAgg_mutableStateArray_0[0].write(2, hashAgg_bufValue_2);
/* 391 */       }
/* 392 */
/* 393 */       hashAgg_mutableStateArray_0[0].write(3, hashAgg_bufValue_3);
/* 394 */
/* 395 */       hashAgg_mutableStateArray_0[0].write(4, hashAgg_bufValue_4);
/* 396 */
/* 397 */       hashAgg_mutableStateArray_0[0].write(5, hashAgg_bufValue_5);
/* 398 */
/* 399 */       hashAgg_mutableStateArray_0[0].write(6, hashAgg_bufValue_6);
/* 400 */       append((hashAgg_mutableStateArray_0[0].getRow()));
/* 401 */     }
/* 402 */   }
/* 403 */
/* 404 */ }
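Note where the compile error comes from in the dump above: generated line 87 declares UTF8String hashAgg_value_21, but there is no matching declaration of boolean hashAgg_isNull_21, which generated line 91 then assigns (hashAgg_isNull_21 = true;). Janino reports an assignment to an undeclared identifier as "Expression ... is not an rvalue", which matches the error at Line 91, Column 1.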

To Reproduce Create a project with a Deequ 2.0.7 dependency and run the code below:

import com.amazon.deequ.profiles.ColumnProfilerRunner
import org.apache.spark.sql.SparkSession

import java.sql.Date

object DeequTest {

  def main(args: Array[String]): Unit = {

    val spark: SparkSession = SparkSession.builder()
      .appName("data-quality")
      .master("local")
      .getOrCreate()

    import spark.implicits._

    val testData = Seq(
      TestEvent(),
      TestEvent(),
      TestEvent()
    ).toDF()

    val profiles = ColumnProfilerRunner()
      .onData(testData)
      .run()

  }

}

case class TestEvent(
  evenId: String = "bc60b4ca-e331-11ed-b5ea-0242ac120002",
  size: Int = 10,
  createdDate: Date = Date.valueOf("2023-04-24")
)
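For reference, a minimal sbt build for the reproduction could look like the following; the Spark version and the Deequ artifact suffix are assumptions, so pick the deequ artifact that matches your Spark line:

// build.sbt (sketch)
ThisBuild / scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.5.0",
  "com.amazon.deequ"  % "deequ"     % "2.0.7-spark-3.5"
)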

Expected behavior No CodeGenerator compile errors in the logs: Catalyst code generation should succeed, as it did with Deequ 2.0.6 and earlier.


pawelpinkos commented 1 week ago

@rdsharma26 - could you please take a look at this? Your change is probably the root cause. Thanks a lot!

rdsharma26 commented 1 week ago

Thanks @pawelpinkos for bringing this to our attention. The details are extremely helpful. We will investigate this and get back to you soon.