awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0

[FEATURE] Add support for Spark 3.5 #507

Closed jhchee closed 9 months ago

jhchee commented 1 year ago

Spark 3.5 was released today; it would be great to have it supported.

jklap commented 1 year ago

diff --git a/pom.xml b/pom.xml
index 39545d2..8378c0b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -6,7 +6,7 @@

     <groupId>com.amazon.deequ</groupId>
     <artifactId>deequ</artifactId>
-    <version>2.0.3-spark-3.3</version>
+    <version>2.0.3-spark-3.5.0</version>

     <properties>
         <maven.compiler.source>1.8</maven.compiler.source>
@@ -18,7 +18,7 @@
         <artifact.scala.version>${scala.major.version}</artifact.scala.version>
         <scala-maven-plugin.version>4.8.1</scala-maven-plugin.version>

-        <spark.version>3.3.0</spark.version>
+        <spark.version>3.5.0</spark.version>
     </properties>

     <name>deequ</name>
diff --git a/src/main/scala/com/amazon/deequ/analyzers/catalyst/StatefulHyperloglogPlus.scala b/src/main/scala/com/amazon/deequ/analyzers/catalyst/StatefulHyperloglogPlus.scala
index 52e175b..fe54f89 100644
--- a/src/main/scala/com/amazon/deequ/analyzers/catalyst/StatefulHyperloglogPlus.scala
+++ b/src/main/scala/com/amazon/deequ/analyzers/catalyst/StatefulHyperloglogPlus.scala
@@ -59,7 +59,7 @@ private[sql] case class StatefulHyperloglogPlus(

   override def dataType: DataType = BinaryType

-  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
+  override def aggBufferSchema: StructType = StructType(aggBufferAttributes.map(a => StructField(a.name, a.dataType, a.nullable, a.metadata)))

   /** Allocate enough words to store all registers. */
   override val aggBufferAttributes: Seq[AttributeReference] = Seq.tabulate(NUM_WORDS) { i =>