zeotuan closed this pull request 2 months ago.
Hi @rdsharma26, what do you think about updating the Breeze version? I wonder if there is another workaround to make anomaly detection work on more recent versions of Spark.
The change looks good. Let me get back to you after understanding how this change affects our internal Spark 3.3 / 3.1 branches.
Hi @rdsharma26, I just want to check on the status of this. Is there anything I can help with (testing 3.3, 3.1, etc.)?
@zeotuan Apologies for the delayed response. Would it be possible for you to check how this change works against the 2.0.0-spark-3.1-minor and spark-3.3 branches? Does mvn clean install work when you cherry-pick these changes onto those branches?
Hi @rdsharma26, Breeze 2.1.0 is not compatible with the spark-3.3 and 2.0.0-spark-3.1-minor branches:
Spark 3.3 relies on Breeze 1.2.
Spark 3.1 relies on Breeze 1.0.
Updating Breeze to these versions on those branches works.
That would probably require a separate PR to fix the anomaly detection issue on those versions.
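The version constraints in this thread can be captured as a small lookup. The sketch below is a hypothetical helper (BreezeCompat, expectedBreeze and isCompatible are not part of deequ or Spark); it only encodes the versions stated here: Breeze 1.0 for Spark 3.1, 1.2 for Spark 3.3, and 2.1.0 for Spark 3.4/3.5 as given in the PR description. Other Spark lines are deliberately left unlisted rather than guessed.

```scala
// Hypothetical helper, not part of deequ: maps a Spark "major.minor" line to the
// Breeze version its MLlib depends on, using only the versions reported in this thread.
object BreezeCompat {
  private val breezeBySparkMinor: Map[String, String] = Map(
    "3.1" -> "1.0",
    "3.3" -> "1.2",
    "3.4" -> "2.1.0",
    "3.5" -> "2.1.0"
  )

  /** Extracts "major.minor" from a Spark version string such as "3.4.1". */
  def sparkMinor(sparkVersion: String): Option[String] =
    sparkVersion.split('.') match {
      case Array(major, minor, _*) => Some(s"$major.$minor")
      case _                       => None
    }

  /** Breeze version expected by the given Spark version, if it is listed above. */
  def expectedBreeze(sparkVersion: String): Option[String] =
    sparkMinor(sparkVersion).flatMap(breezeBySparkMinor.get)

  /** True only when the Breeze version deequ was built against matches Spark's. */
  def isCompatible(sparkVersion: String, deequBreeze: String): Boolean =
    expectedBreeze(sparkVersion).contains(deequBreeze)
}
```

For example, BreezeCompat.isCompatible("3.4.1", "0.13.2") is false (the mismatch behind the linked issues), while BreezeCompat.isCompatible("3.3.0", "2.1.0") is also false, which is why a single Breeze bump cannot serve every branch.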
I think this would also fix PyDeequ's upgrade to PySpark 3.4; see the Breeze-related errors here: https://github.com/awslabs/python-deequ/actions/runs/8886301683/job/24399475419?pr=203
E py4j.protocol.Py4JJavaError: An error occurred while calling o238.run.
E : java.lang.NoSuchMethodError: 'breeze.generic.UFunc$UImpl2 breeze.linalg.DenseVector$.canSubD()'
E at com.amazon.deequ.anomalydetection.BaseChangeStrategy.diff(BaseChangeStrategy.scala:65)
E at com.amazon.deequ.anomalydetection.BaseChangeStrategy.diff$(BaseChangeStrategy.scala:58)
E at com.amazon.deequ.anomalydetection.AbsoluteChangeStrategy.diff(AbsoluteChangeStrategy.scala:33)
E at com.amazon.deequ.anomalydetection.BaseChangeStrategy.detect(BaseChangeStrategy.scala:90)
E at com.amazon.deequ.anomalydetection.BaseChangeStrategy.detect$(BaseChangeStrategy.scala:80)
E at com.amazon.deequ.anomalydetection.AbsoluteChangeStrategy.detect(AbsoluteChangeStrategy.scala:33)
E at com.amazon.deequ.anomalydetection.AnomalyDetector.detectAnomaliesInHistory(AnomalyDetector.scala:98)
E at com.amazon.deequ.anomalydetection.AnomalyDetector.isNewPointAnomalous(AnomalyDetector.scala:60)
E at com.amazon.deequ.checks.Check$.isNewestPointNonAnomalous(Check.scala:1354)
E at com.amazon.deequ.checks.Check.$anonfun$isNewestPointNonAnomalous$1(Check.scala:583)
E at scala.runtime.java8.JFunction1$mcZD$sp.apply(JFunction1$mcZD$sp.java:23)
E at com.amazon.deequ.constraints.AnalysisBasedConstraint.runAssertion(AnalysisBasedConstraint.scala:108)
E at com.amazon.deequ.constraints.AnalysisBasedConstraint.pickValueAndAssert(AnalysisBasedConstraint.scala:74)
E at com.amazon.deequ.constraints.AnalysisBasedConstraint.$anonfun$evaluate$2(AnalysisBasedConstraint.scala:60)
E at scala.Option.map(Option.scala:230)
E at com.amazon.deequ.constraints.AnalysisBasedConstraint.evaluate(AnalysisBasedConstraint.scala:60)
E at com.amazon.deequ.constraints.ConstraintDecorator.evaluate(Constraint.scala:60)
E at com.amazon.deequ.checks.Check.$anonfun$evaluate$1(Check.scala:1246)
E at scala.collection.immutable.List.map(List.scala:293)
E at com.amazon.deequ.checks.Check.evaluate(Check.scala:1246)
E at com.amazon.deequ.VerificationSuite.$anonfun$evaluate$1(VerificationSuite.scala:269)
E at scala.collection.immutable.List.map(List.scala:293)
E at com.amazon.deequ.VerificationSuite.evaluate(VerificationSuite.scala:269)
E at com.amazon.deequ.VerificationSuite.doVerificationRun(VerificationSuite.scala:132)
E at com.amazon.deequ.VerificationRunBuilder.run(VerificationRunBuilder.scala:172)
E at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
E at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E at java.base/java.lang.reflect.Method.invoke(Method.java:568)
E at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
E at py4j.Gateway.invoke(Gateway.java:282)
E at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E at py4j.commands.CallCommand.execute(CallCommand.java:79)
E at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E at java.base/java.lang.Thread.run(Thread.java:840)
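The error is a binary (link-time) incompatibility rather than a compile error. Presumably the vector subtraction in BaseChangeStrategy.diff is resolved at compile time to an implicit operator instance, which deequ's bytecode (built against Breeze 0.13.2) references as DenseVector$.canSubD(); the Breeze 2.x jar that Spark MLlib 3.4 puts on the classpath no longer provides a method with that exact signature, so the JVM fails only when the call is first executed. The sketch below is a hypothetical diagnostic (BreezeBinaryCheck and its methods are not part of deequ or Breeze) that uses plain Java reflection to check whether the loaded classes still expose that method before anomaly detection runs.

```scala
import scala.util.Try

// Hypothetical diagnostic helper (not part of deequ or Breeze): checks whether the
// classes on the current classpath still expose the zero-argument method that a
// precompiled call site expects, e.g. DenseVector$.canSubD() from the trace above.
object BreezeBinaryCheck {
  /**
   * None        -> the class itself cannot be loaded
   * Some(true)  -> a public, zero-argument method with that name and return type exists
   * Some(false) -> the class loads, but the method is missing (linkage would fail)
   */
  def hasNoArgMethod(className: String, methodName: String, returnType: String): Option[Boolean] =
    Try(Class.forName(className, false, getClass.getClassLoader)).toOption.map { cls =>
      cls.getMethods.exists { m =>
        m.getName == methodName &&
        m.getParameterCount == 0 &&
        m.getReturnType.getName == returnType
      }
    }

  // Scala companion objects compile to a JVM class whose name ends in '$'.
  def hasCanSubD: Option[Boolean] =
    hasNoArgMethod("breeze.linalg.DenseVector$", "canSubD", "breeze.generic.UFunc$UImpl2")

  def main(args: Array[String]): Unit = hasCanSubD match {
    case None        => println("Breeze is not on the classpath")
    case Some(true)  => println("DenseVector$.canSubD() present: old deequ call sites should link")
    case Some(false) => println("DenseVector$.canSubD() missing: expect NoSuchMethodError")
  }
}
```

The check compares name, parameter count and return type because JVM method resolution matches the full descriptor, not just the name; it is a quick probe, not a substitute for rebuilding deequ against the Breeze version Spark ships.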
Update the Breeze version to 2.1.0 to match the Breeze dependency of the current spark-mllib 3.4 and spark-mllib 3.5. This would allow people migrating to Spark 3.4+ to use anomaly detection without the dependency conflict mentioned in https://github.com/awslabs/deequ/issues/336, https://github.com/awslabs/deequ/issues/393 and https://github.com/awslabs/deequ/issues/428. Also, Breeze 0.13.2 has several security vulnerabilities, which were resolved in Breeze 2.1.0.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.