awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Py4JJavaError: An error occurred while calling None.com.amazon.deequ.analyzers.Completeness. : java.lang.NoClassDefFoundError: scala/Product$class #197

Open Sarvoch opened 5 months ago

Sarvoch commented 5 months ago

I am using the following versions in a Jupyter notebook: PySpark and Spark 3.5.1, Scala 2.11.12, Java 8.

OS: Windows 11
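Scala 2.11.12 is an unusual pairing with Spark 3.5.1, so it may be worth confirming which Scala line the Spark build really uses. The sketch below is a hypothetical helper, not part of PyDeequ or PySpark. Its table is an assumption based on the default Apache Spark distributions (Spark 2.4 on Scala 2.11 or 2.12, Spark 3.x on Scala 2.12, with 2.13 offered from 3.2 onward) and may not hold for vendor builds.

```python
# Hypothetical helper: the table below is an assumption drawn from the default
# Apache Spark distributions, not something read from PyDeequ or PySpark.
SUPPORTED_SCALA = {
    "2.4": {"2.11", "2.12"},
    "3.0": {"2.12"},
    "3.1": {"2.12"},
    "3.2": {"2.12", "2.13"},
    "3.3": {"2.12", "2.13"},
    "3.4": {"2.12", "2.13"},
    "3.5": {"2.12", "2.13"},
}


def major_minor(version: str) -> str:
    """Reduce a version such as '3.5.1' to its 'major.minor' line, '3.5'."""
    return ".".join(version.split(".")[:2])


def scala_matches_spark(spark_version: str, scala_version: str) -> bool:
    """Return True if the Scala line is one the Spark line is built against."""
    supported = SUPPORTED_SCALA.get(major_minor(spark_version))
    if supported is None:
        raise ValueError(f"Spark line not covered by this sketch: {spark_version}")
    return major_minor(scala_version) in supported


# The combination reported in this issue.
print(scala_matches_spark("3.5.1", "2.11.12"))
```

In a live session, the Scala version of the running JVM can usually be read with `spark.sparkContext._jvm.scala.util.Properties.versionNumberString()`, which is a more reliable source than a separately installed Scala.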

    spark = (SparkSession
        .builder
        .config("spark.jars.packages", pydeequ.deequ_maven_coord)
        .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
        .getOrCreate())
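With this setup, one thing to check is which Deequ artifact `pydeequ.deequ_maven_coord` resolves to. The sketch below assumes the coordinate carries a `spark-X.Y` suffix, as in `com.amazon.deequ:deequ:2.0.7-spark-3.5`. The function names and the second sample coordinate are illustrative only.

```python
import re
from typing import Optional


def spark_line_of_coord(coord: str) -> Optional[str]:
    """Extract the 'X.Y' Spark line from a coordinate with a 'spark-X.Y' suffix."""
    match = re.search(r"spark-(\d+\.\d+)", coord)
    return match.group(1) if match else None


def coord_targets_spark(coord: str, spark_version: str) -> bool:
    """Return True if the coordinate's Spark line matches the given Spark version."""
    spark_line = ".".join(spark_version.split(".")[:2])
    return spark_line_of_coord(coord) == spark_line


# Illustrative coordinates; in a notebook, pass pydeequ.deequ_maven_coord instead.
print(coord_targets_spark("com.amazon.deequ:deequ:2.0.7-spark-3.5", "3.5.1"))
print(coord_targets_spark("com.example:some-lib:1.0.0-spark-2.4", "3.5.1"))
```

Recent PyDeequ releases appear to choose this coordinate from the `SPARK_VERSION` environment variable at import time, so that value is worth comparing with the installed Spark. Treat this as an assumption to confirm against the installed PyDeequ version.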

I get the following error when running the code below:

    analysisResult = AnalysisRunner(spark) \
        .onData(df) \
        .addAnalyzer(Completeness("id")) \
        .run()

    analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
    analysisResult_df.show()

    Py4JJavaError                             Traceback (most recent call last)
    Cell In[146], line 3
          1 analysisResult = AnalysisRunner(spark) \
          2     .onData(df) \
    ----> 3     .addAnalyzer(Completeness("id")) \
          4     .run()
          6 analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
          7 analysisResult_df.show()

    File ~\AppData\Roaming\Python\Python312\site-packages\pydeequ\analyzers.py:134, in AnalysisRunBuilder.addAnalyzer(self, analyzer)
        127 """
        128 Adds a single analyzer to the current Analyzer run.
        129
        130 :param analyzer: Adds an analyzer strategy to the run.
        131 :return self: for further chained method calls.
        132 """
        133 analyzer._set_jvm(self._jvm)
    --> 134 _analyzer_jvm = analyzer._analyzer_jvm
        135 self._AnalysisRunBuilder.addAnalyzer(_analyzer_jvm)
        136 return self

    File ~\AppData\Roaming\Python\Python312\site-packages\pydeequ\analyzers.py:274, in Completeness._analyzer_jvm(self)
        268 @property
        269 def _analyzer_jvm(self):
        270     """Returns the value of the computed completeness
        271
        272     :return self: access the value of the Completeness analyzer.
        273     """
    --> 274     return self._deequAnalyzers.Completeness(self.column, self._jvm.scala.Option.apply(self.where))

    File ~\AppData\Roaming\Python\Python312\site-packages\py4j\java_gateway.py:1585, in __call__(self, *args)
          0 <Error retrieving source code with stack_data see ipython/ipython#13598>

    File ~\AppData\Roaming\Python\Python312\site-packages\pyspark\sql\utils.py:190, in deco(*a, **kw)
        188     return getattr(functions, f.__name__)(*args, **kwargs)
        189 else:
    --> 190     return f(*args, **kwargs)

    File ~\AppData\Roaming\Python\Python312\site-packages\py4j\protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
        324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
        325 if answer[1] == REFERENCE_TYPE:
    --> 326     raise Py4JJavaError(
        327         "An error occurred while calling {0}{1}{2}.\n".
        328         format(target_id, ".", name), value)
        329 else:
        330     raise Py4JError(
        331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
        332         format(target_id, ".", name, value))

    Py4JJavaError: An error occurred while calling None.com.amazon.deequ.analyzers.Completeness.
    : java.lang.NoClassDefFoundError: scala/Product$class
        at com.amazon.deequ.analyzers.Completeness.<init>(Completeness.scala:27)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:748)
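The missing class `scala/Product$class` is a useful clue. Scala 2.11 compiled traits into separate `$class` implementation classes and Scala 2.12 stopped generating them, so this error typically appears when a jar built for Scala 2.11 is loaded on a Scala 2.12 or later runtime. The sketch below detects that signature in an error message. The function name is hypothetical, and the check is a heuristic rather than a definitive diagnosis.

```python
import re
from typing import Optional

# Matches a missing Scala trait implementation class such as scala/Product$class.
_TRAIT_IMPL = re.compile(r"NoClassDefFoundError: (scala/[\w/]+\$class)")


def missing_trait_impl_class(error_text: str) -> Optional[str]:
    """Return the missing 'scala/...$class' name if the error mentions one, else None."""
    match = _TRAIT_IMPL.search(error_text)
    return match.group(1) if match else None


message = (
    "An error occurred while calling None.com.amazon.deequ.analyzers.Completeness. "
    ": java.lang.NoClassDefFoundError: scala/Product$class"
)
found = missing_trait_impl_class(message)
if found:
    print(f"{found} is missing: the jar was likely built for Scala 2.11")
```

A match suggests comparing the Deequ artifact's target Spark and Scala versions with those of the running session before looking elsewhere.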