Open Indu-sharma opened 1 year ago
I figured out that the issue is with analysis_runner.addAnalyzer(Uniqueness(column)), i.e. the Uniqueness rule.
This line of code, in to_scala_seq in pydeequ/scala_utils.py, is where it fails:
return jvm.scala.collection.JavaConversions.iterableAsScalaIterable(iterable).toSeq()
Hmm, Uniqueness has been tested: https://github.com/awslabs/python-deequ/blob/ef6e6afb2f1403d2528e9eb1bf67506754ec127d/tests/test_analyzers.py#L245. Could you take a look at the test case? It looks like you should pass a list of columns instead of a single column name.
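To make the suggestion concrete: the trace reports `iterableAsScalaIterable([class java.lang.String])`, which suggests a bare string reached the JVM where an iterable of column names was expected. Below is a minimal sketch of one way to guard against that. `as_column_list` is a hypothetical helper, not part of pydeequ, and the commented usage assumes that `Uniqueness` accepts a list of columns while `Completeness` accepts a single column name, as the original snippet and the linked test imply.

```python
from typing import Iterable, List, Union


def as_column_list(columns: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper: wrap a bare column name in a list.

    A plain str is treated as one column name; any other iterable
    of names is copied into a new list.
    """
    if isinstance(columns, str):
        return [columns]
    return list(columns)


# Assumed usage with the objects from this issue (not executed here):
#
#     for column in df.columns:
#         # Uniqueness gets a list of column names
#         analysis_runner.addAnalyzer(Uniqueness(as_column_list(column)))
#         # Completeness gets a single column name, as in the original snippet
#         analysis_runner.addAnalyzer(Completeness(column))
#     analysisResult = analysis_runner.run()

print(as_column_list("id"))            # ['id']
print(as_column_list(("id", "name")))  # ['id', 'name']
```

Writing `Uniqueness([column])` directly in the loop would have the same effect; the helper only makes the intent explicit when the input may be either a name or a list of names.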
Describe the bug
I'm trying to add analysers for the columns of a PySpark DataFrame, but I'm getting a py4j.protocol.Py4JError.
To Reproduce
Steps to reproduce the behavior:
for column in df.columns:
    analysis_runner.addAnalyzer(Uniqueness(column))
    analysis_runner.addAnalyzer(Completeness(column))
analysisResult = analysis_runner.run()
Run the PySpark script to see the following error:
    analysis_runner.addAnalyzer(Uniqueness(column))
  File "/Users/indusharma/Library/Python/3.9/lib/python/site-packages/pydeequ/analyzers.py", line 134, in addAnalyzer
    _analyzer_jvm = analyzer._analyzer_jvm
  File "/Users/indusharma/Library/Python/3.9/lib/python/site-packages/pydeequ/analyzers.py", line 775, in _analyzer_jvm
    to_scala_seq(self._jvm, self.columns), self._jvm.scala.Option.apply(self.where)
  File "/Users/indusharma/Library/Python/3.9/lib/python/site-packages/pydeequ/scala_utils.py", line 80, in to_scala_seq
    return jvm.scala.collection.JavaConversions.iterableAsScalaIterable(iterable).toSeq()
  File "/Users/indusharma/Library/Python/3.9/lib/python/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/Users/indusharma/Library/Python/3.9/lib/python/site-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
  File "/Users/indusharma/Library/Python/3.9/lib/python/site-packages/py4j/protocol.py", line 330, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling z:scala.collection.JavaConversions.iterableAsScalaIterable. Trace:
py4j.Py4JException: Method iterableAsScalaIterable([class java.lang.String]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
    at py4j.Gateway.invoke(Gateway.java:276)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
Expected behavior
The analysers run successfully.