awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Deequ error for Uniqueness analyzer (iterableAsScalaIterable) #73

Open Shijin1387 opened 3 years ago

Shijin1387 commented 3 years ago

Describe the bug: I am trying a few examples from the PyDeequ quickstart. When I try to compute Uniqueness, I get a Java exception: 'Method iterableAsScalaIterable([class java.lang.String]) does not exist'.

To Reproduce: steps to reproduce the behaviour:

1. Start Jupyter from the PySpark shell.
2. Run the examples from the basic tutorial; they all work.
3. Add the analyzer .addAnalyzer(Uniqueness("b")); this raises the error.

Screenshots: [image: the working case] [image: the failing case]

Error:
Py4JError: An error occurred while calling z:scala.collection.JavaConversions.iterableAsScalaIterable. Trace:
py4j.Py4JException: Method iterableAsScalaIterable([class java.lang.String]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
    at py4j.Gateway.invoke(Gateway.java:276)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Additional context: I tried several Deequ jar versions. deequ-1.1.0_spark-3.0-scala-2.12.jar worked for every case except the one described above.

JonatanPolanco commented 2 years ago

Same error here, and I have no idea how to fix it.

TPOTD commented 2 years ago

@JonatanPolanco If you still need help: according to the documentation, Uniqueness takes a list of columns, so try ['column_name'] instead of just 'column_name'. It worked for me. [image: screenshot of the working call]
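A minimal sketch of that fix, for anyone landing here. The trace suggests the bare string reached iterableAsScalaIterable as a java.lang.String rather than as an iterable of column names, which would explain why Py4J finds no matching method. The helper names as_column_list and add_uniqueness are hypothetical and not part of PyDeequ; only Uniqueness and addAnalyzer come from the library. The sketch assumes the runner is an analysis run builder obtained the way the quickstart does it, with AnalysisRunner(spark).onData(df).

```python
from typing import Iterable, List, Union


def as_column_list(columns: Union[str, Iterable[str]]) -> List[str]:
    """Return columns as a list, wrapping a bare string in a one-element list."""
    if isinstance(columns, str):
        return [columns]
    return list(columns)


def add_uniqueness(runner, columns):
    """Hypothetical helper: attach a Uniqueness analyzer to an analysis run builder.

    pydeequ is imported lazily so as_column_list can be used without Spark.
    """
    from pydeequ.analyzers import Uniqueness

    # Uniqueness is given a list of column names, never a bare string.
    return runner.addAnalyzer(Uniqueness(as_column_list(columns)))


# Usage, assuming `spark` and `df` already exist as in the quickstart:
#   from pydeequ.analyzers import AnalysisRunner
#   runner = AnalysisRunner(spark).onData(df)
#   result = add_uniqueness(runner, "b").run()   # passes ["b"], not "b"
```

Without a helper, the direct equivalent is .addAnalyzer(Uniqueness(["b"])) in place of .addAnalyzer(Uniqueness("b")).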