Open Jrecos opened 2 years ago
I have the same issue, but on EMR (emr-6.2.0) running in Docker mode with the amazoncorretto:8 image and pydeequ added in the Dockerfile. The Completeness and Compliance analyzers work fine, but if I add a Uniqueness analyzer I get the same error as you.
I have the same issue with pydeequ (v1.0.1) on Databricks (DBR 10.4 LTS). However, when working in Databricks with Scala and Deequ v2.0.1-spark-3.2, the Uniqueness analyzer works as expected. This suggests the issue is most probably related to the Python bridge.
Isn't the problem that uniqueness, uniquevalueratio, approxquantiles, distinctness... all require a list of columns as input, and you're inputting a single column?
This solved the problem for me.
This has solved it for me. Submitting a list of length 1 to check a single column also works.
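To illustrate the fix suggested above, here is a minimal sketch that always passes a list of column names to `Uniqueness`. The helper names `as_column_list` and `uniqueness_analyzer` are made up for this example and are not part of pydeequ. The reported error shows a `java.lang.String` being handed to a method that expects an iterable, which is consistent with a bare string being passed where a list was expected.

```python
from typing import Iterable, List, Union


def as_column_list(columns: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper: wrap a bare column name in a list."""
    if isinstance(columns, str):
        # A plain string would otherwise reach the JVM as java.lang.String.
        return [columns]
    return list(columns)


def uniqueness_analyzer(columns: Union[str, Iterable[str]]):
    """Hypothetical helper: build a pydeequ Uniqueness analyzer from one or more columns."""
    # Imported lazily so as_column_list can be used without Spark or pydeequ installed.
    from pydeequ.analyzers import Uniqueness

    return Uniqueness(as_column_list(columns))
```

Assuming an existing `spark` session and DataFrame `df`, this could be used as `AnalysisRunner(spark).onData(df).addAnalyzer(uniqueness_analyzer("id")).run()`, which is equivalent to writing `Uniqueness(["id"])` directly.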
Describe the bug Hello,
I'm using Databricks and pydeequ to build a QA step in structured streaming. One of the analyzers I need is Uniqueness. Other analyzers such as Completeness work properly, but if I add Uniqueness I get an error:
py4j.Py4JException: Method iterableAsScalaIterable([class java.lang.String]) does not exist
Log:
To Reproduce I'm using the example provided on the main page:
I'm using this version of: Databricks:
pydeequ:
java:
Thanks!