ol-eg closed this issue 3 years ago.
Hi! Apologies for the late reply amid the holidays, but my guess is that your Spark session is unable to access deequ-1.0.3.jar. We use Ivy to download the jar from Maven, so perhaps there is a mismatch between where those jars are stored inside Docker and on your main machine. From the looks of the screenshot, you listed your jars in /usr/local/spark/jars/, whereas Ivy downloaded deequ-1.0.3.jar into /home/jovyan/.ivy2/jars/.
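To confirm where the jar actually landed inside the container, a small standard-library sketch like the one below can search both directories mentioned above. The helper name find_jars is hypothetical, and the default paths are just the two from this thread. A match on disk does not by itself prove the jar is on Spark's classpath.

```python
from pathlib import Path

# The two directories discussed in this thread (assumption: adjust for your image).
# In the jupyter/all-spark-notebook image, Path.home() is /home/jovyan.
DEFAULT_JAR_DIRS = [
    Path("/usr/local/spark/jars"),
    Path.home() / ".ivy2" / "jars",
]


def find_jars(pattern="deequ-*.jar", dirs=None):
    """Return jars matching `pattern`, searched directory by directory.

    Hypothetical debugging helper: it only inspects the filesystem and does
    not check whether Spark actually put these jars on its classpath.
    Directories that do not exist are skipped.
    """
    found = []
    for d in (dirs if dirs is not None else DEFAULT_JAR_DIRS):
        d = Path(d)
        if d.is_dir():
            found.extend(sorted(d.glob(pattern)))
    return found


if __name__ == "__main__":
    for jar in find_jars():
        print(jar)
```

The glob pattern matches both a plain deequ-1.0.3.jar and an Ivy-style name that carries a prefix before "deequ-", because the pattern is anchored only at the start of the filename after any literal prefix you add.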
Does your notebook work when running just addAnalyzer(Size())? If it fails even then, that would reinforce the idea that the SparkSession is unable to access the Deequ jar.
Also, we have only developed and supported versions up to deequ-1.0.3, so please stick to that version!
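Since only versions up to deequ-1.0.3 are supported, a quick check on the jar filenames you find can catch an accidental upgrade. This is a minimal sketch. It assumes the version appears in the filename as deequ-MAJOR.MINOR.PATCH, and the names deequ_version and is_supported are hypothetical.

```python
import re

# Highest Deequ version the maintainers say PyDeequ supports (per this thread).
MAX_SUPPORTED = (1, 0, 3)

# Assumption: the version is embedded in the filename as deequ-X.Y.Z.
_VERSION_RE = re.compile(r"deequ-(\d+)\.(\d+)\.(\d+)")


def deequ_version(jar_name):
    """Parse (major, minor, patch) from a name like 'deequ-1.0.3.jar'.

    Uses search() rather than match(), so a prefix before 'deequ-' is tolerated.
    Returns None when no version can be found.
    """
    m = _VERSION_RE.search(jar_name)
    return tuple(int(g) for g in m.groups()) if m else None


def is_supported(jar_name):
    """True if the jar's parsed version is at most MAX_SUPPORTED."""
    version = deequ_version(jar_name)
    return version is not None and version <= MAX_SUPPORTED
```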
Hi, thanks for getting back to me. I think that is the issue.
For me, the fix will probably be to find an image with compatible Spark/Scala versions and to configure PySpark with the extra .ivy2 path.
Thanks very much.
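Below is a minimal sketch of the "extra .ivy2 path" configuration. It assumes the Maven coordinate com.amazon.deequ:deequ:1.0.3 for the jar discussed above. spark.jars.packages and spark.jars.ivy are standard Spark properties. The helper names deequ_spark_conf and pyspark_submit_args are hypothetical.

```python
from pathlib import Path

# Assumed Maven coordinate for the deequ-1.0.3 jar discussed in this thread.
DEEQU_COORD = "com.amazon.deequ:deequ:1.0.3"


def deequ_spark_conf(ivy_dir=None, coord=DEEQU_COORD):
    """Return Spark properties that resolve Deequ through Ivy into a known directory.

    `ivy_dir` defaults to ~/.ivy2 (i.e. /home/jovyan/.ivy2 in the Jupyter image).
    """
    ivy = Path(ivy_dir) if ivy_dir is not None else Path.home() / ".ivy2"
    return {
        # Maven coordinate(s) Spark resolves with Ivy when the JVM starts.
        "spark.jars.packages": coord,
        # Ivy user directory; resolved jars are placed in <ivy_dir>/jars.
        "spark.jars.ivy": str(ivy),
    }


def pyspark_submit_args(conf):
    """Render the properties as a PYSPARK_SUBMIT_ARGS value.

    PySpark expects this variable to end with 'pyspark-shell'.
    Note: values containing spaces are not quoted by this sketch.
    """
    parts = [f"--conf {key}={value}" for key, value in sorted(conf.items())]
    return " ".join(parts + ["pyspark-shell"])
```

You can apply the dict by calling SparkSession.builder.config(key, value) for each item before getOrCreate(). Alternatively, export the string as PYSPARK_SUBMIT_ARGS before the first session is created. Either way, the settings only take effect if no Spark JVM is already running in the notebook kernel.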
Describe the bug
I am trying the library in a Jupyter notebook with the quickstart example. When computing Uniqueness or UniqueValueRatio, I get a Java exception: 'Method iterableAsScalaIterable([class java.lang.String]) does not exist'.
To Reproduce
Steps to reproduce the behaviour:
$ docker run -p 8888:8888 jupyter/all-spark-notebook:a0a544e6dc6e
!pip install pydeequ
Screenshots
Additional context
I have tried a few different combinations of Spark, Scala, and Amazon Deequ library versions but did not manage to make this work.
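When trying version combinations, it helps to know which Scala version the image's Spark was built with. One plausible contributor, consistent with the conclusion reached earlier in this thread, is a Spark/Scala build mismatch between the image and the Deequ jar. Spark distributions ship a scala-library-X.Y.Z.jar in their jars directory. The sketch below reads off the Scala binary version from that file so you can compare it with the Scala version your Deequ artifact targets. The helper names are hypothetical, and /usr/local/spark/jars is assumed from this thread.

```python
import re
from pathlib import Path

# Matches e.g. 'scala-library-2.12.10.jar' and captures major and minor.
_SCALA_LIB_RE = re.compile(r"^scala-library-(\d+)\.(\d+)\.\d+\.jar$")


def scala_binary_version(jar_names):
    """Return the Scala binary version (e.g. '2.12') from a scala-library jar name.

    Returns None if no scala-library-X.Y.Z.jar is among `jar_names`.
    """
    for name in jar_names:
        m = _SCALA_LIB_RE.match(name)
        if m:
            return f"{m.group(1)}.{m.group(2)}"
    return None


def spark_scala_version(spark_jars_dir="/usr/local/spark/jars"):
    """Inspect a Spark jars directory (path assumed from this thread)."""
    d = Path(spark_jars_dir)
    if not d.is_dir():
        return None
    return scala_binary_version(sorted(p.name for p in d.iterdir()))


if __name__ == "__main__":
    print(spark_scala_version())
```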