Open 0xbadidea opened 2 months ago
@0xbadidea I think the root of the problem is there, and I do not see how it can be fixed, because `py4j.reflection.PythonProxyHandler` is not serializable at all (there is a good explanation in the linked py4j discussion). I tried adding `scala.Serializable` to the list of classes implemented by `scala_utils.ScalaFunction1`, but it did not help because the problem is in py4j itself.
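As a rough analogy in plain Python (this is not py4j's actual code, and Java serialization differs from `pickle` in its details), the sketch below shows why an object that forwards calls to a live process tends not to be serializable. `CallbackProxy` and `can_pickle` are hypothetical names made up for this illustration.

```python
import pickle
import socket


class CallbackProxy:
    """Hypothetical stand-in for a proxy that forwards calls to a live process."""

    def __init__(self, callback):
        self.callback = callback
        # A live OS-level handle, standing in for the channel a real proxy
        # would use to reach the Python interpreter.
        self.connection = socket.socket()

    def __call__(self, value):
        return self.callback(value)

    def close(self):
        self.connection.close()


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


proxy = CallbackProxy(lambda x: x == 0)
works_locally = proxy(0)          # calling the proxy in-process works
serializable = can_pickle(proxy)  # shipping it to another process does not
proxy.close()
```

Calling the proxy where it was created works, but its state only makes sense inside the process that owns the callback and the connection, so there is nothing meaningful to send to a Spark executor. Marking a wrapper class as serializable does not change that.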
Thanks @SemyonSinchenko! Appreciate your help. I guess there's no straightforward way to implement this method in pydeequ.
I'm not reporting a bug, just looking for a workaround and I'm hoping someone can help!
I'm trying to call deequ's rowLevelResultsAsDataFrame function from pydeequ. Things work fine but as soon as I add a lambda function to the Verification Suite checks, I start getting serialization errors.
Versions: pyspark==3.5, pydeequ==1.4.0, jar: com.amazon.deequ:deequ:2.0.7-spark-3.5
A working snippet is provided below. The code runs fine if we remove the check `hasMin("b", lambda x: x == 0)`. Can someone please provide a workaround?
Here's the full stack trace for the error: