Closed by alinesacchetti 7 months ago.
Could you try this workaround? https://github.com/awslabs/python-deequ/issues/138#issuecomment-1611575546
Hi! I tried these commands, but I still have the same issue.
Strangely, no matter which component I use, the error is always the same: "TypeError: 'JavaPackage' object is not callable"
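The error message is itself a hint. When py4j cannot find a Java class under the requested name, it returns a `JavaPackage` placeholder instead of raising, and calling that placeholder is what fails with "'JavaPackage' object is not callable". Below is a minimal sketch that checks whether the Deequ classes are visible to the driver JVM. It assumes `spark` is the active SparkSession and relies on PySpark's private `spark.sparkContext._jvm` gateway view. The helper name `deequ_visible_to_jvm` is hypothetical, and `com.amazon.deequ.VerificationSuite` serves only as a representative Deequ class.

```python
def deequ_visible_to_jvm(jvm, class_name: str = "com.amazon.deequ.VerificationSuite") -> bool:
    """Return True if `class_name` resolves to something callable in `jvm`.

    Hypothetical helper. py4j resolves unknown dotted names to JavaPackage
    objects, which are not callable, while a loaded class comes back as a
    JavaClass, whose call invokes the constructor. callable() therefore
    separates "jar on the classpath" from "jar missing".
    """
    target = jvm
    # Walk the dotted name one segment at a time, as py4j does for jvm.com.amazon...
    for part in class_name.split("."):
        target = getattr(target, part)
    return callable(target)


# Usage on a live cluster (assumes `spark` is the active SparkSession):
# print(deequ_visible_to_jvm(spark.sparkContext._jvm))
```

If this returns False while `import pydeequ` succeeds, the Python wrapper is most likely installed but the Deequ jar is not on the JVM classpath, which would match the symptoms described in this thread.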
On Spark clusters, you will probably have better luck putting the Deequ jar on the Spark runtime's jar library path / classpath. We don't have a Databricks environment, but you could probably follow this post: https://aws.amazon.com/blogs/big-data/monitor-data-quality-in-your-data-lake-using-pydeequ-and-aws-glue/. You can download the Deequ jar from https://mvnrepository.com/artifact/com.amazon.deequ/deequ/2.0.4-spark-3.3
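The Maven coordinate for the artifact linked above is `com.amazon.deequ:deequ:2.0.4-spark-3.3`, which breaks down as group `com.amazon.deequ`, artifact `deequ`, and version `2.0.4-spark-3.3`. The small sketch below derives that coordinate from a Spark version string. It assumes that the `<deequ version>-spark-<major.minor>` naming used by that artifact also holds for the version you need, so check mvnrepository, because not every pairing is published. The function name is hypothetical.

```python
import re


def deequ_maven_coordinate(spark_version: str, deequ_version: str = "2.0.4") -> str:
    """Build a group:artifact:version coordinate for the Deequ jar.

    Hypothetical helper. Assumes the "<deequ>-spark-<major.minor>" version
    naming seen in com.amazon.deequ:deequ:2.0.4-spark-3.3.
    """
    # Keep only major.minor, e.g. "3.3.2" -> "3.3".
    match = re.match(r"^(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"unrecognised Spark version: {spark_version!r}")
    major_minor = f"{match.group(1)}.{match.group(2)}"
    return f"com.amazon.deequ:deequ:{deequ_version}-spark-{major_minor}"


# On the Spark 3.3.2 cluster from this issue, deequ_maven_coordinate(spark.version)
# would give "com.amazon.deequ:deequ:2.0.4-spark-3.3".
```

On Databricks the SparkSession already exists when a notebook starts, so setting `spark.jars.packages` from inside the notebook typically has no effect. The usual route is to install the coordinate as a Maven cluster library instead. For self-managed sessions, the PyDeequ README builds the session with `.config("spark.jars.packages", pydeequ.deequ_maven_coord)`, where the coordinate is chosen from the `SPARK_VERSION` environment variable.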
Closing - feel free to re-open if you need more help.
Describe the bug
Our organization is trying to use the PyDeequ library on Databricks, which runs Apache Spark 3.3.2. Whenever we call any PyDeequ entry point (AnalysisRunner, ColumnProfilerRunner, ConstraintSuggestionRunner, Check), we get the error "TypeError: 'JavaPackage' object is not callable".
To Reproduce
Steps to reproduce the behavior:
and the Spark conf:
spark.driver.extraJavaOptions "-Dlog4j2.formatMsgNoLookups=true"
spark.databricks.optimizer.adaptive.enabled true
spark.databricks.delta.preview.enabled true
spark.sql.adaptive.coalescePartitions.enabled true
spark.sql.sources.partitionOverwriteMode dynamic
spark.sql.adaptive.skewJoin.enabled true
spark.databricks.unity.catalog.enable false
spark.sql.execution.arrow.enabled true
spark.executor.extraJavaOptions "-Dlog4j2.formatMsgNoLookups=true"
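None of the settings above refers to a jar. The minimal sketch below parses this one-`key value`-per-line format and checks the usual classpath-related keys (`spark.jars`, `spark.jars.packages`, `spark.driver.extraClassPath`, `spark.executor.extraClassPath`) for a Deequ entry. The helper names are hypothetical. A library attached through the Databricks cluster UI would not appear in this text at all, so a False result is a hint rather than proof.

```python
from typing import Dict

# Standard Spark properties that can put extra jars on the driver/executor classpath.
CLASSPATH_KEYS = (
    "spark.jars",
    "spark.jars.packages",
    "spark.driver.extraClassPath",
    "spark.executor.extraClassPath",
)


def parse_spark_conf(text: str) -> Dict[str, str]:
    """Parse 'key value' lines (cluster Spark config format) into a dict. Hypothetical helper."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first space only; values such as JVM options may contain spaces.
        key, _, value = line.partition(" ")
        # Drop surrounding double quotes, e.g. "-Dlog4j2.formatMsgNoLookups=true".
        conf[key] = value.strip().strip('"')
    return conf


def conf_mentions_deequ(conf: Dict[str, str]) -> bool:
    """True if any classpath-related key mentions a Deequ jar or coordinate. Hypothetical helper."""
    return any("deequ" in conf.get(key, "").lower() for key in CLASSPATH_KEYS)
```

Run against the configuration posted in this issue, `conf_mentions_deequ(parse_spark_conf(...))` returns False.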
Expected behavior
We would like to validate the latest version of our data in a PySpark DataFrame.