awslabs / python-deequ

Python API for Deequ
Apache License 2.0

PySpark Error #194

Closed. archi-2001 closed this issue 3 months ago

archi-2001 commented 3 months ago

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x65f095f8) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x65f095f8

To Reproduce

Steps to reproduce the behavior:

  1. Go to VSCode
  2. Type:

    from pyspark.sql import SparkSession, Row
    import pydeequ

    spark = (SparkSession
        .builder
        .config("spark.jars.packages", pydeequ.deequ_maven_coord)
        .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
        .getOrCreate())

    df = spark.sparkContext.parallelize([
        Row(a="foo", b=1, c=5),
        Row(a="bar", b=2, c=6),
        Row(a="baz", b=3, c=None)]).toDF()

  3. Run the Python file
  4. See error

Screenshots image

Version information:

chintanrabadiya commented 3 months ago

How are you connecting to Spark? I mean, are you using Docker or something else? I had the same issue, and it was solved once I set up the connection properly.

chenliu0831 commented 3 months ago

See if this helps: https://stackoverflow.com/questions/73465937/apache-spark-3-3-0-breaks-on-java-17-with-cannot-access-class-sun-nio-ch-direct. Otherwise, we don't support Spark setup issues unless you are sure the problem is unique to the Deequ/PyDeequ setup.
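
For readers who hit the same IllegalAccessError, a commonly suggested workaround is to start the driver JVM with an `--add-opens` flag for the package named in the error. The snippet below is a minimal sketch of that idea, not a PyDeequ-supported fix. It assumes a local PySpark session running on Java 17, and it assumes that opening `java.base/sun.nio.ch` is enough for your Spark version; other packages may need to be added. `PACKAGES_TO_OPEN` and `build_submit_args` are hypothetical names used only for illustration.

```python
import os

# Packages to open to Spark's unnamed module. Only sun.nio.ch appears in the
# error reported in this issue; treat the list as an assumption and extend it
# if other "does not export ..." errors show up.
PACKAGES_TO_OPEN = ["java.base/sun.nio.ch"]


def build_submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value that opens the given packages."""
    flags = " ".join(f"--add-opens={pkg}=ALL-UNNAMED" for pkg in packages)
    # PySpark expects the value to end with "pyspark-shell".
    return f'--driver-java-options "{flags}" pyspark-shell'


# Set this before the first SparkSession is created: the options are only
# read when PySpark launches the driver JVM.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(PACKAGES_TO_OPEN)
```

Run this at the top of the script, before the `SparkSession` builder from the reproduction steps. An alternative that avoids JVM flags entirely is to point `JAVA_HOME` at a Java version that your Spark release supports.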