Closed (malhomaid closed 2 months ago)
Are you using Dataproc? If so, in both Spark 3.5 offerings (image 2.2, Serverless runtime 2.2) the connector is built into the image, so you don't need to provide it. My guess is that this is part of the problem. You can change the version or the jar of the connector as explained here.
The other issue is that some of the classes you provide already exist in Spark (like avro) or interfere with other Spark dependencies (like guava). I'd recommend creating a shaded jar containing only the dependencies whose Spark version you don't want to use (guava is a good candidate for that).
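As a rough illustration of the first point (not providing jars that the image already ships), here is a minimal Python sketch that builds the `--jars` value from the `target/dependency` directory while skipping cluster-provided jars. The helper name `build_jars_arg` and the `PROVIDED_PREFIXES` tuple are hypothetical; only the BigQuery connector entry is confirmed by this thread, so treat any other prefix you add as an assumption to verify against your image.

```python
from pathlib import Path

# Assumption: jar name prefixes that the cluster already supplies.
# Only the BigQuery connector is confirmed in this thread (Dataproc image 2.2);
# extend this tuple only after checking what your image actually ships.
PROVIDED_PREFIXES = ("spark-bigquery-with-dependencies",)


def build_jars_arg(dep_dir="target/dependency", provided=PROVIDED_PREFIXES):
    """Return a comma-separated --jars value, skipping cluster-provided jars."""
    jars = sorted(Path(dep_dir).glob("*.jar"))
    kept = [p for p in jars if not p.name.startswith(tuple(provided))]
    return ",".join(p.as_posix() for p in kept)
```

Calling `print(build_jars_arg())` from the project root prints a string that can be passed to `--jars` in the `gcloud dataproc jobs submit pyspark` command. This only trims the list; it does not resolve version conflicts such as guava, for which the shaded jar is still the suggested route.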
@davidrabinowitz Thanks, I didn't provide the connector jar and it worked 👍
Hello,
I'm using the connector in pyspark and I'm facing this error:
The command I used is below. (I copied the jars using mvn dependency:copy-dependencies and then specified all of them. I'm not sure if there is a better way; I used a fat jar, but it was not reading the Kafka classes.)
gcloud dataproc jobs submit pyspark --project systems-staging-ce59 --cluster dataproc-cluster-6c3 --region me-central2 --jars target/dependency/abris_2.12-6.4.0.jar,target/dependency/avro-1.10.1.jar,target/dependency/checker-qual-3.8.0.jar,target/dependency/common-utils-6.2.1.jar,target/dependency/commons-compress-1.21.jar,target/dependency/commons-lang3-3.2.1.jar,target/dependency/commons-logging-1.1.3.jar,target/dependency/commons-pool2-2.11.1.jar,target/dependency/commons_2.12-1.0.0.jar,target/dependency/error_prone_annotations-2.5.1.jar,target/dependency/failureaccess-1.0.1.jar,target/dependency/guava-30.1.1-jre.jar,target/dependency/hadoop-client-api-3.3.4.jar,target/dependency/hadoop-client-runtime-3.3.4.jar,target/dependency/j2objc-annotations-1.3.jar,target/dependency/jackson-annotations-2.10.5.jar,target/dependency/jackson-core-2.11.3.jar,target/dependency/jackson-databind-2.10.5.1.jar,target/dependency/jackson-dataformat-yaml-2.11.1.jar,target/dependency/jakarta.annotation-api-1.3.5.jar,target/dependency/jakarta.inject-2.6.1.jar,target/dependency/jakarta.ws.rs-api-2.1.6.jar,target/dependency/jersey-common-2.34.jar,target/dependency/jsr305-3.0.0.jar,target/dependency/kafka-avro-serializer-6.2.1.jar,target/dependency/kafka-clients-3.4.1.jar,target/dependency/kafka-schema-registry-client-6.2.1.jar,target/dependency/kafka-schema-serializer-6.2.1.jar,target/dependency/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar,target/dependency/lz4-java-1.8.0.jar,target/dependency/osgi-resource-locator-1.0.3.jar,target/dependency/scala-library-2.12.15.jar,target/dependency/slf4j-api-1.7.36.jar,target/dependency/snakeyaml-1.26.jar,target/dependency/snappy-java-1.1.8.4.jar,target/dependency/spark-avro_2.12-3.5.0.jar,target/dependency/spark-bigquery-with-dependencies_2.12-0.37.0.jar,target/dependency/spark-sql-kafka-0-10_2.12-3.5.0.jar,target/dependency/spark-tags_2.12-3.5.0.jar,target/dependency/spark-token-provider-kafka-0-10_2.12-3.5.0.jar,target/dependency/swagger-annotations-1.6.2.jar,target/dependency/swagger-core-1.6.2.jar,target/dependency/swagger-models-1.6.2.jar,target/dependency/xz-1.9.jar --py-files abris.py kafka-to-bigquery.py
Maven pom.xml:
Dataproc version: 2.2.10-debian12
Spark version: 3.5.0
Scala version: 2.12.18