AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

confluent libraries not found #134

Closed: geoHeil closed this issue 4 years ago

geoHeil commented 4 years ago

I cannot load this great library in spark-shell on either Spark 2.x or 3.x:


/usr/local/Cellar/apache-spark/3.0.0/libexec/bin/spark-shell --master 'local[4]'\
    --packages org.apache.spark:spark-avro_2.12:3.0.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0,za.co.absa:abris_2.12:3.2.1 \
    --conf spark.sql.shuffle.partitions=4

/path/to/spark-2.4.6-bin-hadoop2.7/bin/spark-shell --master 'local[4]'\
    --packages org.apache.spark:spark-avro_2.11:2.4.6,org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.6,za.co.absa:abris_2.11:3.2.1 \
    --conf spark.sql.shuffle.partitions=4

        ::::::::::::::::::::::::::::::::::::::::::::::
        ::          UNRESOLVED DEPENDENCIES         ::
        ::::::::::::::::::::::::::::::::::::::::::::::
        :: io.confluent#kafka-avro-serializer;5.3.1: not found
        :: io.confluent#kafka-schema-registry-client;5.3.1: not found
        :: io.confluent#common-config;5.3.1: not found
        :: io.confluent#common-utils;5.3.1: not found
        ::::::::::::::::::::::::::::::::::::::::::::::
geoHeil commented 4 years ago

The additional --repositories https://packages.confluent.io/maven flag needs to be specified, as the Confluent jars are only available in Confluent's own Maven repository:

/usr/local/Cellar/apache-spark/3.0.0/libexec/bin/spark-shell --master 'local[4]'\
    --repositories https://packages.confluent.io/maven \
    --packages org.apache.spark:spark-avro_2.12:3.0.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0,za.co.absa:abris_2.12:3.2.1 \
    --conf spark.sql.shuffle.partitions=4
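
For anyone launching spark-shell from a script, here is a minimal Python sketch of assembling the same command so the Confluent repository is always included. The helper name `spark_shell_command` is hypothetical (it is not part of Spark or ABRiS); it only uses the `--master`, `--repositories`, `--packages` and `--conf` flags shown above.

```python
import shlex


def spark_shell_command(spark_shell, packages, repositories, conf=None, master=None):
    """Assemble a spark-shell argv list (hypothetical helper, not a Spark API)."""
    argv = [spark_shell]
    if master:
        argv += ["--master", master]
    if repositories:
        # --repositories takes a comma-separated list of extra Maven repositories.
        argv += ["--repositories", ",".join(repositories)]
    # --packages takes comma-separated group:artifact:version coordinates.
    argv += ["--packages", ",".join(packages)]
    for key, value in (conf or {}).items():
        argv += ["--conf", f"{key}={value}"]
    return argv


argv = spark_shell_command(
    "spark-shell",
    packages=[
        "org.apache.spark:spark-avro_2.12:3.0.0",
        "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0",
        "za.co.absa:abris_2.12:3.2.1",
    ],
    repositories=["https://packages.confluent.io/maven"],
    conf={"spark.sql.shuffle.partitions": "4"},
    master="local[4]",
)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(argv))
```

The list form can be passed directly to `subprocess.run(argv)`, which avoids quoting problems with values such as `local[4]`.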
GopinathMC commented 3 years ago

Hi @geoHeil

When I run a command similar to yours with the latest ABRiS package (za.co.absa:abris_2.11:4.0.0), I get the error below.

bin/spark-shell \
    --repositories https://packages.confluent.io/maven \
    --packages org.apache.spark:spark-avro_2.12:3.0.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0,za.co.absa:abris_2.11:4.0.0 \
    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'

Error message:

:: problems summary ::
:::: WARNINGS
        [NOT FOUND ] javax.ws.rs#javax.ws.rs-api;2.1.1!javax.ws.rs-api.${packaging.type} (895ms)

    ==== central: tried

      https://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1.1/javax.ws.rs-api-2.1.1.${packaging.type}

        ::::::::::::::::::::::::::::::::::::::::::::::
        ::              FAILED DOWNLOADS            ::
        :: ^ see resolution messages for details  ^ ::
        ::::::::::::::::::::::::::::::::::::::::::::::
        :: javax.ws.rs#javax.ws.rs-api;2.1.1!javax.ws.rs-api.${packaging.type}
        ::::::::::::::::::::::::::::::::::::::::::::::

Could you please help me with this? Thanks in advance!

bama-chi commented 2 years ago

Hello @GopinathMC, I'm facing exactly the same problem, with the [NOT FOUND ] javax.ws.rs#javax.ws.rs-api;2.1.1!javax.ws.rs-api.${packaging.type} (895ms) warning. Did you manage to solve it? Thanks in advance.

kevinwallimann commented 2 years ago

Hi @bama-chi, I could reproduce the error message when I didn't provide --repositories https://packages.confluent.io/maven. Please make sure this flag is included in your spark-shell command.

If you are just getting started with ABRiS, I'd suggest you use our latest versions: 5.1.1 for Spark 2.4.x and 6.2.0 for Spark 3.
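
The version suggestion above can be written down as a small lookup. This is only a sketch: the function name `suggested_abris_coordinate` is hypothetical, the `_2.11` suffix for Spark 2.4 follows the commands in this thread, and the `_2.12` suffix for Spark 3 is an assumption based on the Spark 3.0.0 command at the top of the issue.

```python
def suggested_abris_coordinate(spark_version):
    """Return the ABRiS Maven coordinate suggested in this thread (hypothetical helper)."""
    major_minor = tuple(int(part) for part in spark_version.split(".")[:2])
    if major_minor == (2, 4):
        # Spark 2.4.x: ABRiS 5.1.1, Scala 2.11 build as used elsewhere in the thread.
        return "za.co.absa:abris_2.11:5.1.1"
    if major_minor[0] == 3:
        # Spark 3: ABRiS 6.2.0; the Scala 2.12 suffix is an assumption.
        return "za.co.absa:abris_2.12:6.2.0"
    raise ValueError(f"No suggestion in this thread for Spark {spark_version}")


print(suggested_abris_coordinate("2.4.8"))
print(suggested_abris_coordinate("3.0.0"))
```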

bama-chi commented 2 years ago

Hello @kevinwallimann, thanks for your response. I'm passing the repositories as you mentioned, but I'm now facing a different problem (see below). Here are my Spark options:

import os

# dag_path is defined elsewhere in the poster's code (not shown in this thread).
SPARK_OPTS = {
    "app_name": "xxx",
    "master": "yarn",
    "jars": "sqljdbc41.jar",
    # Misspelled key ("caseSensitve"); likely meant spark.sql.caseSensitive, which is set again below.
    "spark.sql.caseSensitve": "false",
    "spark.executor.memory": "3g",
    "spark.driver.memory": "3g",
    "spark.sql.caseSensitive": "true",
    "kafka_streaming": True,
    "spark.jars": os.path.join(dag_path, "abris_2.11-5.1.1.jar"),
    "spark.jars.packages": "io.confluent:kafka-schema-registry-client:5.3.4,io.confluent:kafka-avro-serializer:5.3.4",
    "spark.jars.repositories": "https://packages.confluent.io/maven",
}
Exception in thread "main" java.lang.RuntimeException: [download failed: org.apache.zookeeper#zookeeper;3.4.14!zookeeper.jar, download failed: com.google.code.findbugs#jsr305;3.0.2!jsr305.jar, download failed: org.apache.yetus#audience-annotations;0.5.0!audience-annotations.jar, download failed: io.netty#netty;3.10.6.Final!netty.jar(bundle), download failed: org.apache.avro#avro;1.8.1!avro.jar(bundle), download failed: com.thoughtworks.paranamer#paranamer;2.7!paranamer.jar(bundle), download failed: org.xerial.snappy#snappy-java;1.1.1.3!snappy-java.jar(bundle), download failed: com.fasterxml.jackson.core#jackson-databind;2.9.10.5!jackson-databind.jar(bundle), download failed: com.fasterxml.jackson.core#jackson-annotations;2.9.10!jackson-annotations.jar(bundle), download failed: com.fasterxml.jackson.core#jackson-core;2.9.10!jackson-core.jar(bundle)]

By the way, I'm using Spark 2.4 with Scala 2.11. Thanks in advance.

bama-chi commented 2 years ago

Hi again, I changed the Confluent version from 5.3.4 to 5.5.2 and the download worked, but now I'm facing another problem, related to this issue: https://github.com/AbsaOSS/ABRiS/issues/165#issue-737179648

I tried changing the Avro version to 1.9.2, without success.

kevinwallimann commented 2 years ago

Confluent 5.4.+ only works with Spark 3.2.+. If you use Spark 2.4, you have to use Confluent 5.3. As for the dependencies, only spark-avro, spark-sql-kafka and abris should be required. Does the following work for you?

spark-shell --master 'local' \
--repositories https://packages.confluent.io/maven \
--packages org.apache.spark:spark-avro_2.11:2.4.8,org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.8,za.co.absa:abris_2.11:5.1.1 \
--class za.co.absa.abris.examples.ConfluentKafkaAvroReader \
~/.m2/repository/za/co/absa/abris_2.11/5.1.1/abris_2.11-5.1.1.jar
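
Note that all three packages in the command above carry the same Scala suffix (_2.11), whereas the earlier 4.0.0 attempt in this thread combined _2.12 Spark artifacts with abris_2.11. Whether that mix contributed to the earlier failure is not established here, but it is cheap to check. A minimal sketch, with the hypothetical helper name `scala_suffixes`:

```python
import re

# Matches a trailing Scala binary suffix such as "_2.11" or "_2.12" in an artifact ID.
_SCALA_SUFFIX = re.compile(r"_(\d+\.\d+)$")


def scala_suffixes(packages):
    """Group artifact IDs in a --packages string by Scala suffix (hypothetical helper)."""
    found = {}
    for coordinate in packages.split(","):
        parts = coordinate.strip().split(":")
        if len(parts) != 3:
            raise ValueError(f"Expected group:artifact:version, got {coordinate!r}")
        _group, artifact, _version = parts
        match = _SCALA_SUFFIX.search(artifact)
        if match:  # artifacts without a Scala suffix (plain Java jars) are skipped
            found.setdefault(match.group(1), []).append(artifact)
    return found


consistent = scala_suffixes(
    "org.apache.spark:spark-avro_2.11:2.4.8,"
    "org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.8,"
    "za.co.absa:abris_2.11:5.1.1"
)
mixed = scala_suffixes(
    "org.apache.spark:spark-avro_2.12:3.0.0,"
    "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0,"
    "za.co.absa:abris_2.11:4.0.0"
)

for label, result in (("consistent", consistent), ("mixed", mixed)):
    status = "OK" if len(result) <= 1 else "mixed Scala versions"
    print(label, status, result)
```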
bama-chi commented 2 years ago

This doesn't work either :/ I got the error output below:

    ::::::::::::::::::::::::::::::::::::::::::::::
    ::              FAILED DOWNLOADS            ::
    :: ^ see resolution messages for details  ^ ::
    ::::::::::::::::::::::::::::::::::::::::::::::
    :: org.apache.avro#avro;1.8.1!avro.jar(bundle)
    :: com.thoughtworks.paranamer#paranamer;2.7!paranamer.jar(bundle)
    :: org.apache.zookeeper#zookeeper;3.4.14!zookeeper.jar
    :: com.google.code.findbugs#jsr305;3.0.2!jsr305.jar
    :: org.apache.yetus#audience-annotations;0.5.0!audience-annotations.jar
    :: io.netty#netty;3.10.6.Final!netty.jar(bundle)
    :: com.fasterxml.jackson.core#jackson-databind;2.9.10.5!jackson-databind.jar(bundle)
    :: com.fasterxml.jackson.core#jackson-annotations;2.9.10!jackson-annotations.jar(bundle)
    :: com.fasterxml.jackson.core#jackson-core;2.9.10!jackson-core.jar(bundle)
    ::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [download failed: org.apache.avro#avro;1.8.1!avro.jar(bundle), download failed: com.thoughtworks.paranamer#paranamer;2.7!paranamer.jar(bundle), download failed: org.apache.zookeeper#zookeeper;3.4.14!zookeeper.jar, download failed: com.google.code.findbugs#jsr305;3.0.2!jsr305.jar, download failed: org.apache.yetus#audience-annotations;0.5.0!audience-annotations.jar, download failed: io.netty#netty;3.10.6.Final!netty.jar(bundle), download failed: com.fasterxml.jackson.core#jackson-databind;2.9.10.5!jackson-databind.jar(bundle), download failed: com.fasterxml.jackson.core#jackson-annotations;2.9.10!jackson-annotations.jar(bundle), download failed: com.fasterxml.jackson.core#jackson-core;2.9.10!jackson-core.jar(bundle)]
kevinwallimann commented 2 years ago

OK, so it doesn't look like an ABRiS-specific problem at this point. Maybe you could use the verbose or debug message level to get more details on why those downloads fail.
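
One way to narrow this down is to turn the "download failed" entries into plain URLs and try them by hand (browser or curl) from the machine that runs Spark. The sketch below does no network access itself. It assumes the standard Maven repository layout visible in the "central: tried" URL earlier in this thread and assumes a `.jar` extension; the helper names `failed_coordinates` and `jar_url` are hypothetical.

```python
import re

# Base URL as it appears in the "central: tried" output earlier in the thread.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"

# Captures group, artifact and version from entries like
# "download failed: org.apache.avro#avro;1.8.1!avro.jar(bundle)".
_FAILED = re.compile(r"download failed: ([^#\s]+)#([^;]+);([^!]+)!")


def failed_coordinates(message):
    """Extract (group, artifact, version) tuples from the exception message."""
    return [match.groups() for match in _FAILED.finditer(message)]


def jar_url(group, artifact, version, base=MAVEN_CENTRAL):
    """Build the jar URL under the standard Maven layout (assumes a .jar artifact)."""
    return f"{base}/{group.replace('.', '/')}/{artifact}/{version}/{artifact}-{version}.jar"


# Shortened excerpt of the exception message posted above.
message = (
    "[download failed: org.apache.avro#avro;1.8.1!avro.jar(bundle), "
    "download failed: org.apache.zookeeper#zookeeper;3.4.14!zookeeper.jar]"
)

for coordinate in failed_coordinates(message):
    print(jar_url(*coordinate))
```

If these URLs cannot be fetched from the Spark host, the cause is more likely a proxy, firewall or cache issue than anything in the package list.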