Closed soumilshah1995 closed 3 weeks ago
I tried the following:
import os
import sys
from pyspark.sql import SparkSession

# Versions as defined in the later snippets in this thread
HUDI_VERSION = '0.14.0'
SPARK_VERSION = '3.4'
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
AWS_SDK_DEPENDENCIES = "com.amazonaws:dynamodb-lock-client:1.2.0,com.amazonaws:aws-java-sdk-dynamodb:1.12.735,com.amazonaws:aws-java-sdk-core:1.12.735"
SUBMIT_ARGS = f"--packages org.apache.hudi:hudi-spark{SPARK_VERSION}-bundle_2.12:{HUDI_VERSION},{AWS_SDK_DEPENDENCIES} pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
os.environ['PYSPARK_PYTHON'] = sys.executable
spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .config('className', 'org.apache.hudi') \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .getOrCreate()
It looks like a class is missing, but I'm not sure which one:
py4j.protocol.Py4JJavaError: An error occurred while calling o65.save.
: org.apache.hudi.exception.HoodieException: Unable to load class
at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:58)
at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:79)
@soumilshah1995 You also need to add the hudi-aws bundle to the dependencies.
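One way to read this suggestion, as a minimal sketch: append the hudi-aws bundle coordinate to the `--packages` list. The helper name `build_submit_args` is hypothetical; the coordinates and versions are the ones used elsewhere in this thread, and whether this set is sufficient for a particular lock provider is not confirmed here.

```python
def build_submit_args(spark_version, hudi_version, extra_packages=()):
    """Return a PYSPARK_SUBMIT_ARGS value (hypothetical helper, not part of PySpark or Hudi)."""
    packages = [
        # Hudi Spark bundle, Scala 2.12 build as used in this thread
        f"org.apache.hudi:hudi-spark{spark_version}-bundle_2.12:{hudi_version}",
        # hudi-aws bundle suggested in the comment above
        f"org.apache.hudi:hudi-aws-bundle:{hudi_version}",
    ]
    packages.extend(extra_packages)
    return f"--packages {','.join(packages)} pyspark-shell"


SUBMIT_ARGS = build_submit_args("3.4", "0.14.0")
print(SUBMIT_ARGS)
```

The resulting string would be assigned to `os.environ["PYSPARK_SUBMIT_ARGS"]` before the `SparkSession` is created, since the packages are resolved when the JVM gateway starts.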
Oh, let me try this and update the thread shortly.
HUDI_VERSION = '0.14.0'
SPARK_VERSION = '3.4'
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
AWS_JAR_FILES = f"org.apache.hudi:hudi-aws:{HUDI_VERSION},org.apache.hudi:hudi-aws-bundle:{HUDI_VERSION}"
SUBMIT_ARGS = f"--packages org.apache.hudi:hudi-spark3.4.1-bundle_2.12:{HUDI_VERSION},{AWS_JAR_FILES} pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
os.environ['PYSPARK_PYTHON'] = sys.executable
spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .config('className', 'org.apache.hudi') \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .getOrCreate()
python3 w1.py
Imports loaded successfully.
Warning: Ignoring non-Spark config property: className
:: loading settings :: url = jar:file:/opt/anaconda3/lib/python3.11/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/soumilshah/.ivy2/cache
The jars for the packages stored in: /Users/soumilshah/.ivy2/jars
org.apache.hudi#hudi-spark3.4.1-bundle_2.12 added as a dependency
org.apache.hudi#hudi-aws added as a dependency
org.apache.hudi#hudi-aws-bundle added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-9c6c8274-f28f-4a73-b9e9-c27219acefce;1.0
confs: [default]
found org.apache.hudi#hudi-aws;0.14.0 in central
found org.apache.hudi#hudi-common;0.14.0 in central
found org.openjdk.jol#jol-core;0.16 in local-m2-cache
found com.fasterxml.jackson.core#jackson-annotations;2.10.0 in local-m2-cache
found com.fasterxml.jackson.core#jackson-databind;2.10.0 in local-m2-cache
found com.fasterxml.jackson.core#jackson-core;2.10.0 in local-m2-cache
found com.fasterxml.jackson.datatype#jackson-datatype-jsr310;2.10.0 in local-m2-cache
found com.github.ben-manes.caffeine#caffeine;2.9.1 in local-m2-cache
found org.checkerframework#checker-qual;3.10.0 in local-m2-cache
found com.google.errorprone#error_prone_annotations;2.5.1 in local-m2-cache
found org.apache.orc#orc-core;1.6.0 in local-m2-cache
found org.apache.orc#orc-shims;1.6.0 in local-m2-cache
found org.slf4j#slf4j-api;1.7.36 in local-m2-cache
found com.google.protobuf#protobuf-java;3.21.7 in local-m2-cache
found commons-lang#commons-lang;2.6 in local-m2-cache
found io.airlift#aircompressor;0.15 in local-m2-cache
found javax.xml.bind#jaxb-api;2.2.11 in local-m2-cache
found org.apache.hive#hive-storage-api;2.6.0 in local-m2-cache
found org.jetbrains#annotations;17.0.0 in local-m2-cache
found org.roaringbitmap#RoaringBitmap;0.9.47 in local-m2-cache
found org.apache.httpcomponents#fluent-hc;4.4.1 in local-m2-cache
found commons-logging#commons-logging;1.2 in local-m2-cache
found org.rocksdb#rocksdbjni;7.5.3 in local-m2-cache
found org.apache.hbase#hbase-client;2.4.9 in local-m2-cache
found org.apache.hbase.thirdparty#hbase-shaded-protobuf;3.5.1 in local-m2-cache
found org.apache.hbase#hbase-protocol-shaded;2.4.9 in local-m2-cache
found org.apache.yetus#audience-annotations;0.5.0 in local-m2-cache
found org.apache.hbase#hbase-protocol;2.4.9 in local-m2-cache
found javax.annotation#javax.annotation-api;1.2 in local-m2-cache
found commons-codec#commons-codec;1.13 in local-m2-cache
found commons-io#commons-io;2.11.0 in local-m2-cache
found org.apache.commons#commons-lang3;3.9 in local-m2-cache
found org.apache.hbase.thirdparty#hbase-shaded-miscellaneous;3.5.1 in local-m2-cache
found com.google.errorprone#error_prone_annotations;2.7.1 in local-m2-cache
found org.apache.hbase.thirdparty#hbase-shaded-netty;3.5.1 in local-m2-cache
found org.apache.zookeeper#zookeeper;3.5.7 in local-m2-cache
found org.apache.zookeeper#zookeeper-jute;3.5.7 in local-m2-cache
found io.netty#netty-handler;4.1.45.Final in local-m2-cache
found io.netty#netty-common;4.1.45.Final in local-m2-cache
found io.netty#netty-buffer;4.1.45.Final in local-m2-cache
found io.netty#netty-transport;4.1.45.Final in local-m2-cache
found io.netty#netty-resolver;4.1.45.Final in local-m2-cache
found io.netty#netty-codec;4.1.45.Final in local-m2-cache
found io.netty#netty-transport-native-epoll;4.1.45.Final in local-m2-cache
found io.netty#netty-transport-native-unix-common;4.1.45.Final in local-m2-cache
found org.apache.htrace#htrace-core4;4.2.0-incubating in local-m2-cache
found org.jruby.jcodings#jcodings;1.0.55 in local-m2-cache
found org.jruby.joni#joni;2.1.31 in local-m2-cache
found io.dropwizard.metrics#metrics-core;4.1.1 in local-m2-cache
found org.apache.commons#commons-crypto;1.0.0 in local-m2-cache
found org.apache.hadoop#hadoop-auth;2.10.1 in central
found com.nimbusds#nimbus-jose-jwt;7.9 in local-m2-cache
found com.github.stephenc.jcip#jcip-annotations;1.0-1 in local-m2-cache
found org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 in local-m2-cache
found org.apache.directory.server#apacheds-i18n;2.0.0-M15 in local-m2-cache
found org.apache.directory.api#api-asn1-api;1.0.0-M20 in local-m2-cache
found org.apache.directory.api#api-util;1.0.0-M20 in local-m2-cache
found org.apache.curator#curator-framework;2.7.1 in local-m2-cache
found org.apache.curator#curator-client;2.7.1 in local-m2-cache
found org.apache.hbase#hbase-server;2.4.9 in local-m2-cache
found org.apache.hbase#hbase-procedure;2.4.9 in local-m2-cache
found org.apache.hbase#hbase-replication;2.4.9 in local-m2-cache
found org.glassfish.web#javax.servlet.jsp;2.3.2 in local-m2-cache
found org.glassfish#javax.el;3.0.1-b12 in local-m2-cache
found javax.servlet.jsp#javax.servlet.jsp-api;2.3.1 in local-m2-cache
found org.apache.commons#commons-math3;3.6.1 in local-m2-cache
found org.jamon#jamon-runtime;2.4.1 in local-m2-cache
found com.lmax#disruptor;3.4.2 in local-m2-cache
found org.apache.hadoop#hadoop-distcp;2.10.0 in local-m2-cache
found org.apache.hadoop#hadoop-annotations;2.10.0 in local-m2-cache
found org.apache.hadoop#hadoop-mapreduce-client-core;2.10.1 in central
found org.apache.hadoop#hadoop-yarn-client;2.10.1 in central
found commons-cli#commons-cli;1.2 in local-m2-cache
found log4j#log4j;1.2.17 in local-m2-cache
found org.apache.hadoop#hadoop-yarn-api;2.10.1 in central
found org.apache.hadoop#hadoop-yarn-common;2.10.1 in central
found org.apache.commons#commons-compress;1.19 in local-m2-cache
found com.sun.jersey#jersey-core;1.9 in local-m2-cache
found com.sun.jersey#jersey-client;1.9 in local-m2-cache
found com.google.inject.extensions#guice-servlet;3.0 in local-m2-cache
found com.google.inject#guice;3.0 in local-m2-cache
found javax.inject#javax.inject;1 in local-m2-cache
found aopalliance#aopalliance;1.0 in local-m2-cache
found org.sonatype.sisu.inject#cglib;2.2.1-v20090111 in central
found asm#asm;3.2 in central
found com.sun.jersey#jersey-server;1.9 in local-m2-cache
found com.sun.jersey#jersey-json;1.9 in local-m2-cache
found org.codehaus.jettison#jettison;1.3.8 in central
found com.sun.xml.bind#jaxb-impl;2.2.3-1 in local-m2-cache
found com.sun.jersey.contribs#jersey-guice;1.9 in local-m2-cache
found org.apache.avro#avro;1.8.2 in local-m2-cache
found com.thoughtworks.paranamer#paranamer;2.7 in local-m2-cache
found org.xerial.snappy#snappy-java;1.1.8.3 in local-m2-cache
found org.tukaani#xz;1.5 in local-m2-cache
found org.slf4j#slf4j-log4j12;1.7.30 in central
found io.netty#netty;3.10.6.Final in local-m2-cache
found org.lz4#lz4-java;1.8.0 in local-m2-cache
found org.roaringbitmap#shims;0.9.47 in local-m2-cache
found org.apache.hudi#hudi-hive-sync;0.14.0 in central
found org.apache.hudi#hudi-hadoop-mr;0.14.0 in central
found org.apache.hudi#hudi-sync-common;0.14.0 in central
found com.beust#jcommander;1.78 in central
found com.amazonaws#dynamodb-lock-client;1.2.0 in central
found software.amazon.awssdk#cloudwatch;2.18.40 in central
found software.amazon.awssdk#aws-query-protocol;2.18.40 in central
found software.amazon.awssdk#protocol-core;2.18.40 in central
found software.amazon.awssdk#sdk-core;2.18.40 in central
found software.amazon.awssdk#annotations;2.18.40 in central
found software.amazon.awssdk#http-client-spi;2.18.40 in central
found software.amazon.awssdk#utils;2.18.40 in central
found org.reactivestreams#reactive-streams;1.0.3 in local-m2-cache
found software.amazon.awssdk#metrics-spi;2.18.40 in central
found software.amazon.awssdk#endpoints-spi;2.18.40 in central
found software.amazon.awssdk#profiles;2.18.40 in central
found software.amazon.awssdk#aws-core;2.18.40 in central
found software.amazon.awssdk#regions;2.18.40 in central
found software.amazon.awssdk#json-utils;2.18.40 in central
found software.amazon.awssdk#third-party-jackson-core;2.18.40 in central
found software.amazon.awssdk#auth;2.18.40 in central
found software.amazon.eventstream#eventstream;1.0.1 in central
found software.amazon.awssdk#apache-client;2.18.40 in central
found software.amazon.awssdk#netty-nio-client;2.18.40 in central
found io.netty#netty-codec-http;4.1.77.Final in local-m2-cache
found io.netty#netty-common;4.1.77.Final in local-m2-cache
found io.netty#netty-buffer;4.1.77.Final in local-m2-cache
found io.netty#netty-transport;4.1.77.Final in local-m2-cache
found io.netty#netty-resolver;4.1.77.Final in local-m2-cache
found io.netty#netty-codec;4.1.77.Final in local-m2-cache
found io.netty#netty-handler;4.1.77.Final in local-m2-cache
found io.netty#netty-codec-http2;4.1.77.Final in local-m2-cache
found io.netty#netty-transport-classes-epoll;4.1.77.Final in local-m2-cache
found io.netty#netty-transport-native-unix-common;4.1.77.Final in local-m2-cache
found software.amazon.awssdk#dynamodb;2.18.40 in central
found software.amazon.awssdk#aws-json-protocol;2.18.40 in central
found software.amazon.awssdk#glue;2.18.40 in central
found software.amazon.awssdk#sqs;2.18.40 in central
found org.apache.httpcomponents#httpclient;4.5.13 in local-m2-cache
found org.apache.httpcomponents#httpcore;4.4.13 in local-m2-cache
found org.apache.hudi#hudi-aws-bundle;0.14.0 in central
found org.apache.parquet#parquet-avro;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-column;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-common;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-format;2.4.0 in local-m2-cache
found org.apache.parquet#parquet-encoding;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-hadoop;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-jackson;1.10.1 in local-m2-cache
found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache
found org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache
found commons-pool#commons-pool;1.6 in local-m2-cache
found it.unimi.dsi#fastutil;7.0.13 in local-m2-cache
:: resolution report :: resolve 1934ms :: artifacts dl 30ms
:: modules in use:
aopalliance#aopalliance;1.0 from local-m2-cache in [default]
asm#asm;3.2 from central in [default]
com.amazonaws#dynamodb-lock-client;1.2.0 from central in [default]
com.beust#jcommander;1.78 from central in [default]
com.fasterxml.jackson.core#jackson-annotations;2.10.0 from local-m2-cache in [default]
com.fasterxml.jackson.core#jackson-core;2.10.0 from local-m2-cache in [default]
com.fasterxml.jackson.core#jackson-databind;2.10.0 from local-m2-cache in [default]
com.fasterxml.jackson.datatype#jackson-datatype-jsr310;2.10.0 from local-m2-cache in [default]
com.github.ben-manes.caffeine#caffeine;2.9.1 from local-m2-cache in [default]
com.github.stephenc.jcip#jcip-annotations;1.0-1 from local-m2-cache in [default]
com.google.errorprone#error_prone_annotations;2.7.1 from local-m2-cache in [default]
com.google.inject#guice;3.0 from local-m2-cache in [default]
com.google.inject.extensions#guice-servlet;3.0 from local-m2-cache in [default]
com.google.protobuf#protobuf-java;3.21.7 from local-m2-cache in [default]
com.lmax#disruptor;3.4.2 from local-m2-cache in [default]
com.nimbusds#nimbus-jose-jwt;7.9 from local-m2-cache in [default]
com.sun.jersey#jersey-client;1.9 from local-m2-cache in [default]
com.sun.jersey#jersey-core;1.9 from local-m2-cache in [default]
com.sun.jersey#jersey-json;1.9 from local-m2-cache in [default]
com.sun.jersey#jersey-server;1.9 from local-m2-cache in [default]
com.sun.jersey.contribs#jersey-guice;1.9 from local-m2-cache in [default]
com.sun.xml.bind#jaxb-impl;2.2.3-1 from local-m2-cache in [default]
com.thoughtworks.paranamer#paranamer;2.7 from local-m2-cache in [default]
commons-cli#commons-cli;1.2 from local-m2-cache in [default]
commons-codec#commons-codec;1.13 from local-m2-cache in [default]
commons-io#commons-io;2.11.0 from local-m2-cache in [default]
commons-lang#commons-lang;2.6 from local-m2-cache in [default]
commons-logging#commons-logging;1.2 from local-m2-cache in [default]
commons-pool#commons-pool;1.6 from local-m2-cache in [default]
io.airlift#aircompressor;0.15 from local-m2-cache in [default]
io.dropwizard.metrics#metrics-core;4.1.1 from local-m2-cache in [default]
io.netty#netty;3.10.6.Final from local-m2-cache in [default]
io.netty#netty-buffer;4.1.77.Final from local-m2-cache in [default]
io.netty#netty-codec;4.1.77.Final from local-m2-cache in [default]
io.netty#netty-codec-http;4.1.77.Final from local-m2-cache in [default]
io.netty#netty-codec-http2;4.1.77.Final from local-m2-cache in [default]
io.netty#netty-common;4.1.77.Final from local-m2-cache in [default]
io.netty#netty-handler;4.1.77.Final from local-m2-cache in [default]
io.netty#netty-resolver;4.1.77.Final from local-m2-cache in [default]
io.netty#netty-transport;4.1.77.Final from local-m2-cache in [default]
io.netty#netty-transport-classes-epoll;4.1.77.Final from local-m2-cache in [default]
io.netty#netty-transport-native-epoll;4.1.45.Final from local-m2-cache in [default]
io.netty#netty-transport-native-unix-common;4.1.77.Final from local-m2-cache in [default]
it.unimi.dsi#fastutil;7.0.13 from local-m2-cache in [default]
javax.annotation#javax.annotation-api;1.2 from local-m2-cache in [default]
javax.inject#javax.inject;1 from local-m2-cache in [default]
javax.servlet.jsp#javax.servlet.jsp-api;2.3.1 from local-m2-cache in [default]
javax.xml.bind#jaxb-api;2.2.11 from local-m2-cache in [default]
log4j#log4j;1.2.17 from local-m2-cache in [default]
org.apache.avro#avro;1.8.2 from local-m2-cache in [default]
org.apache.commons#commons-compress;1.19 from local-m2-cache in [default]
org.apache.commons#commons-crypto;1.0.0 from local-m2-cache in [default]
org.apache.commons#commons-lang3;3.9 from local-m2-cache in [default]
org.apache.commons#commons-math3;3.6.1 from local-m2-cache in [default]
org.apache.curator#curator-client;2.7.1 from local-m2-cache in [default]
org.apache.curator#curator-framework;2.7.1 from local-m2-cache in [default]
org.apache.directory.api#api-asn1-api;1.0.0-M20 from local-m2-cache in [default]
org.apache.directory.api#api-util;1.0.0-M20 from local-m2-cache in [default]
org.apache.directory.server#apacheds-i18n;2.0.0-M15 from local-m2-cache in [default]
org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 from local-m2-cache in [default]
org.apache.hadoop#hadoop-annotations;2.10.0 from local-m2-cache in [default]
org.apache.hadoop#hadoop-auth;2.10.1 from central in [default]
org.apache.hadoop#hadoop-distcp;2.10.0 from local-m2-cache in [default]
org.apache.hadoop#hadoop-mapreduce-client-core;2.10.1 from central in [default]
org.apache.hadoop#hadoop-yarn-api;2.10.1 from central in [default]
org.apache.hadoop#hadoop-yarn-client;2.10.1 from central in [default]
org.apache.hadoop#hadoop-yarn-common;2.10.1 from central in [default]
org.apache.hbase#hbase-client;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-procedure;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-protocol;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-protocol-shaded;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-replication;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-server;2.4.9 from local-m2-cache in [default]
org.apache.hbase.thirdparty#hbase-shaded-miscellaneous;3.5.1 from local-m2-cache in [default]
org.apache.hbase.thirdparty#hbase-shaded-netty;3.5.1 from local-m2-cache in [default]
org.apache.hbase.thirdparty#hbase-shaded-protobuf;3.5.1 from local-m2-cache in [default]
org.apache.hive#hive-storage-api;2.6.0 from local-m2-cache in [default]
org.apache.htrace#htrace-core4;4.2.0-incubating from local-m2-cache in [default]
org.apache.httpcomponents#fluent-hc;4.4.1 from local-m2-cache in [default]
org.apache.httpcomponents#httpclient;4.5.13 from local-m2-cache in [default]
org.apache.httpcomponents#httpcore;4.4.13 from local-m2-cache in [default]
org.apache.hudi#hudi-aws;0.14.0 from central in [default]
org.apache.hudi#hudi-aws-bundle;0.14.0 from central in [default]
org.apache.hudi#hudi-common;0.14.0 from central in [default]
org.apache.hudi#hudi-hadoop-mr;0.14.0 from central in [default]
org.apache.hudi#hudi-hive-sync;0.14.0 from central in [default]
org.apache.hudi#hudi-sync-common;0.14.0 from central in [default]
org.apache.orc#orc-core;1.6.0 from local-m2-cache in [default]
org.apache.orc#orc-shims;1.6.0 from local-m2-cache in [default]
org.apache.parquet#parquet-avro;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-column;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-common;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-encoding;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-format;2.4.0 from local-m2-cache in [default]
org.apache.parquet#parquet-hadoop;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-jackson;1.10.1 from local-m2-cache in [default]
org.apache.yetus#audience-annotations;0.5.0 from local-m2-cache in [default]
org.apache.zookeeper#zookeeper;3.5.7 from local-m2-cache in [default]
org.apache.zookeeper#zookeeper-jute;3.5.7 from local-m2-cache in [default]
org.checkerframework#checker-qual;3.10.0 from local-m2-cache in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]
org.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]
org.codehaus.jettison#jettison;1.3.8 from central in [default]
org.glassfish#javax.el;3.0.1-b12 from local-m2-cache in [default]
org.glassfish.web#javax.servlet.jsp;2.3.2 from local-m2-cache in [default]
org.jamon#jamon-runtime;2.4.1 from local-m2-cache in [default]
org.jetbrains#annotations;17.0.0 from local-m2-cache in [default]
org.jruby.jcodings#jcodings;1.0.55 from local-m2-cache in [default]
org.jruby.joni#joni;2.1.31 from local-m2-cache in [default]
org.lz4#lz4-java;1.8.0 from local-m2-cache in [default]
org.openjdk.jol#jol-core;0.16 from local-m2-cache in [default]
org.reactivestreams#reactive-streams;1.0.3 from local-m2-cache in [default]
org.roaringbitmap#RoaringBitmap;0.9.47 from local-m2-cache in [default]
org.roaringbitmap#shims;0.9.47 from local-m2-cache in [default]
org.rocksdb#rocksdbjni;7.5.3 from local-m2-cache in [default]
org.slf4j#slf4j-api;1.7.36 from local-m2-cache in [default]
org.slf4j#slf4j-log4j12;1.7.30 from central in [default]
org.sonatype.sisu.inject#cglib;2.2.1-v20090111 from central in [default]
org.tukaani#xz;1.5 from local-m2-cache in [default]
org.xerial.snappy#snappy-java;1.1.8.3 from local-m2-cache in [default]
software.amazon.awssdk#annotations;2.18.40 from central in [default]
software.amazon.awssdk#apache-client;2.18.40 from central in [default]
software.amazon.awssdk#auth;2.18.40 from central in [default]
software.amazon.awssdk#aws-core;2.18.40 from central in [default]
software.amazon.awssdk#aws-json-protocol;2.18.40 from central in [default]
software.amazon.awssdk#aws-query-protocol;2.18.40 from central in [default]
software.amazon.awssdk#cloudwatch;2.18.40 from central in [default]
software.amazon.awssdk#dynamodb;2.18.40 from central in [default]
software.amazon.awssdk#endpoints-spi;2.18.40 from central in [default]
software.amazon.awssdk#glue;2.18.40 from central in [default]
software.amazon.awssdk#http-client-spi;2.18.40 from central in [default]
software.amazon.awssdk#json-utils;2.18.40 from central in [default]
software.amazon.awssdk#metrics-spi;2.18.40 from central in [default]
software.amazon.awssdk#netty-nio-client;2.18.40 from central in [default]
software.amazon.awssdk#profiles;2.18.40 from central in [default]
software.amazon.awssdk#protocol-core;2.18.40 from central in [default]
software.amazon.awssdk#regions;2.18.40 from central in [default]
software.amazon.awssdk#sdk-core;2.18.40 from central in [default]
software.amazon.awssdk#sqs;2.18.40 from central in [default]
software.amazon.awssdk#third-party-jackson-core;2.18.40 from central in [default]
software.amazon.awssdk#utils;2.18.40 from central in [default]
software.amazon.eventstream#eventstream;1.0.1 from central in [default]
:: evicted modules:
com.google.errorprone#error_prone_annotations;2.5.1 by [com.google.errorprone#error_prone_annotations;2.7.1] in [default]
org.apache.httpcomponents#httpclient;4.4.1 by [org.apache.httpcomponents#httpclient;4.5.13] in [default]
io.netty#netty-handler;4.1.45.Final by [io.netty#netty-handler;4.1.77.Final] in [default]
io.netty#netty-common;4.1.45.Final by [io.netty#netty-common;4.1.77.Final] in [default]
io.netty#netty-buffer;4.1.45.Final by [io.netty#netty-buffer;4.1.77.Final] in [default]
io.netty#netty-transport;4.1.45.Final by [io.netty#netty-transport;4.1.77.Final] in [default]
io.netty#netty-resolver;4.1.45.Final by [io.netty#netty-resolver;4.1.77.Final] in [default]
io.netty#netty-codec;4.1.45.Final by [io.netty#netty-codec;4.1.77.Final] in [default]
io.netty#netty-transport-native-unix-common;4.1.45.Final by [io.netty#netty-transport-native-unix-common;4.1.77.Final] in [default]
software.amazon.awssdk#dynamodb;2.20.8 by [software.amazon.awssdk#dynamodb;2.18.40] in [default]
software.amazon.awssdk#sdk-core;2.20.8 by [software.amazon.awssdk#sdk-core;2.18.40] in [default]
software.amazon.awssdk#annotations;2.20.8 by [software.amazon.awssdk#annotations;2.18.40] in [default]
software.amazon.awssdk#auth;2.20.8 by [software.amazon.awssdk#auth;2.18.40] in [default]
software.amazon.awssdk#regions;2.20.8 by [software.amazon.awssdk#regions;2.18.40] in [default]
software.amazon.awssdk#http-client-spi;2.20.8 by [software.amazon.awssdk#http-client-spi;2.18.40] in [default]
software.amazon.awssdk#aws-core;2.20.8 by [software.amazon.awssdk#aws-core;2.18.40] in [default]
org.apache.httpcomponents#httpcore;4.4.1 by [org.apache.httpcomponents#httpcore;4.4.13] in [default]
commons-codec#commons-codec;1.11 by [commons-codec#commons-codec;1.13] in [default]
commons-codec#commons-codec;1.10 by [commons-codec#commons-codec;1.13] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 162 | 0 | 0 | 19 || 142 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0
==== local-m2-cache: tried
file:/Users/soumilshah/.m2/repository/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.pom
-- artifact org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0!hudi-spark3.4.1-bundle_2.12.jar:
file:/Users/soumilshah/.m2/repository/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.jar
==== local-ivy-cache: tried
/Users/soumilshah/.ivy2/local/org.apache.hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/ivys/ivy.xml
-- artifact org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0!hudi-spark3.4.1-bundle_2.12.jar:
/Users/soumilshah/.ivy2/local/org.apache.hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/jars/hudi-spark3.4.1-bundle_2.12.jar
==== central: tried
https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.pom
-- artifact org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0!hudi-spark3.4.1-bundle_2.12.jar:
https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.jar
==== spark-packages: tried
https://repos.spark-packages.org/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.pom
-- artifact org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0!hudi-spark3.4.1-bundle_2.12.jar:
https://repos.spark-packages.org/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.jar
[NOT FOUND ] commons-codec#commons-codec;1.13!commons-codec.jar (0ms)
==== local-m2-cache: tried
file:/Users/soumilshah/.m2/repository/commons-codec/commons-codec/1.13/commons-codec-1.13.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::
:: FAILED DOWNLOADS ::
:: ^ see resolution messages for details ^ ::
::::::::::::::::::::::::::::::::::::::::::::::
:: commons-codec#commons-codec;1.13!commons-codec.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0: not found, download failed: commons-codec#commons-codec;1.13!commons-codec.jar]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1528)
at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:332)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
File "/Users/soumilshah/IdeaProjects/SparkProject/deltastreamerBroadcastJoins/conflictdetection/w1.py", line 36, in <module>
.getOrCreate()
^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/sql/session.py", line 477, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 512, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 198, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 432, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/java_gateway.py", line 106, in launch_gateway
raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number
(base) soumilshah@Soumils-MBP conflictdetection %
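The log above reports two separate problems: the `hudi-spark3.4.1-bundle_2.12` coordinate was not found in any repository, and `commons-codec` 1.13 was looked up only in the local Maven cache, where the jar file was missing. A minimal sketch for checking the second one, assuming the standard `~/.m2/repository` layout shown in the log; `local_m2_jar` is a hypothetical helper, not part of any library:

```python
from pathlib import Path


def local_m2_jar(group, artifact, version, m2_root=Path.home() / ".m2" / "repository"):
    """Return the path where the local Maven cache would store this artifact's jar."""
    # Group IDs map to nested directories: org.apache.hudi -> org/apache/hudi
    return m2_root.joinpath(*group.split("."), artifact, version, f"{artifact}-{version}.jar")


jar = local_m2_jar("commons-codec", "commons-codec", "1.13")
print(jar, "exists" if jar.is_file() else "is missing")
```

If the jar is missing while other files for that version are present, removing that version directory so it gets downloaded again is a common workaround. That is an assumption about the cause here, not something the log confirms.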
Am I missing any other packages?
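Judging by the log, the unresolved coordinate is `hudi-spark3.4.1-bundle_2.12`, while the next attempt in this thread resolves `hudi-spark3.4-bundle_2.12`. That suggests the artifact name carries only the major.minor Spark version. A minimal sketch under that assumption (`hudi_spark_bundle` is a hypothetical helper):

```python
def hudi_spark_bundle(spark_version, hudi_version, scala_version="2.12"):
    """Build the Hudi Spark bundle coordinate, keeping only major.minor of the Spark version."""
    major_minor = ".".join(spark_version.split(".")[:2])
    return f"org.apache.hudi:hudi-spark{major_minor}-bundle_{scala_version}:{hudi_version}"


print(hudi_spark_bundle("3.4.1", "0.14.0"))
```

Passing either `"3.4.1"` or `"3.4"` yields the same coordinate, so the full installed Spark version can be used as input without producing the name that failed to resolve.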
I added the following packages:
HUDI_VERSION = '0.14.0'
SPARK_VERSION = '3.4'
os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
SUBMIT_ARGS = f"--packages org.apache.hudi:hudi-spark{SPARK_VERSION}-bundle_2.12:{HUDI_VERSION},com.amazonaws:dynamodb-lock-client:1.2.0,com.amazonaws:aws-java-sdk-dynamodb:1.12.735,com.amazonaws:aws-java-sdk-core:1.12.735,org.apache.hudi:hudi-aws-bundle:{HUDI_VERSION},org.apache.hudi:hudi-aws:{HUDI_VERSION} pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
os.environ['PYSPARK_PYTHON'] = sys.executable
spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .config('className', 'org.apache.hudi') \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .getOrCreate()
org.apache.hudi#hudi-aws-bundle added as a dependency
org.apache.hudi#hudi-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-aa8d9c29-7056-4201-b20a-c5f73fac7ea9;1.0
confs: [default]
found org.apache.hudi#hudi-spark3.4-bundle_2.12;0.14.0 in spark-list
found com.amazonaws#dynamodb-lock-client;1.2.0 in central
found software.amazon.awssdk#dynamodb;2.20.8 in central
found software.amazon.awssdk#aws-json-protocol;2.20.8 in central
found software.amazon.awssdk#aws-core;2.20.8 in central
found software.amazon.awssdk#annotations;2.20.8 in central
found software.amazon.awssdk#regions;2.20.8 in central
found software.amazon.awssdk#utils;2.20.8 in central
found org.reactivestreams#reactive-streams;1.0.2 in central
found org.slf4j#slf4j-api;1.7.30 in local-m2-cache
found software.amazon.awssdk#sdk-core;2.20.8 in central
found software.amazon.awssdk#http-client-spi;2.20.8 in central
found software.amazon.awssdk#metrics-spi;2.20.8 in central
found software.amazon.awssdk#endpoints-spi;2.20.8 in central
found software.amazon.awssdk#profiles;2.20.8 in central
found software.amazon.awssdk#json-utils;2.20.8 in central
found software.amazon.awssdk#third-party-jackson-core;2.20.8 in central
found software.amazon.awssdk#auth;2.20.8 in central
found software.amazon.eventstream#eventstream;1.0.1 in central
found software.amazon.awssdk#protocol-core;2.20.8 in central
found software.amazon.awssdk#apache-client;2.20.8 in central
found org.apache.httpcomponents#httpclient;4.5.13 in local-m2-cache
found org.apache.httpcomponents#httpcore;4.4.13 in local-m2-cache
found commons-logging#commons-logging;1.2 in local-m2-cache
found software.amazon.awssdk#netty-nio-client;2.20.8 in central
found io.netty#netty-codec-http2;4.1.86.Final in central
found io.netty#netty-common;4.1.86.Final in central
found io.netty#netty-buffer;4.1.86.Final in central
found io.netty#netty-transport;4.1.86.Final in central
found io.netty#netty-resolver;4.1.86.Final in central
found io.netty#netty-codec;4.1.86.Final in central
found io.netty#netty-transport-classes-epoll;4.1.86.Final in central
found io.netty#netty-transport-native-unix-common;4.1.86.Final in central
found com.amazonaws#aws-java-sdk-dynamodb;1.12.735 in central
found com.amazonaws#aws-java-sdk-s3;1.12.735 in central
found com.amazonaws#aws-java-sdk-kms;1.12.735 in central
found com.amazonaws#aws-java-sdk-core;1.12.735 in central
found commons-codec#commons-codec;1.15 in local-m2-cache
found com.fasterxml.jackson.core#jackson-databind;2.12.7.2 in central
found com.fasterxml.jackson.core#jackson-annotations;2.12.7 in local-m2-cache
found com.fasterxml.jackson.core#jackson-core;2.12.7 in local-m2-cache
found com.fasterxml.jackson.dataformat#jackson-dataformat-cbor;2.12.6 in central
found joda-time#joda-time;2.12.7 in central
found com.amazonaws#jmespath-java;1.12.735 in central
found org.apache.hudi#hudi-aws-bundle;0.14.0 in central
found org.apache.hudi#hudi-common;0.14.0 in central
found org.openjdk.jol#jol-core;0.16 in local-m2-cache
found com.fasterxml.jackson.datatype#jackson-datatype-jsr310;2.10.0 in local-m2-cache
found com.github.ben-manes.caffeine#caffeine;2.9.1 in local-m2-cache
found org.checkerframework#checker-qual;3.10.0 in local-m2-cache
found com.google.errorprone#error_prone_annotations;2.5.1 in local-m2-cache
found org.apache.orc#orc-core;1.6.0 in local-m2-cache
found org.apache.orc#orc-shims;1.6.0 in local-m2-cache
found org.slf4j#slf4j-api;1.7.36 in local-m2-cache
found com.google.protobuf#protobuf-java;3.21.7 in local-m2-cache
found commons-lang#commons-lang;2.6 in local-m2-cache
found io.airlift#aircompressor;0.15 in local-m2-cache
found javax.xml.bind#jaxb-api;2.2.11 in local-m2-cache
found org.apache.hive#hive-storage-api;2.6.0 in local-m2-cache
found org.jetbrains#annotations;17.0.0 in local-m2-cache
found org.roaringbitmap#RoaringBitmap;0.9.47 in local-m2-cache
found org.apache.httpcomponents#fluent-hc;4.4.1 in local-m2-cache
found org.rocksdb#rocksdbjni;7.5.3 in local-m2-cache
found org.apache.hbase#hbase-client;2.4.9 in local-m2-cache
found org.apache.hbase.thirdparty#hbase-shaded-protobuf;3.5.1 in local-m2-cache
found org.apache.hbase#hbase-protocol-shaded;2.4.9 in local-m2-cache
found org.apache.yetus#audience-annotations;0.5.0 in local-m2-cache
found org.apache.hbase#hbase-protocol;2.4.9 in local-m2-cache
found javax.annotation#javax.annotation-api;1.2 in local-m2-cache
found commons-io#commons-io;2.11.0 in local-m2-cache
found org.apache.commons#commons-lang3;3.9 in local-m2-cache
found org.apache.hbase.thirdparty#hbase-shaded-miscellaneous;3.5.1 in local-m2-cache
found com.google.errorprone#error_prone_annotations;2.7.1 in local-m2-cache
found org.apache.hbase.thirdparty#hbase-shaded-netty;3.5.1 in local-m2-cache
found org.apache.zookeeper#zookeeper;3.5.7 in local-m2-cache
found org.apache.zookeeper#zookeeper-jute;3.5.7 in local-m2-cache
found io.netty#netty-handler;4.1.45.Final in local-m2-cache
found io.netty#netty-transport-native-epoll;4.1.45.Final in local-m2-cache
found org.apache.htrace#htrace-core4;4.2.0-incubating in local-m2-cache
found org.jruby.jcodings#jcodings;1.0.55 in local-m2-cache
found org.jruby.joni#joni;2.1.31 in local-m2-cache
found io.dropwizard.metrics#metrics-core;4.1.1 in local-m2-cache
found org.apache.commons#commons-crypto;1.0.0 in local-m2-cache
found org.apache.hbase#hbase-server;2.4.9 in local-m2-cache
found org.apache.hbase#hbase-procedure;2.4.9 in local-m2-cache
found org.apache.hbase#hbase-replication;2.4.9 in local-m2-cache
found org.glassfish.web#javax.servlet.jsp;2.3.2 in local-m2-cache
found org.glassfish#javax.el;3.0.1-b12 in local-m2-cache
found javax.servlet.jsp#javax.servlet.jsp-api;2.3.1 in local-m2-cache
found org.apache.commons#commons-math3;3.6.1 in local-m2-cache
found org.jamon#jamon-runtime;2.4.1 in local-m2-cache
found com.lmax#disruptor;3.4.2 in local-m2-cache
found org.lz4#lz4-java;1.8.0 in local-m2-cache
found org.roaringbitmap#shims;0.9.47 in local-m2-cache
found org.apache.hudi#hudi-hive-sync;0.14.0 in central
found org.apache.hudi#hudi-hadoop-mr;0.14.0 in central
found org.apache.hudi#hudi-sync-common;0.14.0 in central
found com.beust#jcommander;1.78 in central
found org.apache.hudi#hudi-aws;0.14.0 in central
found software.amazon.awssdk#cloudwatch;2.18.40 in central
found software.amazon.awssdk#aws-query-protocol;2.18.40 in central
found software.amazon.awssdk#glue;2.18.40 in central
found software.amazon.awssdk#sqs;2.18.40 in central
found org.apache.parquet#parquet-avro;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-column;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-common;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-format;2.4.0 in local-m2-cache
found org.apache.parquet#parquet-encoding;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-hadoop;1.10.1 in local-m2-cache
found org.apache.parquet#parquet-jackson;1.10.1 in local-m2-cache
found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache
found org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache
found org.xerial.snappy#snappy-java;1.1.8.3 in local-m2-cache
found commons-pool#commons-pool;1.6 in local-m2-cache
found org.apache.avro#avro;1.8.2 in local-m2-cache
found com.thoughtworks.paranamer#paranamer;2.7 in local-m2-cache
found org.apache.commons#commons-compress;1.8.1 in local-m2-cache
found org.tukaani#xz;1.5 in local-m2-cache
found it.unimi.dsi#fastutil;7.0.13 in local-m2-cache
:: resolution report :: resolve 734ms :: artifacts dl 22ms
:: modules in use:
com.amazonaws#aws-java-sdk-core;1.12.735 from central in [default]
com.amazonaws#aws-java-sdk-dynamodb;1.12.735 from central in [default]
com.amazonaws#aws-java-sdk-kms;1.12.735 from central in [default]
com.amazonaws#aws-java-sdk-s3;1.12.735 from central in [default]
com.amazonaws#dynamodb-lock-client;1.2.0 from central in [default]
com.amazonaws#jmespath-java;1.12.735 from central in [default]
com.beust#jcommander;1.78 from central in [default]
com.fasterxml.jackson.core#jackson-annotations;2.12.7 from local-m2-cache in [default]
com.fasterxml.jackson.core#jackson-core;2.12.7 from local-m2-cache in [default]
com.fasterxml.jackson.core#jackson-databind;2.12.7.2 from central in [default]
com.fasterxml.jackson.dataformat#jackson-dataformat-cbor;2.12.6 from central in [default]
com.fasterxml.jackson.datatype#jackson-datatype-jsr310;2.10.0 from local-m2-cache in [default]
com.github.ben-manes.caffeine#caffeine;2.9.1 from local-m2-cache in [default]
com.google.errorprone#error_prone_annotations;2.7.1 from local-m2-cache in [default]
com.google.protobuf#protobuf-java;3.21.7 from local-m2-cache in [default]
com.lmax#disruptor;3.4.2 from local-m2-cache in [default]
com.thoughtworks.paranamer#paranamer;2.7 from local-m2-cache in [default]
commons-codec#commons-codec;1.15 from local-m2-cache in [default]
commons-io#commons-io;2.11.0 from local-m2-cache in [default]
commons-lang#commons-lang;2.6 from local-m2-cache in [default]
commons-logging#commons-logging;1.2 from local-m2-cache in [default]
commons-pool#commons-pool;1.6 from local-m2-cache in [default]
io.airlift#aircompressor;0.15 from local-m2-cache in [default]
io.dropwizard.metrics#metrics-core;4.1.1 from local-m2-cache in [default]
io.netty#netty-buffer;4.1.86.Final from central in [default]
io.netty#netty-codec;4.1.86.Final from central in [default]
io.netty#netty-codec-http2;4.1.86.Final from central in [default]
io.netty#netty-common;4.1.86.Final from central in [default]
io.netty#netty-handler;4.1.45.Final from local-m2-cache in [default]
io.netty#netty-resolver;4.1.86.Final from central in [default]
io.netty#netty-transport;4.1.86.Final from central in [default]
io.netty#netty-transport-classes-epoll;4.1.86.Final from central in [default]
io.netty#netty-transport-native-epoll;4.1.45.Final from local-m2-cache in [default]
io.netty#netty-transport-native-unix-common;4.1.86.Final from central in [default]
it.unimi.dsi#fastutil;7.0.13 from local-m2-cache in [default]
javax.annotation#javax.annotation-api;1.2 from local-m2-cache in [default]
javax.servlet.jsp#javax.servlet.jsp-api;2.3.1 from local-m2-cache in [default]
javax.xml.bind#jaxb-api;2.2.11 from local-m2-cache in [default]
joda-time#joda-time;2.12.7 from central in [default]
org.apache.avro#avro;1.8.2 from local-m2-cache in [default]
org.apache.commons#commons-compress;1.8.1 from local-m2-cache in [default]
org.apache.commons#commons-crypto;1.0.0 from local-m2-cache in [default]
org.apache.commons#commons-lang3;3.9 from local-m2-cache in [default]
org.apache.commons#commons-math3;3.6.1 from local-m2-cache in [default]
org.apache.hbase#hbase-client;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-procedure;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-protocol;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-protocol-shaded;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-replication;2.4.9 from local-m2-cache in [default]
org.apache.hbase#hbase-server;2.4.9 from local-m2-cache in [default]
org.apache.hbase.thirdparty#hbase-shaded-miscellaneous;3.5.1 from local-m2-cache in [default]
org.apache.hbase.thirdparty#hbase-shaded-netty;3.5.1 from local-m2-cache in [default]
org.apache.hbase.thirdparty#hbase-shaded-protobuf;3.5.1 from local-m2-cache in [default]
org.apache.hive#hive-storage-api;2.6.0 from local-m2-cache in [default]
org.apache.htrace#htrace-core4;4.2.0-incubating from local-m2-cache in [default]
org.apache.httpcomponents#fluent-hc;4.4.1 from local-m2-cache in [default]
org.apache.httpcomponents#httpclient;4.5.13 from local-m2-cache in [default]
org.apache.httpcomponents#httpcore;4.4.13 from local-m2-cache in [default]
org.apache.hudi#hudi-aws;0.14.0 from central in [default]
org.apache.hudi#hudi-aws-bundle;0.14.0 from central in [default]
org.apache.hudi#hudi-common;0.14.0 from central in [default]
org.apache.hudi#hudi-hadoop-mr;0.14.0 from central in [default]
org.apache.hudi#hudi-hive-sync;0.14.0 from central in [default]
org.apache.hudi#hudi-spark3.4-bundle_2.12;0.14.0 from spark-list in [default]
org.apache.hudi#hudi-sync-common;0.14.0 from central in [default]
org.apache.orc#orc-core;1.6.0 from local-m2-cache in [default]
org.apache.orc#orc-shims;1.6.0 from local-m2-cache in [default]
org.apache.parquet#parquet-avro;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-column;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-common;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-encoding;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-format;2.4.0 from local-m2-cache in [default]
org.apache.parquet#parquet-hadoop;1.10.1 from local-m2-cache in [default]
org.apache.parquet#parquet-jackson;1.10.1 from local-m2-cache in [default]
org.apache.yetus#audience-annotations;0.5.0 from local-m2-cache in [default]
org.apache.zookeeper#zookeeper;3.5.7 from local-m2-cache in [default]
org.apache.zookeeper#zookeeper-jute;3.5.7 from local-m2-cache in [default]
org.checkerframework#checker-qual;3.10.0 from local-m2-cache in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]
org.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]
org.glassfish#javax.el;3.0.1-b12 from local-m2-cache in [default]
org.glassfish.web#javax.servlet.jsp;2.3.2 from local-m2-cache in [default]
org.jamon#jamon-runtime;2.4.1 from local-m2-cache in [default]
org.jetbrains#annotations;17.0.0 from local-m2-cache in [default]
org.jruby.jcodings#jcodings;1.0.55 from local-m2-cache in [default]
org.jruby.joni#joni;2.1.31 from local-m2-cache in [default]
org.lz4#lz4-java;1.8.0 from local-m2-cache in [default]
org.openjdk.jol#jol-core;0.16 from local-m2-cache in [default]
org.reactivestreams#reactive-streams;1.0.2 from central in [default]
org.roaringbitmap#RoaringBitmap;0.9.47 from local-m2-cache in [default]
org.roaringbitmap#shims;0.9.47 from local-m2-cache in [default]
org.rocksdb#rocksdbjni;7.5.3 from local-m2-cache in [default]
org.slf4j#slf4j-api;1.7.36 from local-m2-cache in [default]
org.tukaani#xz;1.5 from local-m2-cache in [default]
org.xerial.snappy#snappy-java;1.1.8.3 from local-m2-cache in [default]
software.amazon.awssdk#annotations;2.20.8 from central in [default]
software.amazon.awssdk#apache-client;2.20.8 from central in [default]
software.amazon.awssdk#auth;2.20.8 from central in [default]
software.amazon.awssdk#aws-core;2.20.8 from central in [default]
software.amazon.awssdk#aws-json-protocol;2.20.8 from central in [default]
software.amazon.awssdk#aws-query-protocol;2.18.40 from central in [default]
software.amazon.awssdk#cloudwatch;2.18.40 from central in [default]
software.amazon.awssdk#dynamodb;2.20.8 from central in [default]
software.amazon.awssdk#endpoints-spi;2.20.8 from central in [default]
software.amazon.awssdk#glue;2.18.40 from central in [default]
software.amazon.awssdk#http-client-spi;2.20.8 from central in [default]
software.amazon.awssdk#json-utils;2.20.8 from central in [default]
software.amazon.awssdk#metrics-spi;2.20.8 from central in [default]
software.amazon.awssdk#netty-nio-client;2.20.8 from central in [default]
software.amazon.awssdk#profiles;2.20.8 from central in [default]
software.amazon.awssdk#protocol-core;2.20.8 from central in [default]
software.amazon.awssdk#regions;2.20.8 from central in [default]
software.amazon.awssdk#sdk-core;2.20.8 from central in [default]
software.amazon.awssdk#sqs;2.18.40 from central in [default]
software.amazon.awssdk#third-party-jackson-core;2.20.8 from central in [default]
software.amazon.awssdk#utils;2.20.8 from central in [default]
software.amazon.eventstream#eventstream;1.0.1 from central in [default]
:: evicted modules:
org.slf4j#slf4j-api;1.7.30 by [org.slf4j#slf4j-api;1.7.36] in [default]
commons-logging#commons-logging;1.1.3 by [commons-logging#commons-logging;1.2] in [default]
com.fasterxml.jackson.core#jackson-databind;2.12.6 by [com.fasterxml.jackson.core#jackson-databind;2.12.7.2] in [default]
com.fasterxml.jackson.core#jackson-core;2.12.6 by [com.fasterxml.jackson.core#jackson-core;2.12.7] in [default]
com.fasterxml.jackson.core#jackson-annotations;2.10.0 by [com.fasterxml.jackson.core#jackson-annotations;2.12.7] in [default]
com.fasterxml.jackson.core#jackson-databind;2.10.0 by [com.fasterxml.jackson.core#jackson-databind;2.12.7.2] in [default]
com.fasterxml.jackson.core#jackson-core;2.10.0 by [com.fasterxml.jackson.core#jackson-core;2.12.7] in [default]
com.google.errorprone#error_prone_annotations;2.5.1 by [com.google.errorprone#error_prone_annotations;2.7.1] in [default]
org.apache.httpcomponents#httpclient;4.4.1 by [org.apache.httpcomponents#httpclient;4.5.13] in [default]
commons-codec#commons-codec;1.13 by [commons-codec#commons-codec;1.10] in [default]
io.netty#netty-common;4.1.45.Final by [io.netty#netty-common;4.1.86.Final] in [default]
io.netty#netty-buffer;4.1.45.Final by [io.netty#netty-buffer;4.1.86.Final] in [default]
io.netty#netty-transport;4.1.45.Final by [io.netty#netty-transport;4.1.86.Final] in [default]
io.netty#netty-codec;4.1.45.Final by [io.netty#netty-codec;4.1.86.Final] in [default]
io.netty#netty-transport-native-unix-common;4.1.45.Final by [io.netty#netty-transport-native-unix-common;4.1.86.Final] in [default]
software.amazon.awssdk#protocol-core;2.18.40 by [software.amazon.awssdk#protocol-core;2.20.8] in [default]
software.amazon.awssdk#aws-core;2.18.40 by [software.amazon.awssdk#aws-core;2.20.8] in [default]
software.amazon.awssdk#sdk-core;2.18.40 by [software.amazon.awssdk#sdk-core;2.20.8] in [default]
software.amazon.awssdk#annotations;2.18.40 by [software.amazon.awssdk#annotations;2.20.8] in [default]
software.amazon.awssdk#http-client-spi;2.18.40 by [software.amazon.awssdk#http-client-spi;2.20.8] in [default]
software.amazon.awssdk#utils;2.18.40 by [software.amazon.awssdk#utils;2.20.8] in [default]
software.amazon.awssdk#auth;2.18.40 by [software.amazon.awssdk#auth;2.20.8] in [default]
software.amazon.awssdk#regions;2.18.40 by [software.amazon.awssdk#regions;2.20.8] in [default]
software.amazon.awssdk#metrics-spi;2.18.40 by [software.amazon.awssdk#metrics-spi;2.20.8] in [default]
software.amazon.awssdk#json-utils;2.18.40 by [software.amazon.awssdk#json-utils;2.20.8] in [default]
software.amazon.awssdk#endpoints-spi;2.18.40 by [software.amazon.awssdk#endpoints-spi;2.20.8] in [default]
software.amazon.awssdk#dynamodb;2.18.40 by [software.amazon.awssdk#dynamodb;2.20.8] in [default]
software.amazon.awssdk#aws-json-protocol;2.18.40 by [software.amazon.awssdk#aws-json-protocol;2.20.8] in [default]
org.apache.httpcomponents#httpcore;4.4.1 by [org.apache.httpcomponents#httpcore;4.4.13] in [default]
software.amazon.awssdk#apache-client;2.18.40 by [software.amazon.awssdk#apache-client;2.20.8] in [default]
software.amazon.awssdk#netty-nio-client;2.18.40 by [software.amazon.awssdk#netty-nio-client;2.20.8] in [default]
commons-codec#commons-codec;1.10 by [commons-codec#commons-codec;1.15] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 149 | 0 | 0 | 32 || 117 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-aa8d9c29-7056-4201-b20a-c5f73fac7ea9
confs: [default]
0 artifacts copied, 117 already retrieved (0kB/10ms)
24/06/05 09:14:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
+--------+----------+----------+----------+-----------+-----+
| orderID|productSKU|customerID| orderDate|orderAmount|state|
+--------+----------+----------+----------+-----------+-----+
| order1| prod001| cust001|2024-01-15| 150.0| CA|
|order002| prod002| cust002|2024-01-16| 200.0| NY|
|order003| prod003| cust003|2024-01-17| 300.0| TX|
|order004| prod004| cust004|2024-01-18| 250.0| FL|
|order005| prod005| cust005|2024-01-19| 100.0| WA|
|order006| prod006| cust006|2024-01-20| 350.0| CA|
|order007| prod007| cust007|2024-01-21| 400.0| NY|
+--------+----------+----------+----------+-----------+-----+
{'hoodie.table.name': 'orders', 'hoodie.datasource.write.table.type': 'COPY_ON_WRITE', 'hoodie.datasource.write.table.name': 'orders', 'hoodie.datasource.write.operation': 'upsert', 'hoodie.datasource.write.recordkey.field': 'orderID', 'hoodie.datasource.write.precombine.field': 'orderDate', 'hoodie.datasource.write.partitionpath.field': 'state', 'hoodie.write.concurrency.mode': 'optimistic_concurrency_control', 'hoodie.cleaner.policy.failed.writes': 'LAZY', 'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider', 'hoodie.write.lock.dynamodb.table': 'hudi-lock-table', 'hoodie.write.lock.dynamodb.region': 'us-east-1', 'hoodie.write.lock.dynamodb.endpoint_url': 'dynamodb.us-east-1.amazonaws.com', 'hoodie.write.lock.dynamodb.billing_mode': 'PAY_PER_REQUEST'}
file:///Users/soumilshah/IdeaProjects/SparkProject/tem/database=default/table_name=orders
24/06/05 09:14:12 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
24/06/05 09:14:12 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
Traceback (most recent call last):
File "/Users/soumilshah/IdeaProjects/SparkProject/deltastreamerBroadcastJoins/conflictdetection/w1.py", line 90, in <module>
write_to_hudi(
File "/Users/soumilshah/IdeaProjects/SparkProject/deltastreamerBroadcastJoins/conflictdetection/w1.py", line 88, in write_to_hudi
.save(path)
^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/sql/readwriter.py", line 1398, in save
self._jwrite.save(path)
File "/opt/anaconda3/lib/python3.11/site-packages/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o65.save.
: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:81)
at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:129)
at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:118)
at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:71)
at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:58)
at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1253)
at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1296)
at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:139)
at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:224)
at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:431)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:79)
... 52 more
Caused by: java.lang.NoSuchMethodError: 'com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClientOptions$AmazonDynamoDBLockClientOptionsBuilder com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClientOptions.builder(org.apache.hudi.software.amazon.awssdk.services.dynamodb.DynamoDbClient, java.lang.String)'
at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:90)
at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:75)
... 57 more
(base) soumilshah@Soumils-MBP conflictdetection %
@soumilshah1995 It looks like the AWS SDK jar versions conflict with hudi-aws-bundle.
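One way to see where versions collide is to pull the ":: evicted modules:" entries out of the Ivy resolution output pasted above. In this log the AWS SDK v2 modules at 2.18.40 are evicted in favour of 2.20.8, and the v1 SDK (com.amazonaws 1.12.735) is resolved alongside them. The snippet below is only a minimal sketch for reading such a log; the function name `evicted_modules` and the `sample` string are made up for illustration, and it does not by itself say which jar causes the `NoSuchMethodError`.

```python
import re

# Matches Ivy "evicted modules" lines such as:
#   org.slf4j#slf4j-api;1.7.30 by [org.slf4j#slf4j-api;1.7.36] in [default]
EVICTED = re.compile(
    r"^\s*(?P<module>[^;\s]+);(?P<old>\S+) by \[(?P=module);(?P<new>[^\]]+)\] in \["
)


def evicted_modules(log_text):
    """Return (module, evicted_version, winning_version) tuples found in Ivy output."""
    found = []
    for line in log_text.splitlines():
        match = EVICTED.match(line)
        if match:
            found.append((match.group("module"), match.group("old"), match.group("new")))
    return found


# Hypothetical sample: two lines copied from the resolution report in this thread.
sample = """\
software.amazon.awssdk#dynamodb;2.18.40 by [software.amazon.awssdk#dynamodb;2.20.8] in [default]
org.apache.hudi#hudi-aws;0.14.0 from central in [default]
"""

for module, old, new in evicted_modules(sample):
    print(f"{module}: {old} evicted by {new}")
```

Pasting the full spark-submit output into `evicted_modules` gives a quick list of every module that was requested at more than one version.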
Yes, I know it's a jar issue as usual. Let me try an older version to see if that works; I'll just have to brute-force it by trying different versions, lol.
I guess I can close this. It's most likely a version issue. I haven't had a chance to brute-force it yet, but I know the problem is in the jars, so I'll close this.
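Before brute-forcing versions, one combination that may be worth trying is to drop the separately listed `dynamodb-lock-client` and `aws-java-sdk-*` packages and pass only the Spark bundle plus `hudi-aws-bundle`. The stack trace shows the lock provider calling into relocated classes (`org.apache.hudi.software.amazon.awssdk...`), which suggests the bundle carries its own shaded SDK. This is an untested assumption, not a confirmed fix, and `build_submit_args` is a hypothetical helper written just for this sketch.

```python
import os


def build_submit_args(hudi_version, spark_version, scala_version="2.12"):
    """Build PYSPARK_SUBMIT_ARGS with only the two Hudi bundles.

    Assumption (not verified in this thread): hudi-aws-bundle already ships
    what DynamoDBBasedLockProvider needs, so no extra AWS SDK or lock-client
    packages are listed here.
    """
    packages = [
        f"org.apache.hudi:hudi-spark{spark_version}-bundle_{scala_version}:{hudi_version}",
        f"org.apache.hudi:hudi-aws-bundle:{hudi_version}",
    ]
    return f"--packages {','.join(packages)} pyspark-shell"


# Versions taken from the thread: Hudi 0.14.0 on Spark 3.4.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args("0.14.0", "3.4")
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

This has to run before the `SparkSession` is created, since PySpark reads `PYSPARK_SUBMIT_ARGS` when it launches the JVM.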
Description: I'm encountering an issue while attempting to use DynamoDB Based Lock with an Apache Hudi PySpark job running locally. The goal is to have the job access DynamoDB in a specified region for locking purposes.
Configuration Used:
Error Logs
Code