apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Unable to Use DynamoDB Based Lock with Hudi PySpark Job Locally #11391

Closed: soumilshah1995 closed this issue 3 weeks ago

soumilshah1995 commented 1 month ago

Description: I'm running into an issue when trying to use the DynamoDB-based lock provider with an Apache Hudi PySpark job running locally. The goal is for the job to use a DynamoDB table in a specified region for locking.

Configuration Used:

'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
'hoodie.cleaner.policy.failed.writes': 'LAZY',
'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
'hoodie.write.lock.dynamodb.table': 'hudi-lock-table',  # DynamoDB table name for locking
'hoodie.write.lock.dynamodb.region': curr_region,  # DynamoDB region
'hoodie.write.lock.dynamodb.endpoint_url': f'dynamodb.{curr_region}.amazonaws.com',
'hoodie.write.lock.dynamodb.billing_mode': 'PAY_PER_REQUEST',
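To keep these lock settings in one place (they are repeated inside `write_to_hudi` below), they could be produced by a small helper. This is only a sketch: `dynamodb_lock_options` and its default table name are hypothetical, while the `hoodie.*` keys and values are exactly the ones listed above. Note that these settings only name the provider class; the class itself still has to be on the Spark classpath.

```python
def dynamodb_lock_options(region: str, table: str = "hudi-lock-table") -> dict:
    """Return the OCC + DynamoDB lock settings used in this issue.

    Hypothetical helper: the keys mirror the configuration listed above.
    """
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider":
            "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
        "hoodie.write.lock.dynamodb.table": table,
        "hoodie.write.lock.dynamodb.region": region,
        "hoodie.write.lock.dynamodb.endpoint_url": f"dynamodb.{region}.amazonaws.com",
        "hoodie.write.lock.dynamodb.billing_mode": "PAY_PER_REQUEST",
    }


opts = dynamodb_lock_options("us-east-1")
print(opts["hoodie.write.lock.dynamodb.endpoint_url"])  # dynamodb.us-east-1.amazonaws.com
```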

Error Logs

org.apache.hudi.exception.HoodieException: Unable to load class
    at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:58)
    ...
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
    ...

Code

try:
    import os
    import sys

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, FloatType

    print("Imports loaded successfully.")

except ImportError as e:
    # Fail loudly: continuing without these modules only causes NameErrors later.
    print("error", e)
    raise

HUDI_VERSION = '0.14.0'
SPARK_VERSION = '3.4'

os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
SUBMIT_ARGS = f"--packages org.apache.hudi:hudi-spark{SPARK_VERSION}-bundle_2.12:{HUDI_VERSION} pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
os.environ['PYSPARK_PYTHON'] = sys.executable

spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .config('className', 'org.apache.hudi') \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .getOrCreate()

schema = StructType([
    StructField("orderID", StringType(), True),
    StructField("productSKU", StringType(), True),
    StructField("customerID", StringType(), True),
    StructField("orderDate", StringType(), True),
    StructField("orderAmount", FloatType(), True),
    StructField("state", StringType(), True)
])

# Create the data array with the additional state value
data = [
    ("order1", "prod001", "cust001", "2024-01-15", 150.00, "CA"),
    ("order002", "prod002", "cust002", "2024-01-16", 200.00, "NY"),
    ("order003", "prod003", "cust003", "2024-01-17", 300.00, "TX"),
    ("order004", "prod004", "cust004", "2024-01-18", 250.00, "FL"),
    ("order005", "prod005", "cust005", "2024-01-19", 100.00, "WA"),
    ("order006", "prod006", "cust006", "2024-01-20", 350.00, "CA"),
    ("order007", "prod007", "cust007", "2024-01-21", 400.00, "NY")
]

# Create the DataFrame
df = spark.createDataFrame(data, schema)

# Show the DataFrame with the new "state" column
df.show()

def write_to_hudi(spark_df,
                  table_name,
                  db_name,
                  method='upsert',
                  table_type='COPY_ON_WRITE',
                  recordkey='',
                  precombine='',
                  partition_fields='',
                  index_type='BLOOM',  # NOTE: not added to hudi_options below, so it currently has no effect
                  curr_region='us-east-1'
                  ):
    path = f"file:///Users/soumilshah/IdeaProjects/SparkProject/tem/database={db_name}/table_name={table_name}"

    hudi_options = {
        'hoodie.table.name': table_name,
        'hoodie.datasource.write.table.type': table_type,
        'hoodie.datasource.write.table.name': table_name,
        'hoodie.datasource.write.operation': method,
        'hoodie.datasource.write.recordkey.field': recordkey,
        'hoodie.datasource.write.precombine.field': precombine,
        "hoodie.datasource.write.partitionpath.field": partition_fields,

        'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
        'hoodie.cleaner.policy.failed.writes': 'LAZY',
        'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
        'hoodie.write.lock.dynamodb.table': 'hudi-lock-table',  # DynamoDB table name for locking
        'hoodie.write.lock.dynamodb.region': curr_region,  # DynamoDB region
        'hoodie.write.lock.dynamodb.endpoint_url': f'dynamodb.{curr_region}.amazonaws.com',
        'hoodie.write.lock.dynamodb.billing_mode': 'PAY_PER_REQUEST',

    }
    print(hudi_options)

    print("\n")
    print(path)
    print("\n")

    spark_df.write.format("hudi"). \
        options(**hudi_options). \
        mode("append"). \
        save(path)

write_to_hudi(
    spark_df=df,
    db_name="default",
    table_name="orders",
    recordkey="orderID",
    precombine="orderDate",
    partition_fields="state",
    index_type="RECORD_INDEX"
)
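In `write_to_hudi` above, the `index_type` argument is never placed into `hudi_options`, so passing `RECORD_INDEX` changes nothing. A sketch of assembling the options so every argument is forwarded: `build_hudi_options` is a hypothetical name, and mapping `index_type` onto `hoodie.index.type` is an assumption to check against the Hudi 0.14 configuration reference (the record-level index may also require its metadata-table setting to be enabled).

```python
def build_hudi_options(table_name, method="upsert", table_type="COPY_ON_WRITE",
                       recordkey="", precombine="", partition_fields="",
                       index_type="BLOOM", lock_options=None):
    """Assemble Hudi write options, forwarding every argument (sketch)."""
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": table_type,
        "hoodie.datasource.write.table.name": table_name,
        "hoodie.datasource.write.operation": method,
        "hoodie.datasource.write.recordkey.field": recordkey,
        "hoodie.datasource.write.precombine.field": precombine,
        "hoodie.datasource.write.partitionpath.field": partition_fields,
        # Assumption: index_type maps onto Hudi's hoodie.index.type setting.
        "hoodie.index.type": index_type,
    }
    # Lock settings (e.g. the DynamoDB ones) are merged last so they can be swapped out.
    options.update(lock_options or {})
    return options


opts = build_hudi_options("orders", recordkey="orderID", precombine="orderDate",
                          partition_fields="state", index_type="RECORD_INDEX")
```

The returned dict can then be passed to `spark_df.write.format("hudi").options(**opts)` exactly as in the original function.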
soumilshah1995 commented 1 month ago

Tried the following:

os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"
AWS_SDK_DEPENDENCIES = "com.amazonaws:dynamodb-lock-client:1.2.0,com.amazonaws:aws-java-sdk-dynamodb:1.12.735,com.amazonaws:aws-java-sdk-core:1.12.735"

SUBMIT_ARGS = f"--packages org.apache.hudi:hudi-spark{SPARK_VERSION}-bundle_2.12:{HUDI_VERSION},{AWS_SDK_DEPENDENCIES} pyspark-shell"

os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
os.environ['PYSPARK_PYTHON'] = sys.executable

spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .config('className', 'org.apache.hudi') \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .getOrCreate()

It looks like a class is still missing, but I'm not sure which one.

py4j.protocol.Py4JJavaError: An error occurred while calling o65.save.
: org.apache.hudi.exception.HoodieException: Unable to load class
    at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:58)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:79)
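A `ClassNotFoundException` means none of the jars on the classpath contain the class. Since jars are zip archives, one way to see which downloaded artifact (if any) provides `DynamoDBBasedLockProvider` is to look for its `.class` entry in each jar. A standard-library sketch; `jars_containing` is a hypothetical helper, and `~/.ivy2/jars` is the directory where Ivy reports storing `--packages` jars later in this thread.

```python
import zipfile
from pathlib import Path

# Fully qualified name taken from the ClassNotFoundException above.
LOCK_CLASS = "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider"


def jars_containing(class_name: str, jar_dir: str) -> list:
    """Return the names of .jar files in jar_dir that contain class_name."""
    root = Path(jar_dir)
    if not root.is_dir():
        return []
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(root.glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip truncated or corrupt downloads instead of aborting the scan.
            continue
    return hits


if __name__ == "__main__":
    print(jars_containing(LOCK_CLASS, str(Path.home() / ".ivy2" / "jars")))
```

An empty list suggests that none of the requested packages ship the lock provider.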
ad1happy2go commented 1 month ago

@soumilshah1995 you also need to add the hudi-aws bundle to the dependencies.
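Following that suggestion, a minimal sketch of the submit arguments with `hudi-aws-bundle` added next to the Spark bundle. It assumes Hudi 0.14.0 on Spark 3.4 with Scala 2.12, as in the original code. The AWS SDK v1 jars tried earlier appear unnecessary, since the resolution log later in this thread shows Hudi 0.14.0 pulling AWS SDK v2 (`software.amazon.awssdk`) artifacts.

```python
import os
import sys

HUDI_VERSION = "0.14.0"
SPARK_VERSION = "3.4"  # major.minor only, matching the hudi-spark bundle naming

PACKAGES = ",".join([
    f"org.apache.hudi:hudi-spark{SPARK_VERSION}-bundle_2.12:{HUDI_VERSION}",
    # Assumed (per the maintainer's reply) to provide DynamoDBBasedLockProvider.
    f"org.apache.hudi:hudi-aws-bundle:{HUDI_VERSION}",
])

# PYSPARK_SUBMIT_ARGS is only read when the first SparkSession starts the JVM.
os.environ["PYSPARK_SUBMIT_ARGS"] = f"--packages {PACKAGES} pyspark-shell"
os.environ["PYSPARK_PYTHON"] = sys.executable
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```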

soumilshah1995 commented 1 month ago

Oh, let me try this and update the thread shortly.

soumilshah1995 commented 1 month ago

Code


HUDI_VERSION = '0.14.0'
SPARK_VERSION = '3.4'

os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"

AWS_JAR_FILES = f"org.apache.hudi:hudi-aws:{HUDI_VERSION},org.apache.hudi:hudi-aws-bundle:{HUDI_VERSION}"
SUBMIT_ARGS = f"--packages org.apache.hudi:hudi-spark3.4.1-bundle_2.12:{HUDI_VERSION},{AWS_JAR_FILES} pyspark-shell"

os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
os.environ['PYSPARK_PYTHON'] = sys.executable

spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .config('className', 'org.apache.hudi') \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .getOrCreate()
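The `--packages` coordinate above requests `hudi-spark3.4.1-bundle_2.12`, which the resolution log below reports as not found on Maven Central or any other resolver. In the Hudi 0.14.x naming scheme the Spark bundle's artifact id carries only the Spark major.minor version (`hudi-spark3.4-bundle_2.12`), as in the first snippet of this issue. A sketch that derives the coordinate from a full Spark version string; `hudi_spark_bundle` is a hypothetical helper and the naming rule is an assumption limited to these Hudi releases.

```python
def hudi_spark_bundle(spark_version: str, hudi_version: str, scala: str = "2.12") -> str:
    """Build the hudi-spark bundle coordinate from a Spark version string.

    Assumes the Hudi 0.14.x naming scheme, where the artifact id carries only
    the Spark major.minor version (hudi-spark3.4-bundle, not hudi-spark3.4.1-bundle).
    """
    major_minor = ".".join(spark_version.split(".")[:2])
    return f"org.apache.hudi:hudi-spark{major_minor}-bundle_{scala}:{hudi_version}"


print(hudi_spark_bundle("3.4.1", "0.14.0"))
# org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0
```

For example, `hudi_spark_bundle(pyspark.__version__, HUDI_VERSION)` would keep the bundle in step with the installed PySpark.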

Error


 python3 w1.py
Imports loaded successfully.
Warning: Ignoring non-Spark config property: className
:: loading settings :: url = jar:file:/opt/anaconda3/lib/python3.11/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/soumilshah/.ivy2/cache
The jars for the packages stored in: /Users/soumilshah/.ivy2/jars
org.apache.hudi#hudi-spark3.4.1-bundle_2.12 added as a dependency
org.apache.hudi#hudi-aws added as a dependency
org.apache.hudi#hudi-aws-bundle added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-9c6c8274-f28f-4a73-b9e9-c27219acefce;1.0
    confs: [default]
    found org.apache.hudi#hudi-aws;0.14.0 in central
    found org.apache.hudi#hudi-common;0.14.0 in central
    found org.openjdk.jol#jol-core;0.16 in local-m2-cache
    found com.fasterxml.jackson.core#jackson-annotations;2.10.0 in local-m2-cache
    found com.fasterxml.jackson.core#jackson-databind;2.10.0 in local-m2-cache
    found com.fasterxml.jackson.core#jackson-core;2.10.0 in local-m2-cache
    found com.fasterxml.jackson.datatype#jackson-datatype-jsr310;2.10.0 in local-m2-cache
    found com.github.ben-manes.caffeine#caffeine;2.9.1 in local-m2-cache
    found org.checkerframework#checker-qual;3.10.0 in local-m2-cache
    found com.google.errorprone#error_prone_annotations;2.5.1 in local-m2-cache
    found org.apache.orc#orc-core;1.6.0 in local-m2-cache
    found org.apache.orc#orc-shims;1.6.0 in local-m2-cache
    found org.slf4j#slf4j-api;1.7.36 in local-m2-cache
    found com.google.protobuf#protobuf-java;3.21.7 in local-m2-cache
    found commons-lang#commons-lang;2.6 in local-m2-cache
    found io.airlift#aircompressor;0.15 in local-m2-cache
    found javax.xml.bind#jaxb-api;2.2.11 in local-m2-cache
    found org.apache.hive#hive-storage-api;2.6.0 in local-m2-cache
    found org.jetbrains#annotations;17.0.0 in local-m2-cache
    found org.roaringbitmap#RoaringBitmap;0.9.47 in local-m2-cache
    found org.apache.httpcomponents#fluent-hc;4.4.1 in local-m2-cache
    found commons-logging#commons-logging;1.2 in local-m2-cache
    found org.rocksdb#rocksdbjni;7.5.3 in local-m2-cache
    found org.apache.hbase#hbase-client;2.4.9 in local-m2-cache
    found org.apache.hbase.thirdparty#hbase-shaded-protobuf;3.5.1 in local-m2-cache
    found org.apache.hbase#hbase-protocol-shaded;2.4.9 in local-m2-cache
    found org.apache.yetus#audience-annotations;0.5.0 in local-m2-cache
    found org.apache.hbase#hbase-protocol;2.4.9 in local-m2-cache
    found javax.annotation#javax.annotation-api;1.2 in local-m2-cache
    found commons-codec#commons-codec;1.13 in local-m2-cache
    found commons-io#commons-io;2.11.0 in local-m2-cache
    found org.apache.commons#commons-lang3;3.9 in local-m2-cache
    found org.apache.hbase.thirdparty#hbase-shaded-miscellaneous;3.5.1 in local-m2-cache
    found com.google.errorprone#error_prone_annotations;2.7.1 in local-m2-cache
    found org.apache.hbase.thirdparty#hbase-shaded-netty;3.5.1 in local-m2-cache
    found org.apache.zookeeper#zookeeper;3.5.7 in local-m2-cache
    found org.apache.zookeeper#zookeeper-jute;3.5.7 in local-m2-cache
    found io.netty#netty-handler;4.1.45.Final in local-m2-cache
    found io.netty#netty-common;4.1.45.Final in local-m2-cache
    found io.netty#netty-buffer;4.1.45.Final in local-m2-cache
    found io.netty#netty-transport;4.1.45.Final in local-m2-cache
    found io.netty#netty-resolver;4.1.45.Final in local-m2-cache
    found io.netty#netty-codec;4.1.45.Final in local-m2-cache
    found io.netty#netty-transport-native-epoll;4.1.45.Final in local-m2-cache
    found io.netty#netty-transport-native-unix-common;4.1.45.Final in local-m2-cache
    found org.apache.htrace#htrace-core4;4.2.0-incubating in local-m2-cache
    found org.jruby.jcodings#jcodings;1.0.55 in local-m2-cache
    found org.jruby.joni#joni;2.1.31 in local-m2-cache
    found io.dropwizard.metrics#metrics-core;4.1.1 in local-m2-cache
    found org.apache.commons#commons-crypto;1.0.0 in local-m2-cache
    found org.apache.hadoop#hadoop-auth;2.10.1 in central
    found com.nimbusds#nimbus-jose-jwt;7.9 in local-m2-cache
    found com.github.stephenc.jcip#jcip-annotations;1.0-1 in local-m2-cache
    found org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 in local-m2-cache
    found org.apache.directory.server#apacheds-i18n;2.0.0-M15 in local-m2-cache
    found org.apache.directory.api#api-asn1-api;1.0.0-M20 in local-m2-cache
    found org.apache.directory.api#api-util;1.0.0-M20 in local-m2-cache
    found org.apache.curator#curator-framework;2.7.1 in local-m2-cache
    found org.apache.curator#curator-client;2.7.1 in local-m2-cache
    found org.apache.hbase#hbase-server;2.4.9 in local-m2-cache
    found org.apache.hbase#hbase-procedure;2.4.9 in local-m2-cache
    found org.apache.hbase#hbase-replication;2.4.9 in local-m2-cache
    found org.glassfish.web#javax.servlet.jsp;2.3.2 in local-m2-cache
    found org.glassfish#javax.el;3.0.1-b12 in local-m2-cache
    found javax.servlet.jsp#javax.servlet.jsp-api;2.3.1 in local-m2-cache
    found org.apache.commons#commons-math3;3.6.1 in local-m2-cache
    found org.jamon#jamon-runtime;2.4.1 in local-m2-cache
    found com.lmax#disruptor;3.4.2 in local-m2-cache
    found org.apache.hadoop#hadoop-distcp;2.10.0 in local-m2-cache
    found org.apache.hadoop#hadoop-annotations;2.10.0 in local-m2-cache
    found org.apache.hadoop#hadoop-mapreduce-client-core;2.10.1 in central
    found org.apache.hadoop#hadoop-yarn-client;2.10.1 in central
    found commons-cli#commons-cli;1.2 in local-m2-cache
    found log4j#log4j;1.2.17 in local-m2-cache
    found org.apache.hadoop#hadoop-yarn-api;2.10.1 in central
    found org.apache.hadoop#hadoop-yarn-common;2.10.1 in central
    found org.apache.commons#commons-compress;1.19 in local-m2-cache
    found com.sun.jersey#jersey-core;1.9 in local-m2-cache
    found com.sun.jersey#jersey-client;1.9 in local-m2-cache
    found com.google.inject.extensions#guice-servlet;3.0 in local-m2-cache
    found com.google.inject#guice;3.0 in local-m2-cache
    found javax.inject#javax.inject;1 in local-m2-cache
    found aopalliance#aopalliance;1.0 in local-m2-cache
    found org.sonatype.sisu.inject#cglib;2.2.1-v20090111 in central
    found asm#asm;3.2 in central
    found com.sun.jersey#jersey-server;1.9 in local-m2-cache
    found com.sun.jersey#jersey-json;1.9 in local-m2-cache
    found org.codehaus.jettison#jettison;1.3.8 in central
    found com.sun.xml.bind#jaxb-impl;2.2.3-1 in local-m2-cache
    found com.sun.jersey.contribs#jersey-guice;1.9 in local-m2-cache
    found org.apache.avro#avro;1.8.2 in local-m2-cache
    found com.thoughtworks.paranamer#paranamer;2.7 in local-m2-cache
    found org.xerial.snappy#snappy-java;1.1.8.3 in local-m2-cache
    found org.tukaani#xz;1.5 in local-m2-cache
    found org.slf4j#slf4j-log4j12;1.7.30 in central
    found io.netty#netty;3.10.6.Final in local-m2-cache
    found org.lz4#lz4-java;1.8.0 in local-m2-cache
    found org.roaringbitmap#shims;0.9.47 in local-m2-cache
    found org.apache.hudi#hudi-hive-sync;0.14.0 in central
    found org.apache.hudi#hudi-hadoop-mr;0.14.0 in central
    found org.apache.hudi#hudi-sync-common;0.14.0 in central
    found com.beust#jcommander;1.78 in central
    found com.amazonaws#dynamodb-lock-client;1.2.0 in central
    found software.amazon.awssdk#cloudwatch;2.18.40 in central
    found software.amazon.awssdk#aws-query-protocol;2.18.40 in central
    found software.amazon.awssdk#protocol-core;2.18.40 in central
    found software.amazon.awssdk#sdk-core;2.18.40 in central
    found software.amazon.awssdk#annotations;2.18.40 in central
    found software.amazon.awssdk#http-client-spi;2.18.40 in central
    found software.amazon.awssdk#utils;2.18.40 in central
    found org.reactivestreams#reactive-streams;1.0.3 in local-m2-cache
    found software.amazon.awssdk#metrics-spi;2.18.40 in central
    found software.amazon.awssdk#endpoints-spi;2.18.40 in central
    found software.amazon.awssdk#profiles;2.18.40 in central
    found software.amazon.awssdk#aws-core;2.18.40 in central
    found software.amazon.awssdk#regions;2.18.40 in central
    found software.amazon.awssdk#json-utils;2.18.40 in central
    found software.amazon.awssdk#third-party-jackson-core;2.18.40 in central
    found software.amazon.awssdk#auth;2.18.40 in central
    found software.amazon.eventstream#eventstream;1.0.1 in central
    found software.amazon.awssdk#apache-client;2.18.40 in central
    found software.amazon.awssdk#netty-nio-client;2.18.40 in central
    found io.netty#netty-codec-http;4.1.77.Final in local-m2-cache
    found io.netty#netty-common;4.1.77.Final in local-m2-cache
    found io.netty#netty-buffer;4.1.77.Final in local-m2-cache
    found io.netty#netty-transport;4.1.77.Final in local-m2-cache
    found io.netty#netty-resolver;4.1.77.Final in local-m2-cache
    found io.netty#netty-codec;4.1.77.Final in local-m2-cache
    found io.netty#netty-handler;4.1.77.Final in local-m2-cache
    found io.netty#netty-codec-http2;4.1.77.Final in local-m2-cache
    found io.netty#netty-transport-classes-epoll;4.1.77.Final in local-m2-cache
    found io.netty#netty-transport-native-unix-common;4.1.77.Final in local-m2-cache
    found software.amazon.awssdk#dynamodb;2.18.40 in central
    found software.amazon.awssdk#aws-json-protocol;2.18.40 in central
    found software.amazon.awssdk#glue;2.18.40 in central
    found software.amazon.awssdk#sqs;2.18.40 in central
    found org.apache.httpcomponents#httpclient;4.5.13 in local-m2-cache
    found org.apache.httpcomponents#httpcore;4.4.13 in local-m2-cache
    found org.apache.hudi#hudi-aws-bundle;0.14.0 in central
    found org.apache.parquet#parquet-avro;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-column;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-common;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-format;2.4.0 in local-m2-cache
    found org.apache.parquet#parquet-encoding;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-hadoop;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-jackson;1.10.1 in local-m2-cache
    found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache
    found org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache
    found commons-pool#commons-pool;1.6 in local-m2-cache
    found it.unimi.dsi#fastutil;7.0.13 in local-m2-cache
:: resolution report :: resolve 1934ms :: artifacts dl 30ms
    :: modules in use:
    aopalliance#aopalliance;1.0 from local-m2-cache in [default]
    asm#asm;3.2 from central in [default]
    com.amazonaws#dynamodb-lock-client;1.2.0 from central in [default]
    com.beust#jcommander;1.78 from central in [default]
    com.fasterxml.jackson.core#jackson-annotations;2.10.0 from local-m2-cache in [default]
    com.fasterxml.jackson.core#jackson-core;2.10.0 from local-m2-cache in [default]
    com.fasterxml.jackson.core#jackson-databind;2.10.0 from local-m2-cache in [default]
    com.fasterxml.jackson.datatype#jackson-datatype-jsr310;2.10.0 from local-m2-cache in [default]
    com.github.ben-manes.caffeine#caffeine;2.9.1 from local-m2-cache in [default]
    com.github.stephenc.jcip#jcip-annotations;1.0-1 from local-m2-cache in [default]
    com.google.errorprone#error_prone_annotations;2.7.1 from local-m2-cache in [default]
    com.google.inject#guice;3.0 from local-m2-cache in [default]
    com.google.inject.extensions#guice-servlet;3.0 from local-m2-cache in [default]
    com.google.protobuf#protobuf-java;3.21.7 from local-m2-cache in [default]
    com.lmax#disruptor;3.4.2 from local-m2-cache in [default]
    com.nimbusds#nimbus-jose-jwt;7.9 from local-m2-cache in [default]
    com.sun.jersey#jersey-client;1.9 from local-m2-cache in [default]
    com.sun.jersey#jersey-core;1.9 from local-m2-cache in [default]
    com.sun.jersey#jersey-json;1.9 from local-m2-cache in [default]
    com.sun.jersey#jersey-server;1.9 from local-m2-cache in [default]
    com.sun.jersey.contribs#jersey-guice;1.9 from local-m2-cache in [default]
    com.sun.xml.bind#jaxb-impl;2.2.3-1 from local-m2-cache in [default]
    com.thoughtworks.paranamer#paranamer;2.7 from local-m2-cache in [default]
    commons-cli#commons-cli;1.2 from local-m2-cache in [default]
    commons-codec#commons-codec;1.13 from local-m2-cache in [default]
    commons-io#commons-io;2.11.0 from local-m2-cache in [default]
    commons-lang#commons-lang;2.6 from local-m2-cache in [default]
    commons-logging#commons-logging;1.2 from local-m2-cache in [default]
    commons-pool#commons-pool;1.6 from local-m2-cache in [default]
    io.airlift#aircompressor;0.15 from local-m2-cache in [default]
    io.dropwizard.metrics#metrics-core;4.1.1 from local-m2-cache in [default]
    io.netty#netty;3.10.6.Final from local-m2-cache in [default]
    io.netty#netty-buffer;4.1.77.Final from local-m2-cache in [default]
    io.netty#netty-codec;4.1.77.Final from local-m2-cache in [default]
    io.netty#netty-codec-http;4.1.77.Final from local-m2-cache in [default]
    io.netty#netty-codec-http2;4.1.77.Final from local-m2-cache in [default]
    io.netty#netty-common;4.1.77.Final from local-m2-cache in [default]
    io.netty#netty-handler;4.1.77.Final from local-m2-cache in [default]
    io.netty#netty-resolver;4.1.77.Final from local-m2-cache in [default]
    io.netty#netty-transport;4.1.77.Final from local-m2-cache in [default]
    io.netty#netty-transport-classes-epoll;4.1.77.Final from local-m2-cache in [default]
    io.netty#netty-transport-native-epoll;4.1.45.Final from local-m2-cache in [default]
    io.netty#netty-transport-native-unix-common;4.1.77.Final from local-m2-cache in [default]
    it.unimi.dsi#fastutil;7.0.13 from local-m2-cache in [default]
    javax.annotation#javax.annotation-api;1.2 from local-m2-cache in [default]
    javax.inject#javax.inject;1 from local-m2-cache in [default]
    javax.servlet.jsp#javax.servlet.jsp-api;2.3.1 from local-m2-cache in [default]
    javax.xml.bind#jaxb-api;2.2.11 from local-m2-cache in [default]
    log4j#log4j;1.2.17 from local-m2-cache in [default]
    org.apache.avro#avro;1.8.2 from local-m2-cache in [default]
    org.apache.commons#commons-compress;1.19 from local-m2-cache in [default]
    org.apache.commons#commons-crypto;1.0.0 from local-m2-cache in [default]
    org.apache.commons#commons-lang3;3.9 from local-m2-cache in [default]
    org.apache.commons#commons-math3;3.6.1 from local-m2-cache in [default]
    org.apache.curator#curator-client;2.7.1 from local-m2-cache in [default]
    org.apache.curator#curator-framework;2.7.1 from local-m2-cache in [default]
    org.apache.directory.api#api-asn1-api;1.0.0-M20 from local-m2-cache in [default]
    org.apache.directory.api#api-util;1.0.0-M20 from local-m2-cache in [default]
    org.apache.directory.server#apacheds-i18n;2.0.0-M15 from local-m2-cache in [default]
    org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 from local-m2-cache in [default]
    org.apache.hadoop#hadoop-annotations;2.10.0 from local-m2-cache in [default]
    org.apache.hadoop#hadoop-auth;2.10.1 from central in [default]
    org.apache.hadoop#hadoop-distcp;2.10.0 from local-m2-cache in [default]
    org.apache.hadoop#hadoop-mapreduce-client-core;2.10.1 from central in [default]
    org.apache.hadoop#hadoop-yarn-api;2.10.1 from central in [default]
    org.apache.hadoop#hadoop-yarn-client;2.10.1 from central in [default]
    org.apache.hadoop#hadoop-yarn-common;2.10.1 from central in [default]
    org.apache.hbase#hbase-client;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-procedure;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-protocol;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-protocol-shaded;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-replication;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-server;2.4.9 from local-m2-cache in [default]
    org.apache.hbase.thirdparty#hbase-shaded-miscellaneous;3.5.1 from local-m2-cache in [default]
    org.apache.hbase.thirdparty#hbase-shaded-netty;3.5.1 from local-m2-cache in [default]
    org.apache.hbase.thirdparty#hbase-shaded-protobuf;3.5.1 from local-m2-cache in [default]
    org.apache.hive#hive-storage-api;2.6.0 from local-m2-cache in [default]
    org.apache.htrace#htrace-core4;4.2.0-incubating from local-m2-cache in [default]
    org.apache.httpcomponents#fluent-hc;4.4.1 from local-m2-cache in [default]
    org.apache.httpcomponents#httpclient;4.5.13 from local-m2-cache in [default]
    org.apache.httpcomponents#httpcore;4.4.13 from local-m2-cache in [default]
    org.apache.hudi#hudi-aws;0.14.0 from central in [default]
    org.apache.hudi#hudi-aws-bundle;0.14.0 from central in [default]
    org.apache.hudi#hudi-common;0.14.0 from central in [default]
    org.apache.hudi#hudi-hadoop-mr;0.14.0 from central in [default]
    org.apache.hudi#hudi-hive-sync;0.14.0 from central in [default]
    org.apache.hudi#hudi-sync-common;0.14.0 from central in [default]
    org.apache.orc#orc-core;1.6.0 from local-m2-cache in [default]
    org.apache.orc#orc-shims;1.6.0 from local-m2-cache in [default]
    org.apache.parquet#parquet-avro;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-column;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-common;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-encoding;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-format;2.4.0 from local-m2-cache in [default]
    org.apache.parquet#parquet-hadoop;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-jackson;1.10.1 from local-m2-cache in [default]
    org.apache.yetus#audience-annotations;0.5.0 from local-m2-cache in [default]
    org.apache.zookeeper#zookeeper;3.5.7 from local-m2-cache in [default]
    org.apache.zookeeper#zookeeper-jute;3.5.7 from local-m2-cache in [default]
    org.checkerframework#checker-qual;3.10.0 from local-m2-cache in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]
    org.codehaus.jettison#jettison;1.3.8 from central in [default]
    org.glassfish#javax.el;3.0.1-b12 from local-m2-cache in [default]
    org.glassfish.web#javax.servlet.jsp;2.3.2 from local-m2-cache in [default]
    org.jamon#jamon-runtime;2.4.1 from local-m2-cache in [default]
    org.jetbrains#annotations;17.0.0 from local-m2-cache in [default]
    org.jruby.jcodings#jcodings;1.0.55 from local-m2-cache in [default]
    org.jruby.joni#joni;2.1.31 from local-m2-cache in [default]
    org.lz4#lz4-java;1.8.0 from local-m2-cache in [default]
    org.openjdk.jol#jol-core;0.16 from local-m2-cache in [default]
    org.reactivestreams#reactive-streams;1.0.3 from local-m2-cache in [default]
    org.roaringbitmap#RoaringBitmap;0.9.47 from local-m2-cache in [default]
    org.roaringbitmap#shims;0.9.47 from local-m2-cache in [default]
    org.rocksdb#rocksdbjni;7.5.3 from local-m2-cache in [default]
    org.slf4j#slf4j-api;1.7.36 from local-m2-cache in [default]
    org.slf4j#slf4j-log4j12;1.7.30 from central in [default]
    org.sonatype.sisu.inject#cglib;2.2.1-v20090111 from central in [default]
    org.tukaani#xz;1.5 from local-m2-cache in [default]
    org.xerial.snappy#snappy-java;1.1.8.3 from local-m2-cache in [default]
    software.amazon.awssdk#annotations;2.18.40 from central in [default]
    software.amazon.awssdk#apache-client;2.18.40 from central in [default]
    software.amazon.awssdk#auth;2.18.40 from central in [default]
    software.amazon.awssdk#aws-core;2.18.40 from central in [default]
    software.amazon.awssdk#aws-json-protocol;2.18.40 from central in [default]
    software.amazon.awssdk#aws-query-protocol;2.18.40 from central in [default]
    software.amazon.awssdk#cloudwatch;2.18.40 from central in [default]
    software.amazon.awssdk#dynamodb;2.18.40 from central in [default]
    software.amazon.awssdk#endpoints-spi;2.18.40 from central in [default]
    software.amazon.awssdk#glue;2.18.40 from central in [default]
    software.amazon.awssdk#http-client-spi;2.18.40 from central in [default]
    software.amazon.awssdk#json-utils;2.18.40 from central in [default]
    software.amazon.awssdk#metrics-spi;2.18.40 from central in [default]
    software.amazon.awssdk#netty-nio-client;2.18.40 from central in [default]
    software.amazon.awssdk#profiles;2.18.40 from central in [default]
    software.amazon.awssdk#protocol-core;2.18.40 from central in [default]
    software.amazon.awssdk#regions;2.18.40 from central in [default]
    software.amazon.awssdk#sdk-core;2.18.40 from central in [default]
    software.amazon.awssdk#sqs;2.18.40 from central in [default]
    software.amazon.awssdk#third-party-jackson-core;2.18.40 from central in [default]
    software.amazon.awssdk#utils;2.18.40 from central in [default]
    software.amazon.eventstream#eventstream;1.0.1 from central in [default]
    :: evicted modules:
    com.google.errorprone#error_prone_annotations;2.5.1 by [com.google.errorprone#error_prone_annotations;2.7.1] in [default]
    org.apache.httpcomponents#httpclient;4.4.1 by [org.apache.httpcomponents#httpclient;4.5.13] in [default]
    io.netty#netty-handler;4.1.45.Final by [io.netty#netty-handler;4.1.77.Final] in [default]
    io.netty#netty-common;4.1.45.Final by [io.netty#netty-common;4.1.77.Final] in [default]
    io.netty#netty-buffer;4.1.45.Final by [io.netty#netty-buffer;4.1.77.Final] in [default]
    io.netty#netty-transport;4.1.45.Final by [io.netty#netty-transport;4.1.77.Final] in [default]
    io.netty#netty-resolver;4.1.45.Final by [io.netty#netty-resolver;4.1.77.Final] in [default]
    io.netty#netty-codec;4.1.45.Final by [io.netty#netty-codec;4.1.77.Final] in [default]
    io.netty#netty-transport-native-unix-common;4.1.45.Final by [io.netty#netty-transport-native-unix-common;4.1.77.Final] in [default]
    software.amazon.awssdk#dynamodb;2.20.8 by [software.amazon.awssdk#dynamodb;2.18.40] in [default]
    software.amazon.awssdk#sdk-core;2.20.8 by [software.amazon.awssdk#sdk-core;2.18.40] in [default]
    software.amazon.awssdk#annotations;2.20.8 by [software.amazon.awssdk#annotations;2.18.40] in [default]
    software.amazon.awssdk#auth;2.20.8 by [software.amazon.awssdk#auth;2.18.40] in [default]
    software.amazon.awssdk#regions;2.20.8 by [software.amazon.awssdk#regions;2.18.40] in [default]
    software.amazon.awssdk#http-client-spi;2.20.8 by [software.amazon.awssdk#http-client-spi;2.18.40] in [default]
    software.amazon.awssdk#aws-core;2.20.8 by [software.amazon.awssdk#aws-core;2.18.40] in [default]
    org.apache.httpcomponents#httpcore;4.4.1 by [org.apache.httpcomponents#httpcore;4.4.13] in [default]
    commons-codec#commons-codec;1.11 by [commons-codec#commons-codec;1.13] in [default]
    commons-codec#commons-codec;1.10 by [commons-codec#commons-codec;1.13] in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |  162  |   0   |   0   |   19  ||  142  |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        module not found: org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0

    ==== local-m2-cache: tried

      file:/Users/soumilshah/.m2/repository/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.pom

      -- artifact org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0!hudi-spark3.4.1-bundle_2.12.jar:

      file:/Users/soumilshah/.m2/repository/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.jar

    ==== local-ivy-cache: tried

      /Users/soumilshah/.ivy2/local/org.apache.hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/ivys/ivy.xml

      -- artifact org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0!hudi-spark3.4.1-bundle_2.12.jar:

      /Users/soumilshah/.ivy2/local/org.apache.hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/jars/hudi-spark3.4.1-bundle_2.12.jar

    ==== central: tried

      https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.pom

      -- artifact org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0!hudi-spark3.4.1-bundle_2.12.jar:

      https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.jar

    ==== spark-packages: tried

      https://repos.spark-packages.org/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.pom

      -- artifact org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0!hudi-spark3.4.1-bundle_2.12.jar:

      https://repos.spark-packages.org/org/apache/hudi/hudi-spark3.4.1-bundle_2.12/0.14.0/hudi-spark3.4.1-bundle_2.12-0.14.0.jar

        [NOT FOUND  ] commons-codec#commons-codec;1.13!commons-codec.jar (0ms)

    ==== local-m2-cache: tried

      file:/Users/soumilshah/.m2/repository/commons-codec/commons-codec/1.13/commons-codec-1.13.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0: not found

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::              FAILED DOWNLOADS            ::

        :: ^ see resolution messages for details  ^ ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: commons-codec#commons-codec;1.13!commons-codec.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.hudi#hudi-spark3.4.1-bundle_2.12;0.14.0: not found, download failed: commons-codec#commons-codec;1.13!commons-codec.jar]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1528)
    at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:332)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
  File "/Users/soumilshah/IdeaProjects/SparkProject/deltastreamerBroadcastJoins/conflictdetection/w1.py", line 36, in <module>
    .getOrCreate()
     ^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/sql/session.py", line 477, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 512, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 198, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 432, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
                                       ^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/java_gateway.py", line 106, in launch_gateway
    raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number

Am I missing any other packages?
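In the log above, Ivy searched every resolver for `hudi-spark3.4.1-bundle_2.12` and found nothing. The later run with `SPARK_VERSION = '3.4'` resolves `hudi-spark3.4-bundle_2.12`, because Hudi's Spark bundles are named after the Spark major.minor line. The following is a minimal sketch that trims a full version string before building the coordinate, so a value such as `pyspark.__version__` could be passed in directly. `hudi_spark_bundle` and `submit_args` are hypothetical helpers, not Hudi or PySpark APIs:

```python
def hudi_spark_bundle(spark_version: str, hudi_version: str, scala_version: str = "2.12") -> str:
    """Build the Maven coordinate of the Hudi Spark bundle.

    Hudi publishes one bundle per Spark major.minor line (e.g. hudi-spark3.4-bundle_2.12),
    so a patch component such as the ".1" in "3.4.1" is dropped from the artifact name.
    """
    major_minor = ".".join(spark_version.split(".")[:2])
    return f"org.apache.hudi:hudi-spark{major_minor}-bundle_{scala_version}:{hudi_version}"


def submit_args(packages: list[str]) -> str:
    """Join Maven coordinates into a PYSPARK_SUBMIT_ARGS value.

    When launching Spark from a Python process, the value conventionally ends with "pyspark-shell".
    """
    return f"--packages {','.join(packages)} pyspark-shell"


if __name__ == "__main__":
    print(hudi_spark_bundle("3.4.1", "0.14.0"))  # org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0
    print(submit_args([
        hudi_spark_bundle("3.4.1", "0.14.0"),
        "org.apache.hudi:hudi-aws-bundle:0.14.0",
    ]))
```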

soumilshah1995 commented 1 month ago

I added the following packages:


import os
import sys

from pyspark.sql import SparkSession

HUDI_VERSION = '0.14.0'
SPARK_VERSION = '3.4'  # Hudi bundle names use Spark major.minor, e.g. hudi-spark3.4-bundle_2.12

os.environ["JAVA_HOME"] = "/opt/homebrew/opt/openjdk@11"

PACKAGES = [
    f"org.apache.hudi:hudi-spark{SPARK_VERSION}-bundle_2.12:{HUDI_VERSION}",
    "com.amazonaws:dynamodb-lock-client:1.2.0",
    "com.amazonaws:aws-java-sdk-dynamodb:1.12.735",
    "com.amazonaws:aws-java-sdk-core:1.12.735",
    f"org.apache.hudi:hudi-aws-bundle:{HUDI_VERSION}",
    f"org.apache.hudi:hudi-aws:{HUDI_VERSION}",
]
SUBMIT_ARGS = f"--packages {','.join(PACKAGES)} pyspark-shell"

# Both variables must be set before the SparkSession (and its JVM) is created
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
os.environ['PYSPARK_PYTHON'] = sys.executable

spark = SparkSession.builder \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .config('className', 'org.apache.hudi') \
    .config('spark.sql.hive.convertMetastoreParquet', 'false') \
    .getOrCreate()

Error: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider

g.apache.hudi#hudi-aws-bundle added as a dependency
org.apache.hudi#hudi-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-aa8d9c29-7056-4201-b20a-c5f73fac7ea9;1.0
    confs: [default]
    found org.apache.hudi#hudi-spark3.4-bundle_2.12;0.14.0 in spark-list
    found com.amazonaws#dynamodb-lock-client;1.2.0 in central
    found software.amazon.awssdk#dynamodb;2.20.8 in central
    found software.amazon.awssdk#aws-json-protocol;2.20.8 in central
    found software.amazon.awssdk#aws-core;2.20.8 in central
    found software.amazon.awssdk#annotations;2.20.8 in central
    found software.amazon.awssdk#regions;2.20.8 in central
    found software.amazon.awssdk#utils;2.20.8 in central
    found org.reactivestreams#reactive-streams;1.0.2 in central
    found org.slf4j#slf4j-api;1.7.30 in local-m2-cache
    found software.amazon.awssdk#sdk-core;2.20.8 in central
    found software.amazon.awssdk#http-client-spi;2.20.8 in central
    found software.amazon.awssdk#metrics-spi;2.20.8 in central
    found software.amazon.awssdk#endpoints-spi;2.20.8 in central
    found software.amazon.awssdk#profiles;2.20.8 in central
    found software.amazon.awssdk#json-utils;2.20.8 in central
    found software.amazon.awssdk#third-party-jackson-core;2.20.8 in central
    found software.amazon.awssdk#auth;2.20.8 in central
    found software.amazon.eventstream#eventstream;1.0.1 in central
    found software.amazon.awssdk#protocol-core;2.20.8 in central
    found software.amazon.awssdk#apache-client;2.20.8 in central
    found org.apache.httpcomponents#httpclient;4.5.13 in local-m2-cache
    found org.apache.httpcomponents#httpcore;4.4.13 in local-m2-cache
    found commons-logging#commons-logging;1.2 in local-m2-cache
    found software.amazon.awssdk#netty-nio-client;2.20.8 in central
    found io.netty#netty-codec-http2;4.1.86.Final in central
    found io.netty#netty-common;4.1.86.Final in central
    found io.netty#netty-buffer;4.1.86.Final in central
    found io.netty#netty-transport;4.1.86.Final in central
    found io.netty#netty-resolver;4.1.86.Final in central
    found io.netty#netty-codec;4.1.86.Final in central
    found io.netty#netty-transport-classes-epoll;4.1.86.Final in central
    found io.netty#netty-transport-native-unix-common;4.1.86.Final in central
    found com.amazonaws#aws-java-sdk-dynamodb;1.12.735 in central
    found com.amazonaws#aws-java-sdk-s3;1.12.735 in central
    found com.amazonaws#aws-java-sdk-kms;1.12.735 in central
    found com.amazonaws#aws-java-sdk-core;1.12.735 in central
    found commons-codec#commons-codec;1.15 in local-m2-cache
    found com.fasterxml.jackson.core#jackson-databind;2.12.7.2 in central
    found com.fasterxml.jackson.core#jackson-annotations;2.12.7 in local-m2-cache
    found com.fasterxml.jackson.core#jackson-core;2.12.7 in local-m2-cache
    found com.fasterxml.jackson.dataformat#jackson-dataformat-cbor;2.12.6 in central
    found joda-time#joda-time;2.12.7 in central
    found com.amazonaws#jmespath-java;1.12.735 in central
    found org.apache.hudi#hudi-aws-bundle;0.14.0 in central
    found org.apache.hudi#hudi-common;0.14.0 in central
    found org.openjdk.jol#jol-core;0.16 in local-m2-cache
    found com.fasterxml.jackson.datatype#jackson-datatype-jsr310;2.10.0 in local-m2-cache
    found com.github.ben-manes.caffeine#caffeine;2.9.1 in local-m2-cache
    found org.checkerframework#checker-qual;3.10.0 in local-m2-cache
    found com.google.errorprone#error_prone_annotations;2.5.1 in local-m2-cache
    found org.apache.orc#orc-core;1.6.0 in local-m2-cache
    found org.apache.orc#orc-shims;1.6.0 in local-m2-cache
    found org.slf4j#slf4j-api;1.7.36 in local-m2-cache
    found com.google.protobuf#protobuf-java;3.21.7 in local-m2-cache
    found commons-lang#commons-lang;2.6 in local-m2-cache
    found io.airlift#aircompressor;0.15 in local-m2-cache
    found javax.xml.bind#jaxb-api;2.2.11 in local-m2-cache
    found org.apache.hive#hive-storage-api;2.6.0 in local-m2-cache
    found org.jetbrains#annotations;17.0.0 in local-m2-cache
    found org.roaringbitmap#RoaringBitmap;0.9.47 in local-m2-cache
    found org.apache.httpcomponents#fluent-hc;4.4.1 in local-m2-cache
    found org.rocksdb#rocksdbjni;7.5.3 in local-m2-cache
    found org.apache.hbase#hbase-client;2.4.9 in local-m2-cache
    found org.apache.hbase.thirdparty#hbase-shaded-protobuf;3.5.1 in local-m2-cache
    found org.apache.hbase#hbase-protocol-shaded;2.4.9 in local-m2-cache
    found org.apache.yetus#audience-annotations;0.5.0 in local-m2-cache
    found org.apache.hbase#hbase-protocol;2.4.9 in local-m2-cache
    found javax.annotation#javax.annotation-api;1.2 in local-m2-cache
    found commons-io#commons-io;2.11.0 in local-m2-cache
    found org.apache.commons#commons-lang3;3.9 in local-m2-cache
    found org.apache.hbase.thirdparty#hbase-shaded-miscellaneous;3.5.1 in local-m2-cache
    found com.google.errorprone#error_prone_annotations;2.7.1 in local-m2-cache
    found org.apache.hbase.thirdparty#hbase-shaded-netty;3.5.1 in local-m2-cache
    found org.apache.zookeeper#zookeeper;3.5.7 in local-m2-cache
    found org.apache.zookeeper#zookeeper-jute;3.5.7 in local-m2-cache
    found io.netty#netty-handler;4.1.45.Final in local-m2-cache
    found io.netty#netty-transport-native-epoll;4.1.45.Final in local-m2-cache
    found org.apache.htrace#htrace-core4;4.2.0-incubating in local-m2-cache
    found org.jruby.jcodings#jcodings;1.0.55 in local-m2-cache
    found org.jruby.joni#joni;2.1.31 in local-m2-cache
    found io.dropwizard.metrics#metrics-core;4.1.1 in local-m2-cache
    found org.apache.commons#commons-crypto;1.0.0 in local-m2-cache
    found org.apache.hbase#hbase-server;2.4.9 in local-m2-cache
    found org.apache.hbase#hbase-procedure;2.4.9 in local-m2-cache
    found org.apache.hbase#hbase-replication;2.4.9 in local-m2-cache
    found org.glassfish.web#javax.servlet.jsp;2.3.2 in local-m2-cache
    found org.glassfish#javax.el;3.0.1-b12 in local-m2-cache
    found javax.servlet.jsp#javax.servlet.jsp-api;2.3.1 in local-m2-cache
    found org.apache.commons#commons-math3;3.6.1 in local-m2-cache
    found org.jamon#jamon-runtime;2.4.1 in local-m2-cache
    found com.lmax#disruptor;3.4.2 in local-m2-cache
    found org.lz4#lz4-java;1.8.0 in local-m2-cache
    found org.roaringbitmap#shims;0.9.47 in local-m2-cache
    found org.apache.hudi#hudi-hive-sync;0.14.0 in central
    found org.apache.hudi#hudi-hadoop-mr;0.14.0 in central
    found org.apache.hudi#hudi-sync-common;0.14.0 in central
    found com.beust#jcommander;1.78 in central
    found org.apache.hudi#hudi-aws;0.14.0 in central
    found software.amazon.awssdk#cloudwatch;2.18.40 in central
    found software.amazon.awssdk#aws-query-protocol;2.18.40 in central
    found software.amazon.awssdk#glue;2.18.40 in central
    found software.amazon.awssdk#sqs;2.18.40 in central
    found org.apache.parquet#parquet-avro;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-column;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-common;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-format;2.4.0 in local-m2-cache
    found org.apache.parquet#parquet-encoding;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-hadoop;1.10.1 in local-m2-cache
    found org.apache.parquet#parquet-jackson;1.10.1 in local-m2-cache
    found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache
    found org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache
    found org.xerial.snappy#snappy-java;1.1.8.3 in local-m2-cache
    found commons-pool#commons-pool;1.6 in local-m2-cache
    found org.apache.avro#avro;1.8.2 in local-m2-cache
    found com.thoughtworks.paranamer#paranamer;2.7 in local-m2-cache
    found org.apache.commons#commons-compress;1.8.1 in local-m2-cache
    found org.tukaani#xz;1.5 in local-m2-cache
    found it.unimi.dsi#fastutil;7.0.13 in local-m2-cache
:: resolution report :: resolve 734ms :: artifacts dl 22ms
    :: modules in use:
    com.amazonaws#aws-java-sdk-core;1.12.735 from central in [default]
    com.amazonaws#aws-java-sdk-dynamodb;1.12.735 from central in [default]
    com.amazonaws#aws-java-sdk-kms;1.12.735 from central in [default]
    com.amazonaws#aws-java-sdk-s3;1.12.735 from central in [default]
    com.amazonaws#dynamodb-lock-client;1.2.0 from central in [default]
    com.amazonaws#jmespath-java;1.12.735 from central in [default]
    com.beust#jcommander;1.78 from central in [default]
    com.fasterxml.jackson.core#jackson-annotations;2.12.7 from local-m2-cache in [default]
    com.fasterxml.jackson.core#jackson-core;2.12.7 from local-m2-cache in [default]
    com.fasterxml.jackson.core#jackson-databind;2.12.7.2 from central in [default]
    com.fasterxml.jackson.dataformat#jackson-dataformat-cbor;2.12.6 from central in [default]
    com.fasterxml.jackson.datatype#jackson-datatype-jsr310;2.10.0 from local-m2-cache in [default]
    com.github.ben-manes.caffeine#caffeine;2.9.1 from local-m2-cache in [default]
    com.google.errorprone#error_prone_annotations;2.7.1 from local-m2-cache in [default]
    com.google.protobuf#protobuf-java;3.21.7 from local-m2-cache in [default]
    com.lmax#disruptor;3.4.2 from local-m2-cache in [default]
    com.thoughtworks.paranamer#paranamer;2.7 from local-m2-cache in [default]
    commons-codec#commons-codec;1.15 from local-m2-cache in [default]
    commons-io#commons-io;2.11.0 from local-m2-cache in [default]
    commons-lang#commons-lang;2.6 from local-m2-cache in [default]
    commons-logging#commons-logging;1.2 from local-m2-cache in [default]
    commons-pool#commons-pool;1.6 from local-m2-cache in [default]
    io.airlift#aircompressor;0.15 from local-m2-cache in [default]
    io.dropwizard.metrics#metrics-core;4.1.1 from local-m2-cache in [default]
    io.netty#netty-buffer;4.1.86.Final from central in [default]
    io.netty#netty-codec;4.1.86.Final from central in [default]
    io.netty#netty-codec-http2;4.1.86.Final from central in [default]
    io.netty#netty-common;4.1.86.Final from central in [default]
    io.netty#netty-handler;4.1.45.Final from local-m2-cache in [default]
    io.netty#netty-resolver;4.1.86.Final from central in [default]
    io.netty#netty-transport;4.1.86.Final from central in [default]
    io.netty#netty-transport-classes-epoll;4.1.86.Final from central in [default]
    io.netty#netty-transport-native-epoll;4.1.45.Final from local-m2-cache in [default]
    io.netty#netty-transport-native-unix-common;4.1.86.Final from central in [default]
    it.unimi.dsi#fastutil;7.0.13 from local-m2-cache in [default]
    javax.annotation#javax.annotation-api;1.2 from local-m2-cache in [default]
    javax.servlet.jsp#javax.servlet.jsp-api;2.3.1 from local-m2-cache in [default]
    javax.xml.bind#jaxb-api;2.2.11 from local-m2-cache in [default]
    joda-time#joda-time;2.12.7 from central in [default]
    org.apache.avro#avro;1.8.2 from local-m2-cache in [default]
    org.apache.commons#commons-compress;1.8.1 from local-m2-cache in [default]
    org.apache.commons#commons-crypto;1.0.0 from local-m2-cache in [default]
    org.apache.commons#commons-lang3;3.9 from local-m2-cache in [default]
    org.apache.commons#commons-math3;3.6.1 from local-m2-cache in [default]
    org.apache.hbase#hbase-client;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-procedure;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-protocol;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-protocol-shaded;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-replication;2.4.9 from local-m2-cache in [default]
    org.apache.hbase#hbase-server;2.4.9 from local-m2-cache in [default]
    org.apache.hbase.thirdparty#hbase-shaded-miscellaneous;3.5.1 from local-m2-cache in [default]
    org.apache.hbase.thirdparty#hbase-shaded-netty;3.5.1 from local-m2-cache in [default]
    org.apache.hbase.thirdparty#hbase-shaded-protobuf;3.5.1 from local-m2-cache in [default]
    org.apache.hive#hive-storage-api;2.6.0 from local-m2-cache in [default]
    org.apache.htrace#htrace-core4;4.2.0-incubating from local-m2-cache in [default]
    org.apache.httpcomponents#fluent-hc;4.4.1 from local-m2-cache in [default]
    org.apache.httpcomponents#httpclient;4.5.13 from local-m2-cache in [default]
    org.apache.httpcomponents#httpcore;4.4.13 from local-m2-cache in [default]
    org.apache.hudi#hudi-aws;0.14.0 from central in [default]
    org.apache.hudi#hudi-aws-bundle;0.14.0 from central in [default]
    org.apache.hudi#hudi-common;0.14.0 from central in [default]
    org.apache.hudi#hudi-hadoop-mr;0.14.0 from central in [default]
    org.apache.hudi#hudi-hive-sync;0.14.0 from central in [default]
    org.apache.hudi#hudi-spark3.4-bundle_2.12;0.14.0 from spark-list in [default]
    org.apache.hudi#hudi-sync-common;0.14.0 from central in [default]
    org.apache.orc#orc-core;1.6.0 from local-m2-cache in [default]
    org.apache.orc#orc-shims;1.6.0 from local-m2-cache in [default]
    org.apache.parquet#parquet-avro;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-column;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-common;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-encoding;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-format;2.4.0 from local-m2-cache in [default]
    org.apache.parquet#parquet-hadoop;1.10.1 from local-m2-cache in [default]
    org.apache.parquet#parquet-jackson;1.10.1 from local-m2-cache in [default]
    org.apache.yetus#audience-annotations;0.5.0 from local-m2-cache in [default]
    org.apache.zookeeper#zookeeper;3.5.7 from local-m2-cache in [default]
    org.apache.zookeeper#zookeeper-jute;3.5.7 from local-m2-cache in [default]
    org.checkerframework#checker-qual;3.10.0 from local-m2-cache in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]
    org.glassfish#javax.el;3.0.1-b12 from local-m2-cache in [default]
    org.glassfish.web#javax.servlet.jsp;2.3.2 from local-m2-cache in [default]
    org.jamon#jamon-runtime;2.4.1 from local-m2-cache in [default]
    org.jetbrains#annotations;17.0.0 from local-m2-cache in [default]
    org.jruby.jcodings#jcodings;1.0.55 from local-m2-cache in [default]
    org.jruby.joni#joni;2.1.31 from local-m2-cache in [default]
    org.lz4#lz4-java;1.8.0 from local-m2-cache in [default]
    org.openjdk.jol#jol-core;0.16 from local-m2-cache in [default]
    org.reactivestreams#reactive-streams;1.0.2 from central in [default]
    org.roaringbitmap#RoaringBitmap;0.9.47 from local-m2-cache in [default]
    org.roaringbitmap#shims;0.9.47 from local-m2-cache in [default]
    org.rocksdb#rocksdbjni;7.5.3 from local-m2-cache in [default]
    org.slf4j#slf4j-api;1.7.36 from local-m2-cache in [default]
    org.tukaani#xz;1.5 from local-m2-cache in [default]
    org.xerial.snappy#snappy-java;1.1.8.3 from local-m2-cache in [default]
    software.amazon.awssdk#annotations;2.20.8 from central in [default]
    software.amazon.awssdk#apache-client;2.20.8 from central in [default]
    software.amazon.awssdk#auth;2.20.8 from central in [default]
    software.amazon.awssdk#aws-core;2.20.8 from central in [default]
    software.amazon.awssdk#aws-json-protocol;2.20.8 from central in [default]
    software.amazon.awssdk#aws-query-protocol;2.18.40 from central in [default]
    software.amazon.awssdk#cloudwatch;2.18.40 from central in [default]
    software.amazon.awssdk#dynamodb;2.20.8 from central in [default]
    software.amazon.awssdk#endpoints-spi;2.20.8 from central in [default]
    software.amazon.awssdk#glue;2.18.40 from central in [default]
    software.amazon.awssdk#http-client-spi;2.20.8 from central in [default]
    software.amazon.awssdk#json-utils;2.20.8 from central in [default]
    software.amazon.awssdk#metrics-spi;2.20.8 from central in [default]
    software.amazon.awssdk#netty-nio-client;2.20.8 from central in [default]
    software.amazon.awssdk#profiles;2.20.8 from central in [default]
    software.amazon.awssdk#protocol-core;2.20.8 from central in [default]
    software.amazon.awssdk#regions;2.20.8 from central in [default]
    software.amazon.awssdk#sdk-core;2.20.8 from central in [default]
    software.amazon.awssdk#sqs;2.18.40 from central in [default]
    software.amazon.awssdk#third-party-jackson-core;2.20.8 from central in [default]
    software.amazon.awssdk#utils;2.20.8 from central in [default]
    software.amazon.eventstream#eventstream;1.0.1 from central in [default]
    :: evicted modules:
    org.slf4j#slf4j-api;1.7.30 by [org.slf4j#slf4j-api;1.7.36] in [default]
    commons-logging#commons-logging;1.1.3 by [commons-logging#commons-logging;1.2] in [default]
    com.fasterxml.jackson.core#jackson-databind;2.12.6 by [com.fasterxml.jackson.core#jackson-databind;2.12.7.2] in [default]
    com.fasterxml.jackson.core#jackson-core;2.12.6 by [com.fasterxml.jackson.core#jackson-core;2.12.7] in [default]
    com.fasterxml.jackson.core#jackson-annotations;2.10.0 by [com.fasterxml.jackson.core#jackson-annotations;2.12.7] in [default]
    com.fasterxml.jackson.core#jackson-databind;2.10.0 by [com.fasterxml.jackson.core#jackson-databind;2.12.7.2] in [default]
    com.fasterxml.jackson.core#jackson-core;2.10.0 by [com.fasterxml.jackson.core#jackson-core;2.12.7] in [default]
    com.google.errorprone#error_prone_annotations;2.5.1 by [com.google.errorprone#error_prone_annotations;2.7.1] in [default]
    org.apache.httpcomponents#httpclient;4.4.1 by [org.apache.httpcomponents#httpclient;4.5.13] in [default]
    commons-codec#commons-codec;1.13 by [commons-codec#commons-codec;1.10] in [default]
    io.netty#netty-common;4.1.45.Final by [io.netty#netty-common;4.1.86.Final] in [default]
    io.netty#netty-buffer;4.1.45.Final by [io.netty#netty-buffer;4.1.86.Final] in [default]
    io.netty#netty-transport;4.1.45.Final by [io.netty#netty-transport;4.1.86.Final] in [default]
    io.netty#netty-codec;4.1.45.Final by [io.netty#netty-codec;4.1.86.Final] in [default]
    io.netty#netty-transport-native-unix-common;4.1.45.Final by [io.netty#netty-transport-native-unix-common;4.1.86.Final] in [default]
    software.amazon.awssdk#protocol-core;2.18.40 by [software.amazon.awssdk#protocol-core;2.20.8] in [default]
    software.amazon.awssdk#aws-core;2.18.40 by [software.amazon.awssdk#aws-core;2.20.8] in [default]
    software.amazon.awssdk#sdk-core;2.18.40 by [software.amazon.awssdk#sdk-core;2.20.8] in [default]
    software.amazon.awssdk#annotations;2.18.40 by [software.amazon.awssdk#annotations;2.20.8] in [default]
    software.amazon.awssdk#http-client-spi;2.18.40 by [software.amazon.awssdk#http-client-spi;2.20.8] in [default]
    software.amazon.awssdk#utils;2.18.40 by [software.amazon.awssdk#utils;2.20.8] in [default]
    software.amazon.awssdk#auth;2.18.40 by [software.amazon.awssdk#auth;2.20.8] in [default]
    software.amazon.awssdk#regions;2.18.40 by [software.amazon.awssdk#regions;2.20.8] in [default]
    software.amazon.awssdk#metrics-spi;2.18.40 by [software.amazon.awssdk#metrics-spi;2.20.8] in [default]
    software.amazon.awssdk#json-utils;2.18.40 by [software.amazon.awssdk#json-utils;2.20.8] in [default]
    software.amazon.awssdk#endpoints-spi;2.18.40 by [software.amazon.awssdk#endpoints-spi;2.20.8] in [default]
    software.amazon.awssdk#dynamodb;2.18.40 by [software.amazon.awssdk#dynamodb;2.20.8] in [default]
    software.amazon.awssdk#aws-json-protocol;2.18.40 by [software.amazon.awssdk#aws-json-protocol;2.20.8] in [default]
    org.apache.httpcomponents#httpcore;4.4.1 by [org.apache.httpcomponents#httpcore;4.4.13] in [default]
    software.amazon.awssdk#apache-client;2.18.40 by [software.amazon.awssdk#apache-client;2.20.8] in [default]
    software.amazon.awssdk#netty-nio-client;2.18.40 by [software.amazon.awssdk#netty-nio-client;2.20.8] in [default]
    commons-codec#commons-codec;1.10 by [commons-codec#commons-codec;1.15] in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |  149  |   0   |   0   |   32  ||  117  |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-aa8d9c29-7056-4201-b20a-c5f73fac7ea9
    confs: [default]
    0 artifacts copied, 117 already retrieved (0kB/10ms)
24/06/05 09:14:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
+--------+----------+----------+----------+-----------+-----+                   
| orderID|productSKU|customerID| orderDate|orderAmount|state|
+--------+----------+----------+----------+-----------+-----+
|  order1|   prod001|   cust001|2024-01-15|      150.0|   CA|
|order002|   prod002|   cust002|2024-01-16|      200.0|   NY|
|order003|   prod003|   cust003|2024-01-17|      300.0|   TX|
|order004|   prod004|   cust004|2024-01-18|      250.0|   FL|
|order005|   prod005|   cust005|2024-01-19|      100.0|   WA|
|order006|   prod006|   cust006|2024-01-20|      350.0|   CA|
|order007|   prod007|   cust007|2024-01-21|      400.0|   NY|
+--------+----------+----------+----------+-----------+-----+

{'hoodie.table.name': 'orders', 'hoodie.datasource.write.table.type': 'COPY_ON_WRITE', 'hoodie.datasource.write.table.name': 'orders', 'hoodie.datasource.write.operation': 'upsert', 'hoodie.datasource.write.recordkey.field': 'orderID', 'hoodie.datasource.write.precombine.field': 'orderDate', 'hoodie.datasource.write.partitionpath.field': 'state', 'hoodie.write.concurrency.mode': 'optimistic_concurrency_control', 'hoodie.cleaner.policy.failed.writes': 'LAZY', 'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider', 'hoodie.write.lock.dynamodb.table': 'hudi-lock-table', 'hoodie.write.lock.dynamodb.region': 'us-east-1', 'hoodie.write.lock.dynamodb.endpoint_url': 'dynamodb.us-east-1.amazonaws.com', 'hoodie.write.lock.dynamodb.billing_mode': 'PAY_PER_REQUEST'}

 file:///Users/soumilshah/IdeaProjects/SparkProject/tem/database=default/table_name=orders 

24/06/05 09:14:12 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
24/06/05 09:14:12 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
Traceback (most recent call last):
  File "/Users/soumilshah/IdeaProjects/SparkProject/deltastreamerBroadcastJoins/conflictdetection/w1.py", line 90, in <module>
    write_to_hudi(
  File "/Users/soumilshah/IdeaProjects/SparkProject/deltastreamerBroadcastJoins/conflictdetection/w1.py", line 88, in write_to_hudi
    .save(path)
     ^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/sql/readwriter.py", line 1398, in save
    self._jwrite.save(path)
  File "/opt/anaconda3/lib/python3.11/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o65.save.
: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:81)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:129)
    at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:118)
    at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:71)
    at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:58)
    at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1253)
    at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1296)
    at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:139)
    at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:224)
    at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:431)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:79)
    ... 52 more
Caused by: java.lang.NoSuchMethodError: 'com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClientOptions$AmazonDynamoDBLockClientOptionsBuilder com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClientOptions.builder(org.apache.hudi.software.amazon.awssdk.services.dynamodb.DynamoDbClient, java.lang.String)'
    at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:90)
    at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:75)
    ... 57 more

(base) soumilshah@Soumils-MBP conflictdetection % 
ad1happy2go commented 1 month ago

@soumilshah1995 Looks like the AWS SDK bundle version conflicts with hudi-aws-bundle.
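One way to check that diagnosis is to see which jars on the classpath ship their own copy of the class named in the `NoSuchMethodError`. The sketch below uses a hypothetical helper (`find_class_in_jars` is not part of Hudi or Spark). It opens every `.jar` under a directory with Python's `zipfile` module and lists the ones that contain `AmazonDynamoDBLockClientOptions`. If more than one jar turns up, a copy from another jar (or another Hudi version) may be shadowing the one inside hudi-aws-bundle. That would fit the missing `builder(org.apache.hudi.software.amazon.awssdk...DynamoDbClient, String)` overload, but treat it as a hypothesis to confirm rather than a definitive diagnosis.

```python
import zipfile
from pathlib import Path
from typing import List

# Class file named in the NoSuchMethodError from the stack trace above.
LOCK_CLIENT_CLASS = "com/amazonaws/services/dynamodbv2/AmazonDynamoDBLockClientOptions.class"


def find_class_in_jars(jar_dir: str, class_entry: str = LOCK_CLIENT_CLASS) -> List[str]:
    """Return the names of all jars under jar_dir (searched recursively) that
    contain class_entry. More than one hit suggests competing copies of the class."""
    hits: List[str] = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip corrupt or non-zip files that merely end in ".jar".
            continue
    return hits
```

For example, `find_class_in_jars(os.path.expanduser("~/.ivy2/jars"))` would scan the jars resolved through `spark.jars.packages`, assuming Spark's default Ivy cache location; point it at `$SPARK_HOME/jars` as well if you place jars there.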

soumilshah1995 commented 1 month ago

Yes, I know it's a jar issue, as usual. Let me try an older version to see if that works. I'll just have to brute-force it here lol, trying different versions.
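Rather than editing several coordinates by hand on each attempt, the versions can live in one place so the Spark bundle and hudi-aws-bundle always come from the same Hudi release. This is a minimal sketch under assumptions: `hudi_packages` is a hypothetical helper, the `_2.12` suffix assumes a Scala 2.12 build of Spark, and it assumes no separate AWS SDK or DynamoDB lock-client jar is added alongside hudi-aws-bundle.

```python
from typing import List


def hudi_packages(hudi_version: str, spark_version: str, scala_version: str = "2.12") -> str:
    """Build a spark.jars.packages value in which the Spark bundle and the
    AWS bundle share a single Hudi version (hypothetical helper)."""
    coords: List[str] = [
        f"org.apache.hudi:hudi-spark{spark_version}-bundle_{scala_version}:{hudi_version}",
        f"org.apache.hudi:hudi-aws-bundle:{hudi_version}",
    ]
    # spark.jars.packages expects a comma-separated list of Maven coordinates.
    return ",".join(coords)
```

The result would be passed as `SparkSession.builder.config("spark.jars.packages", hudi_packages(HUDI_VERSION, SPARK_VERSION))`, so changing `HUDI_VERSION` moves both bundles together on each attempt.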

soumilshah1995 commented 3 weeks ago

I guess I can close this; it's mostly a version issue. I haven't had a chance to try the brute-force approach yet, but I know the issue is in the jars, so I'll close it.