apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Hive sync tool on EMR failed, Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/... #10741

Closed: huliwuli closed this issue 4 months ago

huliwuli commented 5 months ago

Describe the problem you faced

I ran async clustering on EMR 6.14, and the Hive table queried through Athena did not pick up the latest commit after clustering. I want to use the Hive sync tool to sync it.

When using

cd /usr/lib/hudi/bin

./run_sync_tool.sh --base-path s3://<bucket_name>/<prefix>/<table_name> --database <database_name> --table <table_name> --partitioned-by <column_name>

I got an error caused by java.lang.ClassNotFoundException: com.fasterxml.jackson.datatype.jsr310.JavaTimeModule.
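A ClassNotFoundException of this kind means that no jar visible to the JVM provides the class. A minimal sketch for checking which jars under a set of directories actually contain it follows. The helper name jars_containing and the second directory in the usage line are illustrative assumptions, not part of Hudi or EMR; only /usr/lib/hudi comes from the commands above.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, roots):
    """Return sorted paths of jars under `roots` that contain `class_name`."""
    # A class a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # skip directories that do not exist on this node
        for jar in root.rglob("*.jar"):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(str(jar))
            except (zipfile.BadZipFile, OSError):
                continue  # unreadable or corrupt jar, ignore it
    return sorted(hits)


if __name__ == "__main__":
    # Assumed search locations; adjust to the directories on your cluster.
    for path in jars_containing(
        "com.fasterxml.jackson.datatype.jsr310.JavaTimeModule",
        ["/usr/lib/hudi", "/usr/lib/hive/lib"],
    ):
        print(path)
```

An empty result for the directories that run_sync_tool.sh puts on the classpath would be consistent with the error above.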

Also, I noticed that the AWS documentation includes --use-jdbc false,

so I did

cd /usr/lib/hudi/bin

./run_sync_tool.sh --base-path s3://<bucket_name>/<prefix>/<table_name> --database <database_name> --table <table_name> --partitioned-by <column_name> --sync-mode hms --use-jdbc false --sync-tool-classes org.apache.hudi.hive.MultiPartKeysValueExtractor

Then I got: 'false' but no main parameter was defined in your arg class
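The message says the parser was left with a token, 'false', that no option consumed. One plausible reading, stated here as an assumption rather than a confirmed fact about this Hudi version, is that --use-jdbc is declared as a flag that takes no value, so the 'false' after it is treated as a stray positional argument. The sketch below reproduces that mechanism with Python's argparse as a stand-in for the Java argument parser; the parser and its option list are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the sync tool's argument class."""
    parser = argparse.ArgumentParser(prog="sync_tool_sketch")
    parser.add_argument("--base-path")
    parser.add_argument("--sync-mode")
    # Assumption: the flag takes no value, so its presence alone means "true"
    parser.add_argument("--use-jdbc", action="store_true")
    return parser


def leftover_tokens(argv):
    """Return the tokens that no declared option consumed."""
    _, extra = build_parser().parse_known_args(argv)
    return extra


if __name__ == "__main__":
    print(leftover_tokens(["--sync-mode", "hms", "--use-jdbc", "false"]))
```

Under that assumption, the token after --use-jdbc is what the parser rejects, independent of the Jackson classpath problem.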

Environment Description

Hudi version : 0.13.0

Spark version : 3.4.1

Hive version : 0.13.1

Hadoop version :

Storage (HDFS/S3/GCS..) : S3

Running on Docker? (yes/no) : NO

danny0405 commented 5 months ago

Looks like a jackson jar conflict.

huliwuli commented 4 months ago

Is there anything I can do for this issue?

danny0405 commented 4 months ago

Find out where the legacy jackson jar comes from and remove it from the classpath.
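A rough sketch of one way to do that search: list the Jackson jars under the directories that feed the classpath and group them by version, so mixed versions stand out. The function name jackson_versions, the filename pattern, and the directories in the usage line are assumptions for illustration; the pattern expects names such as jackson-core-2.6.7.jar and skips anything else.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: jackson-<module>-<version>.jar
JACKSON_JAR = re.compile(r"^(jackson-.+?)-(\d[\w.-]*)\.jar$")


def jackson_versions(roots):
    """Map each Jackson version found under `roots` to the jars carrying it."""
    by_version = defaultdict(list)
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # skip directories that do not exist on this node
        for jar in root.rglob("jackson-*.jar"):
            match = JACKSON_JAR.match(jar.name)
            if match:
                by_version[match.group(2)].append(str(jar))
    return {version: sorted(paths) for version, paths in by_version.items()}


if __name__ == "__main__":
    # Assumed search locations; adjust to the directories on your cluster.
    found = jackson_versions(["/usr/lib/hudi", "/usr/lib/hive/lib", "/usr/lib/hadoop"])
    for version, paths in sorted(found.items()):
        print(version)
        for path in paths:
            print("  " + path)
```

Seeing more than one 2.x version in the output would point at the directory that contributes the older jars. Jackson 1.x jars (for example jackson-core-asl) use a different package, org.codehaus.jackson, so they are a separate matter from com.fasterxml classes.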

huliwuli commented 4 months ago

OK, thanks, I will try. Since it's on EMR, I'm not sure whether I have permission to remove it. Or do you know which EMR/Hudi version resolves this issue?

ad1happy2go commented 4 months ago

@huliwuli Did you try using 0.14.1? 0.13.0 did not even support Spark 3.4.

ad1happy2go commented 4 months ago

Sorry, it looks like you are using the AWS-managed Hudi. Can you try using emr-6.15.0, which has Hudi 0.14.0?

huliwuli commented 4 months ago

EMR 6.15 worked, I tested it yesterday.

ad1happy2go commented 4 months ago

Great! Thanks a lot @huliwuli. Closing out this issue then. Please reopen in case you have any concerns.