apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] AmazonDynamoDBLockClientOptions failing to instantiate for Hudi AWS DynamoDB concurrency control #9018

Status: Closed (NewtonXu closed this issue 1 year ago)

NewtonXu commented 1 year ago

Describe the problem you faced

I'm trying to enable DynamoDB-based concurrency control, but the lock provider cannot be instantiated: the job fails with a NoSuchMethodError on AmazonDynamoDBLockClientOptions.builder.

Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:81)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:108)
    at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:118)
    at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:71)
    at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:58)
    at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:226)
    at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:209)
    at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:84)
    ... 125 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:79)
    ... 132 more
Caused by: java.lang.NoSuchMethodError: com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClientOptions.builder(Lcom/amazonaws/services/dynamodbv2/AmazonDynamoDB;Ljava/lang/String;)Lcom/amazonaws/services/dynamodbv2/AmazonDynamoDBLockClientOptions$AmazonDynamoDBLockClientOptionsBuilder;
    at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:91)
    at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:77)
    ... 137 more

To Reproduce

Steps to reproduce the behavior:

Executing the Spark job with these packages:

--conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1,com.amazonaws:dynamodb-lock-client:1.2.0,com.amazonaws:aws-java-sdk-dynamodb:1.12.490,com.amazonaws:aws-java-sdk-core:1.12.490,org.apache.hudi:hudi-aws:0.13.1

Enabling concurrency control with these settings:

'hoodie.write.lock.provider': "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
'hoodie.write.lock.dynamodb.table': "LockTableName",
'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
'hoodie.cleaner.policy.failed.writes': "LAZY",
'hoodie.write.lock.dynamodb.region': 'us-east-1',
'hoodie.write.lock.dynamodb.partition_key': "TableName",
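
For reference, the settings above can be gathered into a single dict, which is the form a PySpark writer usually accepts. This is a sketch only: the function name is hypothetical, and the option names and values are copied from this issue, not from an authoritative list of required Hudi settings.

```python
# Lock provider class name as quoted in this issue.
LOCK_PROVIDER = "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider"


def dynamodb_lock_options(table, region, partition_key):
    """Build the concurrency-control options used in this issue as one dict."""
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider": LOCK_PROVIDER,
        "hoodie.write.lock.dynamodb.table": table,
        "hoodie.write.lock.dynamodb.region": region,
        "hoodie.write.lock.dynamodb.partition_key": partition_key,
    }


hudi_lock_options = dynamodb_lock_options("LockTableName", "us-east-1", "TableName")

# With PySpark available, the dict would typically be passed to the writer, e.g.
#   df.write.format("hudi").options(**hudi_lock_options).mode("append").save(base_path)
# Here df and base_path are hypothetical and not part of the issue.
```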

The job runs on Amazon EMR Serverless.

Expected behavior

DynamoDB lock is created

Additional context

I'm running this job on Amazon EMR Serverless, but with the open-source Hudi bundle rather than the AWS bundle. Previously, when using the AWS bundle, the job could not find DynamoDBBasedLockProvider.

Stacktrace

Identical to the stack trace shown under "Describe the problem you faced" above.

I am wondering whether there is some incompatibility among the packages I've chosen: org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1, com.amazonaws:dynamodb-lock-client:1.2.0, com.amazonaws:aws-java-sdk-dynamodb:1.12.490, com.amazonaws:aws-java-sdk-core:1.12.490, and org.apache.hudi:hudi-aws:0.13.1.

ad1happy2go commented 1 year ago

@NewtonXu Can you try without adding the AWS jars explicitly? The hudi-aws bundle already contains the required AWS jars. Ideally, these two packages should be enough: org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1 and org.apache.hudi:hudi-aws:0.13.1.
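
In practice, this suggestion amounts to dropping every `com.amazonaws` coordinate from `--packages` and keeping only the two Hudi artifacts. The sketch below derives the reduced list from the original one. The function name is hypothetical, and the thread does not say which of the removed jars caused the conflict.

```python
# Coordinates from the original --packages argument in this issue.
original_packages = [
    "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1",
    "com.amazonaws:dynamodb-lock-client:1.2.0",
    "com.amazonaws:aws-java-sdk-dynamodb:1.12.490",
    "com.amazonaws:aws-java-sdk-core:1.12.490",
    "org.apache.hudi:hudi-aws:0.13.1",
]


def drop_explicit_aws_jars(coordinates):
    """Keep only coordinates whose Maven group is not com.amazonaws."""
    return [c for c in coordinates if not c.startswith("com.amazonaws:")]


# Value to pass after --packages.
packages_arg = ",".join(drop_explicit_aws_jars(original_packages))
```

The resulting `packages_arg` contains only `org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1` and `org.apache.hudi:hudi-aws:0.13.1`, joined by a comma.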

ad1happy2go commented 1 year ago

@NewtonXu Were you able to get it working?

NewtonXu commented 1 year ago

Yes, this worked for me. Thanks!