awslabs / emr-dynamodb-connector

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Apache License 2.0

Set default credential provider to sdk's default provider #203

Closed smadurawe-oss closed 2 months ago

smadurawe-oss commented 2 months ago

When no custom credential provider is configured, the DynamoDB connector currently falls back to InstanceProfileCredentialsProvider. However, an instance profile is not available in every deployment environment (e.g. EMR Serverless). Instead, the default provider from aws-java-sdk-v2 (DefaultCredentialsProvider) should be used as the fallback so that all environments are handled. This preserves existing behavior, because InstanceProfileCredentialsProvider is the last provider in the default chain, while also including the providers required by other deployments.
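To make the ordering argument concrete, here is a minimal sketch of chain resolution. It is a simplified model written with plain Java collections, not the AWS SDK API: the class name `CredentialFallbackSketch`, the methods `resolve`, `defaultChain`, and `when`, and the placeholder credential strings are all hypothetical. Only the provider names and their order are taken from the "After change" debug log in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Simplified, hypothetical model of a credential provider chain.
// It does NOT call the AWS SDK; it only mirrors the first-match-wins ordering.
class CredentialFallbackSketch {

    // Returns the placeholder value when the source is available, empty otherwise.
    static Optional<String> when(boolean available, String value) {
        return available ? Optional.of(value) : Optional.empty();
    }

    // Provider order copied from the DefaultCredentialsProvider debug log.
    // The two flags are assumptions used to simulate different environments.
    static Map<String, Supplier<Optional<String>>> defaultChain(
            boolean containerAvailable, boolean instanceProfileAvailable) {
        Map<String, Supplier<Optional<String>>> chain = new LinkedHashMap<>();
        chain.put("SystemPropertyCredentialsProvider", Optional::empty);
        chain.put("EnvironmentVariableCredentialsProvider", Optional::empty);
        chain.put("WebIdentityTokenCredentialsProvider", Optional::empty);
        chain.put("ProfileCredentialsProvider", Optional::empty);
        chain.put("ContainerCredentialsProvider",
                () -> when(containerAvailable, "container-credentials"));
        chain.put("InstanceProfileCredentialsProvider",
                () -> when(instanceProfileAvailable, "imds-credentials"));
        return chain;
    }

    // Walks the chain in insertion order and returns the name of the first
    // provider that yields credentials.
    static String resolve(Map<String, Supplier<Optional<String>>> chain) {
        for (Map.Entry<String, Supplier<Optional<String>>> entry : chain.entrySet()) {
            if (entry.getValue().get().isPresent()) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("No provider in the chain could load credentials");
    }

    public static void main(String[] args) {
        // Only an instance profile is available: same provider as before the change.
        System.out.println(resolve(defaultChain(false, true)));
        // No instance profile, but container credentials exist: the chain still resolves.
        System.out.println(resolve(defaultChain(true, false)));
    }
}
```

In this model, an environment that exposes only an instance profile still resolves to InstanceProfileCredentialsProvider, while an environment without one can be served by an earlier provider instead of failing. Which provider a given deployment actually uses depends on that environment and is not asserted here.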

Issue #, if available: N/A

Description of changes: See above

Testing with mvn clean install:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for EMRDynamoDBConnector 5.5.0-SNAPSHOT:
[INFO]
[INFO] EMRDynamoDBConnector ............................... SUCCESS [  0.309 s]
[INFO] EMRDynamoDBHadoop .................................. SUCCESS [01:02 min]
[INFO] EMRDynamoDBConnectorShims .......................... SUCCESS [  0.003 s]
[INFO] ShimsCommon ........................................ SUCCESS [  0.374 s]
[INFO] Hive2Shims ......................................... SUCCESS [  0.251 s]
[INFO] Hive3Shims ......................................... SUCCESS [  0.123 s]
[INFO] ShimsLoader ........................................ SUCCESS [  0.129 s]
[INFO] EMRDynamoDBHive .................................... SUCCESS [  2.152 s]
[INFO] EMRDynamoDBTools ................................... SUCCESS [  1.081 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:06 min
[INFO] Finished at: 2024-08-08T13:04:40-07:00
[INFO] ------------------------------------------------------------------------

Testing was done on an EMR cluster with Hive queries connecting to DynamoDB. SQL executed:

hive> CREATE EXTERNAL TABLE ddb_features
    >     (feature_id   BIGINT,
    >     feature_name  STRING,
    >     feature_class STRING,
    >     state_alpha   STRING,
    >     prim_lat_dec  DOUBLE,
    >     prim_long_dec DOUBLE,
    >     elev_in_ft    BIGINT)
    > STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    > TBLPROPERTIES(
    >     "dynamodb.table.name" = "msugath-features",
    >     "dynamodb.column.mapping"="feature_id:Id,feature_name:Name,feature_class:Class,state_alpha:State,prim_lat_dec:Latitude,prim_long_dec:Longitude,elev_in_ft:Elevation"
    > );

Before change logs:

2024-08-08T19:41:12,146 DEBUG [dfe1dda6-b6f9-4d10-b120-15ad365cce63 main([])]: cache.CachedSupplier (:()) - (InstanceProfileCredentialsProvider()) Cached value is stale and will be refreshed.
2024-08-08T19:41:12,147 DEBUG [dfe1dda6-b6f9-4d10-b120-15ad365cce63 main([])]: cache.CachedSupplier (:()) - (InstanceProfileCredentialsProvider()) Refreshing cached value.
2024-08-08T19:41:12,167 DEBUG [dfe1dda6-b6f9-4d10-b120-15ad365cce63 main([])]: credentials.InstanceProfileCredentialsProvider (:()) - Loaded credentials from IMDS with expiration time of 2024-08-09T02:04:52Z
2024-08-08T19:41:12,175 DEBUG [dfe1dda6-b6f9-4d10-b120-15ad365cce63 main([])]: cache.CachedSupplier (:()) - (InstanceProfileCredentialsProvider()) Successfully refreshed cached value. Next Prefetch Time: 2024-08-09T01:47:15.340955127Z. Next Stale Time: 2024-08-09T02:04:51Z
2024-08-08T19:41:12,176 DEBUG [dfe1dda6-b6f9-4d10-b120-15ad365cce63 main([])]: credentials.AwsCredentialsProviderChain (:()) - Loading credentials from InstanceProfileCredentialsProvider()
2024-08-08T19:41:12,262 DEBUG [dfe1dda6-b6f9-4d10-b120-15ad365cce63 main([])]: interceptor.ExecutionInterceptorChain (:()) - Interceptor 'org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.services.dynamodb.endpoints.internal.DynamoDbRequestSetEndpointInterceptor@58647985' modified the message with its modifyHttpRequest method.

After change logs:

2024-08-08T19:54:12,227 DEBUG [683d5610-7971-46c3-831f-d2fe299f1c45 main([])]: dynamodb.DynamoDBClient (:()) - Custom credential provider not found, loading default provider from sdk
2024-08-08T19:54:12,230 DEBUG [683d5610-7971-46c3-831f-d2fe299f1c45 main([])]: providers.EndpointDiscoveryProviderChain (:()) - Unable to load endpoint discovery from SystemPropertiesEndpointDiscoveryProvider():No endpoint discovery setting set.
2024-08-08T19:54:12,230 DEBUG [683d5610-7971-46c3-831f-d2fe299f1c45 main([])]: providers.EndpointDiscoveryProviderChain (:()) - Unable to load endpoint discovery from ProfileEndpointDiscoveryProvider():No endpoint discovery setting provided in profile: default
2024-08-08T19:54:12,232 DEBUG [683d5610-7971-46c3-831f-d2fe299f1c45 main([])]: interceptor.ExecutionInterceptorChain (:()) - Creating an interceptor chain that will apply interceptors in the following order: [org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.core.internal.interceptor.HttpChecksumValidationInterceptor@363a09a2, org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.awscore.interceptor.HelpfulUnknownHostExceptionInterceptor@63d14dbf, org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.awscore.eventstream.EventStreamInitialRequestInterceptor@67521a79, org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.awscore.interceptor.TraceIdExecutionInterceptor@73839f22, org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.services.dynamodb.endpoints.internal.DynamoDbResolveEndpointInterceptor@1512efe9, org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.services.dynamodb.endpoints.internal.DynamoDbRequestSetEndpointInterceptor@7cc7e441]
2024-08-08T19:54:12,233 DEBUG [683d5610-7971-46c3-831f-d2fe299f1c45 main([])]: credentials.AwsCredentialsProviderChain (:()) - Loading credentials from DefaultCredentialsProvider(providerChain=LazyAwsCredentialsProvider(delegate=Lazy(value=AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(sections=[profiles, sso-session], profiles=[Profile(name=default, properties=[s3])])), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]))))

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.