awslabs / emr-dynamodb-connector

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Apache License 2.0

Backport DDB connector Credential Provider Fix EMR 7.1.0 #205

Closed nickherzig closed 3 weeks ago

nickherzig commented 3 weeks ago

Description of changes:

This is a backport (cherry-pick) of the following pull requests: https://github.com/awslabs/emr-dynamodb-connector/pull/201 https://github.com/awslabs/emr-dynamodb-connector/pull/203

See those pull requests for more information about the change.

Testing: mvn clean install

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for EMRDynamoDBConnector 5.3.0:
[INFO]
[INFO] EMRDynamoDBConnector ............................... SUCCESS [  0.353 s]
[INFO] EMRDynamoDBHadoop .................................. SUCCESS [01:07 min]
[INFO] EMRDynamoDBConnectorShims .......................... SUCCESS [  0.006 s]
[INFO] ShimsCommon ........................................ SUCCESS [  0.554 s]
[INFO] Hive2Shims ......................................... SUCCESS [  0.358 s]
[INFO] Hive3Shims ......................................... SUCCESS [  0.188 s]
[INFO] ShimsLoader ........................................ SUCCESS [  0.228 s]
[INFO] EMRDynamoDBHive .................................... SUCCESS [  3.010 s]
[INFO] EMRDynamoDBTools ................................... SUCCESS [  1.347 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:13 min
[INFO] Finished at: 2024-08-22T12:13:24-07:00
[INFO] ------------------------------------------------------------------------

Manual testing:

Cluster ID: j-3MERZ4FLPLJWJ. The old behavior was selected by setting:

  <property>
    <name>dynamodb.customAWSCredentialsProvider</name>
    <value>org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider</value>
  </property>

Before the change:

hive> CREATE EXTERNAL TABLE ddb_features_710
    >     (feature_id   BIGINT,
    >     feature_name  STRING,
    >     feature_class STRING,
    >     state_alpha   STRING,
    >     prim_lat_dec  DOUBLE,
    >     prim_long_dec DOUBLE,
    >     elev_in_ft    BIGINT)
    > STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    > TBLPROPERTIES(
    >     "dynamodb.table.name" = "Features",
    >     "dynamodb.column.mapping"="feature_id:Id,feature_name:Name,feature_class:Class,state_alpha:State,prim_lat_dec:Latitude,prim_long_dec:Longitude,elev_in_ft:Elevation"
    > );
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.<init>()
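
The NoSuchMethodException on `<init>()` above is what reflective instantiation through a no-argument constructor produces when a class exposes no such public constructor. AWS SDK for Java 2.x credential providers such as InstanceProfileCredentialsProvider are obtained through a static `create()` factory instead. Below is a minimal sketch of a loader that tolerates both styles. `FactoryOnlyProvider`, `ConstructorProvider`, and `ProviderLoader` are hypothetical stand-ins written for illustration; they are not classes from the connector or the SDK, and this is not necessarily how #201 and #203 implement the fix.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an SDK v2 style provider: no public no-arg
// constructor, instances come from a static create() factory.
final class FactoryOnlyProvider {
    private FactoryOnlyProvider() {}

    public static FactoryOnlyProvider create() {
        return new FactoryOnlyProvider();
    }
}

// Hypothetical stand-in for a provider with a public no-arg constructor.
final class ConstructorProvider {
    public ConstructorProvider() {}
}

final class ProviderLoader {
    private ProviderLoader() {}

    // Try the public no-arg constructor first; if the class has none,
    // fall back to a public static create() method.
    static Object instantiate(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        try {
            // getConstructor() only finds public constructors and throws
            // NoSuchMethodException ("<init>()") when there is none.
            return clazz.getConstructor().newInstance();
        } catch (NoSuchMethodException noPublicCtor) {
            Method factory = clazz.getMethod("create");
            if (!Modifier.isStatic(factory.getModifiers())) {
                throw noPublicCtor; // an instance method cannot act as a factory
            }
            return factory.invoke(null);
        }
    }

    public static void main(String[] args) throws Exception {
        // Both styles resolve to an instance of the requested class.
        System.out.println(instantiate(FactoryOnlyProvider.class.getName()).getClass().getName());
        System.out.println(instantiate(ConstructorProvider.class.getName()).getClass().getName());
    }
}
```

Trying the constructor first keeps existing custom providers that rely on a public no-arg constructor working, while the factory fallback covers classes that can only be created through `create()`.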

After the change:

hive> CREATE EXTERNAL TABLE ddb_features_710
    >     (feature_id   BIGINT,
    >     feature_name  STRING,
    >     feature_class STRING,
    >     state_alpha   STRING,
    >     prim_lat_dec  DOUBLE,
    >     prim_long_dec DOUBLE,
    >     elev_in_ft    BIGINT)
    > STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    > TBLPROPERTIES(
    >     "dynamodb.table.name" = "Features",
    >     "dynamodb.column.mapping"="feature_id:Id,feature_name:Name,feature_class:Class,state_alpha:State,prim_lat_dec:Latitude,prim_long_dec:Longitude,elev_in_ft:Elevation"
    > );
WARNING: Configured write throughput of the dynamodb table Features is less than the cluster map capacity. ClusterMapCapacity: 10 WriteThroughput: 1
WARNING: Writes to this table might result in a write outage on the table.
OK
Time taken: 1.82 seconds
hive> SELECT DISTINCT feature_class
    > FROM ddb_features_710
    > ORDER BY feature_class;
Query ID = hadoop_20240822201228_cd0e0a6c-933b-4b56-930f-d7f3732a91f9
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1724354102121_0001)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 6.88 s
----------------------------------------------------------------------------------------------
OK
Arch
Bar
Basin
Bay
Beach
Bend
Cape
Cliff
Crossing
Falls
Flat
Forest
Gap
Glacier
Island
Lake
Lava
Levee
Range
Ridge
Slope
Spring
Stream
Summit
Swamp
Trail
Valley
Time taken: 9.096 seconds, Fetched: 27 row(s)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

smadurawe-oss commented 3 weeks ago

Thanks for this.