awslabs / emr-dynamodb-connector

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Apache License 2.0

Backport DDB connector Credential Provider Fix for v5.4.0 #206

Status: Closed (nickherzig closed this 3 weeks ago)

nickherzig commented 3 weeks ago

Description of changes:

This is a backport (cherry-pick) of the following pull requests:
https://github.com/awslabs/emr-dynamodb-connector/pull/201
https://github.com/awslabs/emr-dynamodb-connector/pull/203

See the pull requests linked above for more information about the change.

Testing: mvn clean install

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for EMRDynamoDBConnector 5.4.0:
[INFO]
[INFO] EMRDynamoDBConnector ............................... SUCCESS [  0.313 s]
[INFO] EMRDynamoDBHadoop .................................. SUCCESS [01:08 min]
[INFO] EMRDynamoDBConnectorShims .......................... SUCCESS [  0.004 s]
[INFO] ShimsCommon ........................................ SUCCESS [  0.590 s]
[INFO] Hive2Shims ......................................... SUCCESS [  0.370 s]
[INFO] Hive3Shims ......................................... SUCCESS [  0.197 s]
[INFO] ShimsLoader ........................................ SUCCESS [  0.214 s]
[INFO] EMRDynamoDBHive .................................... SUCCESS [  3.076 s]
[INFO] EMRDynamoDBTools ................................... SUCCESS [  1.375 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:14 min
[INFO] Finished at: 2024-08-22T12:18:51-07:00
[INFO] ------------------------------------------------------------------------

Manual testing on cluster j-1H49IKM5CLDTF, with the old behavior configured by setting:

  <property>
          <name>dynamodb.customAWSCredentialsProvider</name>
          <value>org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider</value>
  </property>

before change:

hive> CREATE EXTERNAL TABLE ddb_features_720
    >     (feature_id   BIGINT,
    >     feature_name  STRING,
    >     feature_class STRING,
    >     state_alpha   STRING,
    >     prim_lat_dec  DOUBLE,
    >     prim_long_dec DOUBLE,
    >     elev_in_ft    BIGINT)
    > STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    > TBLPROPERTIES(
    >     "dynamodb.table.name" = "Features",
    >     "dynamodb.column.mapping"="feature_id:Id,feature_name:Name,feature_class:Class,state_alpha:State,prim_lat_dec:Latitude,prim_long_dec:Longitude,elev_in_ft:Elevation"
    > );
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.<init>()
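
The NoSuchMethodException on `<init>()` above is what Java reflection reports when a class is instantiated through a public no-argument constructor that it does not have. AWS SDK for Java 2.x credentials providers such as InstanceProfileCredentialsProvider are obtained through a static `create()` factory rather than a public constructor. The sketch below illustrates that failure mode and one possible fallback. It is not the connector's code and is not taken from the linked pull requests: `ProviderLoader`, `FactoryOnlyProvider`, and `ConstructorProvider` are hypothetical names, and the fallback strategy is an assumption made for illustration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class ProviderLoader {

    /** Hypothetical stand-in for a provider that only exposes a static create() factory. */
    public static final class FactoryOnlyProvider {
        private FactoryOnlyProvider() {}
        public static FactoryOnlyProvider create() { return new FactoryOnlyProvider(); }
    }

    /** Hypothetical stand-in for a provider with a public no-argument constructor. */
    public static final class ConstructorProvider {
        public ConstructorProvider() {}
    }

    /**
     * Instantiates the named class. Tries the public no-argument constructor first;
     * if there is none, falls back to a public static create() method.
     */
    public static Object load(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        try {
            // getConstructor() only finds public constructors and throws
            // NoSuchMethodException otherwise (the "<init>()" in the error above).
            Constructor<?> ctor = clazz.getConstructor();
            return ctor.newInstance();
        } catch (NoSuchMethodException noCtor) {
            // Assumed fallback: look for a public static create() factory.
            // If create() is also missing, getMethod throws NoSuchMethodException.
            Method factory = clazz.getMethod("create");
            if (!Modifier.isStatic(factory.getModifiers())) {
                throw noCtor;
            }
            return factory.invoke(null);
        }
    }

    public static void main(String[] args) throws Exception {
        Object viaFactory = load(FactoryOnlyProvider.class.getName());
        Object viaCtor = load(ConstructorProvider.class.getName());
        System.out.println(viaFactory.getClass().getSimpleName());
        System.out.println(viaCtor.getClass().getSimpleName());
    }
}
```

Without the catch branch, loading `FactoryOnlyProvider` fails with the same kind of NoSuchMethodException shown in the Hive output above.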

after change:

hive> CREATE EXTERNAL TABLE ddb_features_720
    >     (feature_id   BIGINT,
    >     feature_name  STRING,
    >     feature_class STRING,
    >     state_alpha   STRING,
    >     prim_lat_dec  DOUBLE,
    >     prim_long_dec DOUBLE,
    >     elev_in_ft    BIGINT)
    > STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    > TBLPROPERTIES(
    >     "dynamodb.table.name" = "Features",
    >     "dynamodb.column.mapping"="feature_id:Id,feature_name:Name,feature_class:Class,state_alpha:State,prim_lat_dec:Latitude,prim_long_dec:Longitude,elev_in_ft:Elevation"
    > );
WARNING: Configured write throughput of the dynamodb table Features is less than the cluster map capacity. ClusterMapCapacity: 10 WriteThroughput: 1
WARNING: Writes to this table might result in a write outage on the table.
OK
Time taken: 1.639 seconds
hive>
    > ;
hive> SELECT DISTINCT feature_class
    > FROM ddb_features_720
    > ORDER BY feature_class;
Query ID = hadoop_20240822202833_d377e380-e11b-4bf4-99b9-1964dbdebc08
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1724354185879_0002)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 6.08 s
----------------------------------------------------------------------------------------------
OK
Arch
Bar
Basin
Bay
Beach
Bend
Cape
Cliff
Crossing
Falls
Flat
Forest
Gap
Glacier
Island
Lake
Lava
Levee
Range
Ridge
Slope
Spring
Stream
Summit
Swamp
Trail
Valley
Time taken: 8.394 seconds, Fetched: 27 row(s)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.