awslabs / emr-dynamodb-connector

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Apache License 2.0

Backport DDB connector Credential Provider Fix EMR 6.15.0 & EMR 7.0.0 #204

Status: Closed (closed by nickherzig 2 months ago)

nickherzig commented 2 months ago

Description of changes:

This is a backport (cherry-pick) of the following pull requests:
https://github.com/awslabs/emr-dynamodb-connector/pull/201
https://github.com/awslabs/emr-dynamodb-connector/pull/203

See the pull requests above for more information about the change.

Testing: mvn clean install

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for EMRDynamoDBConnector 5.2.0:
[INFO]
[INFO] EMRDynamoDBConnector ............................... SUCCESS [  0.479 s]
[INFO] EMRDynamoDBHadoop .................................. SUCCESS [01:09 min]
[INFO] EMRDynamoDBConnectorShims .......................... SUCCESS [  0.004 s]
[INFO] ShimsCommon ........................................ SUCCESS [  0.596 s]
[INFO] Hive2Shims ......................................... SUCCESS [  0.392 s]
[INFO] Hive3Shims ......................................... SUCCESS [  0.180 s]
[INFO] ShimsLoader ........................................ SUCCESS [  0.215 s]
[INFO] EMRDynamoDBHive .................................... SUCCESS [  5.900 s]
[INFO] EMRDynamoDBTools ................................... SUCCESS [  1.370 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:18 min
[INFO] Finished at: 2024-08-22T11:06:31-07:00
[INFO] ------------------------------------------------------------------------

Manual testing on cluster j-1GTO05T0V6FVP. The old behavior was reproduced by setting:

  <property>
          <name>dynamodb.customAWSCredentialsProvider</name>
          <value>org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider</value>
  </property>

Before the change:

hive> CREATE EXTERNAL TABLE ddb_features_615
    >     (feature_id   BIGINT,
    >     feature_name  STRING,
    >     feature_class STRING,
    >     state_alpha   STRING,
    >     prim_lat_dec  DOUBLE,
    >     prim_long_dec DOUBLE,
    >     elev_in_ft    BIGINT)
    > STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    > TBLPROPERTIES(
    >     "dynamodb.table.name" = "Features",
    >     "dynamodb.column.mapping"="feature_id:Id,feature_name:Name,feature_class:Class,state_alpha:State,prim_lat_dec:Latitude,prim_long_dec:Longitude,elev_in_ft:Elevation"
    > );
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.emr.ddb.shaded.software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.<init>()
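The `NoSuchMethodException ... <init>()` above means the provider class was looked up through its no-argument constructor, which does not exist publicly: in AWS SDK for Java v2, `InstanceProfileCredentialsProvider` has a private constructor and is obtained through static factories such as `create()` or `builder()`. The sketch below reproduces that failure mode and shows one possible fallback (try a public no-arg constructor, then a public static `create()` method). It is only an illustration under that assumption: `CredentialProviderLoader`, `ConstructorProvider` and `FactoryOnlyProvider` are hypothetical names, and this is not claimed to be how pull requests #201/#203 implement the fix.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/**
 * Hypothetical sketch (not the connector's code): instantiate a credentials
 * provider by class name, falling back to a static create() factory when the
 * class has no public no-argument constructor.
 */
public class CredentialProviderLoader {

    /** Hypothetical provider with a public no-arg constructor. */
    public static class ConstructorProvider {
        public ConstructorProvider() {}
    }

    /** Hypothetical SDK v2-style provider: private constructor, static create(). */
    public static class FactoryOnlyProvider {
        private FactoryOnlyProvider() {}

        public static FactoryOnlyProvider create() {
            return new FactoryOnlyProvider();
        }
    }

    /** Constructor-only lookup; fails with NoSuchMethodException for FactoryOnlyProvider. */
    public static Object instantiateWithConstructorOnly(String className)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        return clazz.getConstructor().newInstance();
    }

    /** Tries the public no-arg constructor first, then a public static create() method. */
    public static Object instantiate(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        try {
            Constructor<?> ctor = clazz.getConstructor();
            return ctor.newInstance();
        } catch (NoSuchMethodException noPublicCtor) {
            // getMethod only returns public methods; throws if create() is absent.
            Method factory = clazz.getMethod("create");
            if (!Modifier.isStatic(factory.getModifiers())) {
                throw noPublicCtor;
            }
            return factory.invoke(null);
        }
    }

    public static void main(String[] args) throws Exception {
        String name = FactoryOnlyProvider.class.getName();
        try {
            instantiateWithConstructorOnly(name);
        } catch (NoSuchMethodException e) {
            System.out.println("constructor-only lookup failed: " + e.getMessage());
        }
        Object provider = instantiate(name);
        System.out.println("factory fallback produced: " + provider.getClass().getSimpleName());
    }
}
```

Running `main` shows the constructor-only lookup failing in the same way as the Hive error, while the fallback returns an instance. Checking the constructor first keeps classes that do have a public no-arg constructor working unchanged.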

After the change:

hive> CREATE EXTERNAL TABLE ddb_features_615
    >     (feature_id   BIGINT,
    >     feature_name  STRING,
    >     feature_class STRING,
    >     state_alpha   STRING,
    >     prim_lat_dec  DOUBLE,
    >     prim_long_dec DOUBLE,
    >     elev_in_ft    BIGINT)
    > STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    > TBLPROPERTIES(
    >     "dynamodb.table.name" = "Features",
    >     "dynamodb.column.mapping"="feature_id:Id,feature_name:Name,feature_class:Class,state_alpha:State,prim_lat_dec:Latitude,prim_long_dec:Longitude,elev_in_ft:Elevation"
    > );
hive> SELECT DISTINCT feature_class
    > FROM ddb_features_615
    > ORDER BY feature_class;
Query ID = hadoop_20240822190423_c4fd60ad-2203-41a4-9c3d-d1f863999512
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1724350615438_0003)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 7.16 s
----------------------------------------------------------------------------------------------
OK
Arch
Bar
Basin
Bay
Beach
Bend
Cape
Cliff
Crossing
Falls
Flat
Forest
Gap
Glacier
Island
Lake
Lava
Levee
Range
Ridge
Slope
Spring
Stream
Summit
Swamp
Trail
Valley
Time taken: 10.271 seconds, Fetched: 27 row(s)
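The "dynamodb.column.mapping" property used in the CREATE EXTERNAL TABLE statements is a comma-separated list of hive_column:DynamoDB_attribute pairs (for example, elev_in_ft is read from the Elevation attribute). The sketch below parses that string into an ordered map to make the pairing explicit. `ColumnMappingParser` is a hypothetical helper written for illustration; it is not the connector's own parser and may not handle every edge case the connector accepts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parse "hiveCol:ddbAttr,..." into an ordered map. */
public class ColumnMappingParser {

    public static Map<String, String> parse(String mapping) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : mapping.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.indexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Malformed mapping entry: " + trimmed);
            }
            String hiveColumn = trimmed.substring(0, colon).trim();
            String ddbAttribute = trimmed.substring(colon + 1).trim();
            result.put(hiveColumn, ddbAttribute);
        }
        return result;
    }

    public static void main(String[] args) {
        String mapping = "feature_id:Id,feature_name:Name,feature_class:Class,state_alpha:State,"
                + "prim_lat_dec:Latitude,prim_long_dec:Longitude,elev_in_ft:Elevation";
        parse(mapping).forEach((hive, ddb) -> System.out.println(hive + " -> " + ddb));
    }
}
```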

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.