awslabs / emr-dynamodb-connector

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Apache License 2.0

Use of temporary credentials via env variables #185

Open borgoat opened 1 year ago

Hi!

I'm trying to use this as part of a Spark/Glue job (using the DynamoDB connector as a Glue DataSource^1), and while developing locally I'd like to use the environment variables to authenticate. I am using IAM Identity Center (formerly, AWS SSO), so I'm trying to set the usual AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN env vars.

However, DynamoDBClient does not pick these up as temporary credentials. Instead, it forces the SDK client to be instantiated with BasicAWSCredentials, which carries only the access key ID and secret access key, so the session token is dropped.

The getAwsCredentialsProvider^2 method appears to be where the DynamoDB client gets misconfigured.
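To make the difference concrete, here is a minimal sketch of a lookup that keeps the session token when one is present. This is not the connector's code: `EnvCredentials` and `fromEnv` are hypothetical names made up for illustration, and the sketch assumes the AWS SDK for Java v1 (`com.amazonaws.auth`) is on the classpath.

```scala
import com.amazonaws.auth.{AWSCredentials, BasicAWSCredentials, BasicSessionCredentials}

// Hypothetical helper, for illustration only (not part of the connector).
object EnvCredentials {
  // Builds credentials from a map of environment variables.
  // Returns None when the access key ID or secret access key is missing.
  def fromEnv(env: Map[String, String]): Option[AWSCredentials] =
    for {
      accessKey <- env.get("AWS_ACCESS_KEY_ID")
      secretKey <- env.get("AWS_SECRET_ACCESS_KEY")
    } yield {
      val credentials: AWSCredentials =
        env.get("AWS_SESSION_TOKEN").filter(_.nonEmpty) match {
          // Temporary credentials (e.g. from IAM Identity Center) need the token.
          case Some(token) => new BasicSessionCredentials(accessKey, secretKey, token)
          // Long-lived credentials consist of the key pair only.
          case None => new BasicAWSCredentials(accessKey, secretKey)
        }
      credentials
    }
}
```

Calling `EnvCredentials.fromEnv(sys.env)` would then return session credentials whenever AWS_SESSION_TOKEN is set. The SDK's own EnvironmentVariableCredentialsProvider already reads AWS_SESSION_TOKEN as well, which is why the workaround of forcing that provider works.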

I was able to force the right provider with this configuration, but obviously, I'd rather avoid hard-coding it in my job, especially as this is likely only needed for local development...

glueContext
  .getSourceWithFormat(
    connectionType = "dynamodb",
    options = JsonOptions(
      Map(
        "dynamodb.input.tableName" -> "[redacted]",
        "dynamodb.regionid" -> "eu-west-1",
        "dynamodb.customAWSCredentialsProvider" -> "com.amazonaws.auth.EnvironmentVariableCredentialsProvider"
      )
    )
  )
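One way to keep the override out of the deployed job is to add the provider option only when a flag is set locally. The following is a rough sketch under assumptions: `DynamoDbSourceOptions`, `build`, and the `USE_ENV_CREDENTIALS` variable are hypothetical names, not something Glue or the connector defines. Only the three option keys come from the configuration above.

```scala
// Hypothetical helper, for illustration only.
object DynamoDbSourceOptions {
  val EnvProvider = "com.amazonaws.auth.EnvironmentVariableCredentialsProvider"

  // Builds the connector options. The credentials provider override is added
  // only when the (hypothetical) USE_ENV_CREDENTIALS variable is set to "true".
  def build(
      tableName: String,
      region: String,
      env: Map[String, String]
  ): Map[String, String] = {
    val base = Map(
      "dynamodb.input.tableName" -> tableName,
      "dynamodb.regionid" -> region
    )
    if (env.get("USE_ENV_CREDENTIALS").contains("true"))
      base + ("dynamodb.customAWSCredentialsProvider" -> EnvProvider)
    else
      base
  }
}
```

The resulting map could be wrapped in JsonOptions and passed as `options`, for example `DynamoDbSourceOptions.build("[redacted]", "eu-west-1", sys.env)`. The job code then stays identical between local development and deployment, but this is still a workaround rather than a fix for the underlying behaviour.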

Anyway, I find this behaviour surprising. I'm probably missing the larger context, but in my experience I have yet to find a case where setting up an AWS SDK client with explicit credentials is the way to go. Usually, the SDK's implicit, env-based configuration works out of the box in just about any deployment and development scenario, and it is predictable and consistent across different environments and languages.

But I reckon this has to do with some idiosyncrasies of Hadoop and/or Spark? Or am I doing something wrong? How do others handle similar scenarios?