awslabs / emr-dynamodb-connector

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Apache License 2.0

How to configure DynamoDBExport READ_THROUGHPUT For On-Demand Billing Mode #176

Open SomeKidXD opened 1 year ago

SomeKidXD commented 1 year ago

Hi there, my team is running into an issue with how this library sets READ_THROUGHPUT for On-Demand tables. For On-Demand tables the code falls back to the DynamoDB default of 40,000: https://github.com/awslabs/emr-dynamodb-connector/blob/8d96fcd2c7b15aca34c6c3b61c809c877f1b04ce/emr-dynamodb-tools/src/main/java/org/apache/hadoop/dynamodb/tools/DynamoDBExport.java#L120-L124

However, we have had AWS raise the quota for our account above this value. Is there a way to make the library use the higher READ_THROUGHPUT?

One possible workaround I found is to set readRatio higher than 1: https://github.com/awslabs/emr-dynamodb-connector/blob/8d96fcd2c7b15aca34c6c3b61c809c877f1b04ce/emr-dynamodb-tools/src/main/java/org/apache/hadoop/dynamodb/tools/DynamoDBExport.java#L141

I believe the library allows this, and the resulting throughput calculations look correct to me: https://github.com/awslabs/emr-dynamodb-connector/blob/bbe15ffa5c4c985c363c139f1460c81066c4e927/emr-dynamodb-hadoop/src/main/java/org/apache/hadoop/dynamodb/read/AbstractDynamoDBInputFormat.java#L44-L65
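
To make the arithmetic concrete, here is a minimal, self-contained sketch of how a readRatio above 1 would scale the 40,000 default. It is not the connector's code: `ReadRatioSketch` and `effectiveReadThroughput` are hypothetical names, the floor-and-clamp behavior is an assumption about how the linked input format treats the ratio, and the 60,000 quota is only an illustrative number.

```java
// Hypothetical illustration only; not part of emr-dynamodb-connector.
final class ReadRatioSketch {
    // The On-Demand default that the issue refers to.
    static final long DEFAULT_ON_DEMAND_READ_THROUGHPUT = 40_000L;

    // Assumption: the effective read rate is the base throughput multiplied
    // by the ratio, rounded down, and never allowed to drop below 1.
    static long effectiveReadThroughput(long baseThroughput, double readRatio) {
        if (baseThroughput < 0 || readRatio < 0) {
            throw new IllegalArgumentException("throughput and ratio must be non-negative");
        }
        long scaled = (long) Math.floor(baseThroughput * readRatio);
        return Math.max(1L, scaled);
    }

    public static void main(String[] args) {
        // Illustrative quota: reaching 60,000 from the 40,000 default
        // needs a ratio of 60,000 / 40,000 = 1.5.
        double ratio = 60_000.0 / DEFAULT_ON_DEMAND_READ_THROUGHPUT;
        System.out.println("readRatio = " + ratio);
        System.out.println("effective read throughput = "
                + effectiveReadThroughput(DEFAULT_ON_DEMAND_READ_THROUGHPUT, ratio));
    }
}
```

Under these assumptions the ratio to pass is simply the raised quota divided by 40,000, which is why the workaround behaves like an override even though readRatio was presumably meant to be a fraction of the table's capacity.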

My question: is this an acceptable use of the readRatio parameter, or should a more sustainable way to override the default READ_THROUGHPUT of 40,000 for On-Demand tables be built into this library?
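
If an override were built in, one shape it could take is an optional job setting that replaces the 40,000 default when present. The sketch below is only an illustration of that idea: the key `dynamodb.ondemand.read.throughput.override` and the class `OnDemandThroughputOverrideSketch` are hypothetical and do not exist in the connector, and a plain `Map` stands in for the Hadoop job configuration.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration only; not part of emr-dynamodb-connector.
final class OnDemandThroughputOverrideSketch {
    // The On-Demand default that the issue refers to.
    static final long DEFAULT_ON_DEMAND_READ_THROUGHPUT = 40_000L;

    // Hypothetical setting name, invented for this sketch.
    static final String OVERRIDE_KEY = "dynamodb.ondemand.read.throughput.override";

    // Returns the override if the setting is present and valid,
    // otherwise falls back to the 40,000 default.
    static long resolveReadThroughput(Map<String, String> settings) {
        String raw = settings.get(OVERRIDE_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_ON_DEMAND_READ_THROUGHPUT;
        }
        long value = Long.parseLong(raw.trim());
        if (value < 1) {
            throw new IllegalArgumentException(OVERRIDE_KEY + " must be at least 1, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // No override set: the default applies.
        System.out.println(resolveReadThroughput(Collections.emptyMap()));
        // Override set to an illustrative raised quota.
        System.out.println(resolveReadThroughput(Collections.singletonMap(OVERRIDE_KEY, "60000")));
    }
}
```

Compared with pushing readRatio above 1, an explicit setting like this would keep readRatio meaning "fraction of available capacity" and make the raised quota visible in the job configuration.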