Issue #, if available:
158
Description of changes:
Currently we do not configure DynamoDBConstants.READ_THROUGHPUT and DynamoDBConstants.WRITE_THROUGHPUT for PROVISIONED tables in our DynamoDB InputFormat. The reason is that PROVISIONED tables can have auto-scaling enabled on read and write capacity, so every time a new task starts we want to fetch the current capacity from DDB to make sure we are fully utilizing it. However, we also use the read and write throughput values to calculate the number of mappers, so leaving these properties unset at job setup causes only one map task to be launched.
With these changes, we set the read and write throughput at job setup even for PROVISIONED tables, so that we can still estimate the number of mappers that will be needed. In addition, we pass a new configuration for PROVISIONED tables that indicates that throughput should be fetched again every time a new task starts, to account for auto-scaling.
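The intended behaviour can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: it uses a plain `Map` instead of the Hadoop job configuration, the key strings are assumed stand-ins for the `DynamoDBConstants` values, the flag name `REFRESH_PER_TASK` is hypothetical, and the mapper estimate is a made-up formula used only to show why an unset throughput collapses to one map task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

final class ThroughputConfigSketch {
    // Assumed key strings standing in for DynamoDBConstants.READ_THROUGHPUT / WRITE_THROUGHPUT.
    static final String READ_THROUGHPUT = "dynamodb.throughput.read";
    static final String WRITE_THROUGHPUT = "dynamodb.throughput.write";
    // Hypothetical name for the new "re-fetch at task start" flag; the real key may differ.
    static final String REFRESH_PER_TASK = "example.throughput.refresh.per.task";

    // Job setup: always record the initial throughput, and mark PROVISIONED tables for refresh.
    static Map<String, String> configureJob(boolean provisioned, long readCapacity, long writeCapacity) {
        Map<String, String> conf = new HashMap<>();
        conf.put(READ_THROUGHPUT, Long.toString(readCapacity));
        conf.put(WRITE_THROUGHPUT, Long.toString(writeCapacity));
        conf.put(REFRESH_PER_TASK, Boolean.toString(provisioned));
        return conf;
    }

    // Illustrative estimate only: without a configured throughput we fall back to one mapper.
    static int estimateMappers(Map<String, String> conf, long throughputPerMapper) {
        String read = conf.get(READ_THROUGHPUT);
        if (read == null) {
            return 1;
        }
        long mappers = Long.parseLong(read) / throughputPerMapper;
        return (int) Math.max(1L, Math.min(mappers, Integer.MAX_VALUE));
    }

    // Task start: re-fetch the current capacity when the flag is set, else use the configured value.
    static long readThroughputAtTaskStart(Map<String, String> conf, LongSupplier currentCapacity) {
        if (Boolean.parseBoolean(conf.get(REFRESH_PER_TASK))) {
            return currentCapacity.getAsLong(); // stands in for a call to DynamoDB
        }
        return Long.parseLong(conf.get(READ_THROUGHPUT));
    }
}
```

In this sketch the initial value drives only the mapper estimate, while a task on a PROVISIONED table ignores it in favour of whatever capacity the table reports when the task starts.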
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.