aws / aws-sdk

Landing page for the AWS SDKs on GitHub
https://aws.amazon.com/tools/

Import from S3 to dynamoDB using pre-existing table #446

Closed agostain closed 1 year ago

agostain commented 1 year ago

Describe the feature

Currently, bulk import from an S3 bucket to a DynamoDB table only supports importing into a new DynamoDB table created by the import_table API.

import_table should allow providing a pre-existing DynamoDB table instead of creating a new one on each import.

Use Case

Import multiple files from S3 to a DynamoDB table at different points in time, e.g. process files in parallel and trigger an import as each one completes. Another use case is provisioning the DynamoDB table with IaC (CDK, CloudFormation, or other tools) and handling permissions following the least-privilege principle.
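The limitation is visible in the shape of the request itself: in the current boto3 API, TableCreationParameters is a required argument, so every import creates a brand-new table. A minimal sketch (bucket, prefix, and table names here are made up for illustration):

```python
# Sketch of a boto3 import_table request (boto3 1.26.x assumed).
# TableCreationParameters is mandatory, which is the limitation this
# issue asks to lift: there is no way to target an existing table.
import_request = {
    "S3BucketSource": {"S3Bucket": "my-source-bucket", "S3KeyPrefix": "exports/"},
    "InputFormat": "DYNAMODB_JSON",
    "TableCreationParameters": {
        "TableName": "my-imported-table",
        "AttributeDefinitions": [{"AttributeName": "pk", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    },
}

# The actual call would be:
# import boto3
# response = boto3.client("dynamodb").import_table(**import_request)
```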

Proposed Solution

No response

Other Information

No response

Acknowledgements

SDK version used

1.26.36

Environment details (OS name and version, etc.)

macos

tim-finnigan commented 1 year ago

Linking my earlier comment on the older issue you opened: https://github.com/boto/boto3/issues/3540#issuecomment-1366185177

I'll reach out to the DynamoDB team and see if they are considering this feature.

frediy commented 1 year ago

The S3-to-DynamoDB importer is extremely useful for moving data from analytical to operational systems, and this feature would significantly increase that usefulness. Our current workaround is to maintain a cache holding the current production table name, which the operational system reads; after each ingestion we delete the old table and update the cached table name. Being forced to maintain a table-name cache adds complexity around productionising workloads built on the S3 importer, complexity that seems avoidable if this feature is implemented.
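The swap-and-delete workaround described above can be sketched roughly as follows. The cache is a plain dict here purely for illustration (in practice it might be an SSM parameter or a small config table, both hypothetical choices), and delete_table stands in for the boto3 DeleteTable call:

```python
# Minimal sketch of the table-name-cache rotation workaround.
def rotate_active_table(cache: dict, new_table: str, delete_table) -> str:
    """Point readers at the freshly imported table, then drop the old one."""
    old_table = cache.get("active_table")
    cache["active_table"] = new_table   # readers now resolve the new table name
    if old_table:
        delete_table(old_table)         # e.g. boto3 dynamodb.delete_table(TableName=...)
    return new_table

# Example rotation after an import finished creating "orders-2023-01-02":
deleted = []
cache = {"active_table": "orders-2023-01-01"}
rotate_active_table(cache, "orders-2023-01-02", deleted.append)
```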

In most of our use cases, we'd want to replace the contents of the existing table completely and keep the table name as a stable reference to the newly ingested data, so we could remove the table-name caching service.

There may also be use cases for S3-to-DynamoDB imports into existing tables that already contain data, overwriting items with matching keys while keeping the other records. That would be quite useful too, especially for particularly large datasets, and seems worth raising as a possible variation of this feature request, if it is feasible from a technical perspective.
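Until such a variation exists, writing S3 data into a table that already holds records means client-side writes. BatchWriteItem accepts at most 25 put requests per call, so items read from S3 have to be chunked first; a small helper like this (a sketch, the function name is made up):

```python
from itertools import islice

def chunks(items, size=25):
    """Yield lists of at most `size` items (25 is the BatchWriteItem limit)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Each batch would then be sent with boto3's batch_write_item, e.g.:
# boto3.client("dynamodb").batch_write_item(RequestItems={"my-table": [
#     {"PutRequest": {"Item": item}} for item in batch
# ]})
```

Note that puts with a key matching an existing item replace it, which gives the overwrite-on-key-collision behaviour described above, at the cost of consuming write capacity.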

tim-finnigan commented 1 year ago

Thanks for following up. We heard back from the DynamoDB team and found they were already tracking this feature request. Please feel free to check back in the future for updates — we recommend reaching out through AWS Support if you have a support plan.

github-actions[bot] commented 1 year ago

This issue is now closed.

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.