adamchainz / dynamodb_utils

A toolchain for Amazon's DynamoDB to make common operations (backup, restore backups) easier.
ISC License

Backup to S3 #6

Open brandond opened 7 years ago

brandond commented 7 years ago

It would be nice to be able to stream the backup to an S3 bucket instead of local disk.

adamchainz commented 7 years ago

Sounds nice - but would that be robust enough? What if S3 errors?

brandond commented 7 years ago

Shouldn't be any worse than having the local disk throw an error while you're writing the dump locally. I believe there's some retry logic built into boto3 that could be taken advantage of. Possibly we could wrap BytesIO (with optional gzip around that) and then just call upload_fileobj() to get it into S3 without ever having to touch disk, although I'm not sure if that's better or worse for Lambda, which could potentially be more constrained on memory than disk.
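A minimal sketch of the in-memory idea, assuming the dump is available as an iterable of text lines. The helper name `upload_gzipped_dump` and its parameters are hypothetical and not part of dynamodb_utils; `s3_client` is expected to be something like `boto3.client("s3")`, whose `upload_fileobj(Fileobj, Bucket, Key)` method does the transfer. Note that the whole compressed dump is held in memory before the upload starts.

```python
import gzip
import io


def upload_gzipped_dump(s3_client, bucket, key, lines):
    """Gzip an iterable of text lines in memory, then upload the result to S3.

    Hypothetical helper: s3_client is assumed to expose
    upload_fileobj(Fileobj, Bucket, Key), as a boto3 S3 client does.
    Returns the number of compressed bytes handed to the client.
    """
    buffer = io.BytesIO()
    # Closing the GzipFile writes the gzip trailer but leaves the
    # underlying BytesIO open, so it can still be read afterwards.
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for line in lines:
            gz.write(line.encode("utf-8") + b"\n")
    size = buffer.tell()  # record the size before handing the buffer over
    buffer.seek(0)        # rewind so the upload reads from the start
    s3_client.upload_fileobj(buffer, bucket, key)
    return size
```

With boto3 installed, a call might look like `upload_gzipped_dump(boto3.client("s3"), "my-bucket", "dump.json.gz", lines)`, where the bucket and key names are placeholders.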

adamchainz commented 7 years ago

Lambda only ever gets 500 MiB of temp disk space, but you can get several GiB of memory, so it's often more disk-constrained.

brandond commented 7 years ago

Ah indeed, 512 MB of disk but up to 1.5 GB of memory. In-memory is probably a decent approach then. I might end up modularizing it a bit so that I can farm the current Pool stuff out to parallel Lambda invocations rather than run them all out of one, which would be more efficient for larger datasets.
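One way the fan-out could be sketched, assuming the work is split using DynamoDB's parallel scan parameters (`Segment` and `TotalSegments` on the `Scan` operation). This is an assumption about how the split might be done, not a description of how the current Pool code works, and `segment_payloads` is a hypothetical helper.

```python
def segment_payloads(table_name, total_segments):
    """Build one payload per parallel-scan segment.

    Hypothetical helper: each dict carries the DynamoDB Scan parameters
    (TableName, Segment, TotalSegments) that a single worker would need
    to scan its own slice of the table.
    """
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {
            "TableName": table_name,
            "Segment": segment,
            "TotalSegments": total_segments,
        }
        for segment in range(total_segments)
    ]
```

Each payload could then be serialized and passed to a separate Lambda invocation, so that every invocation scans and uploads only its own segment.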

adamchainz commented 7 years ago

I've realized the license is currently GPLv3, which would mean that if you imported it in your Lambda you'd have to open source that Lambda. I think we should change the license to the ISC License. Since you're the only other author, I need your consent. Are you okay with the license change?

brandond commented 7 years ago

Absolutely!

adamchainz commented 7 years ago

Done in #13