Percona-Lab / mongodb_consistent_backup

A tool for performing consistent backups of MongoDB Clusters or Replica Sets
https://www.percona.com
Apache License 2.0

S3 Upload Fixes v1 #249

Closed timvaillancourt closed 6 years ago

timvaillancourt commented 6 years ago

This PR incorporates many fixes for the Upload/S3/* code. Somehow squashing my work branch didn't result in a single commit here, oh well.

Most of these fixes address #187, which actually raises several issues with the S3 code.

Here is what I found broken:

  1. PR #224 broke S3 upload unless you used that new 'upload.s3.bucket_explicit_key' feature.
  2. S3 upload only worked if archive mode 'tar' was used. Disabling archive mode broke S3 upload (and it's common to use no archiving and mongodump+gzip for performance).
  3. Non-multipart uploads were not supported (highlighted in #187). We found out in #187 that AWS S3 enforces a 5 MB minimum size on multipart uploads. This meant any backup file smaller than 5 MB caused a stalled/failed upload.
  4. Exceptions in S3/S3UploadThread.py were not being caught by the parent/pool. This caused the parent to wait forever for dead child processes.
  5. Some connections and objects were not closed cleanly.
  6. "403 Forbidden" (unauthorized) errors from S3 were not logged in a useful way.
  7. Some blocks of code were duplicated from Upload/Gs/*.py.
  8. Other problems I am forgetting.
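
As a rough illustration of item 4 (not the code in this PR), the sketch below shows one way a parent can notice a failed worker instead of waiting on it forever. It uses the standard library's `multiprocessing.pool.ThreadPool`; the names `UploadError`, `upload_one` and `upload_all` are hypothetical, and the "upload" is a stand-in that only simulates a failure.

```python
from multiprocessing.pool import ThreadPool


class UploadError(Exception):
    """Hypothetical error type for a failed upload."""


def upload_one(file_name):
    # Stand-in for a real upload; fails for one name to simulate an S3 error.
    if file_name == "bad.bson.gz":
        raise UploadError("upload of %s failed" % file_name)
    return file_name


def upload_all(file_names, timeout_secs=5):
    pool = ThreadPool(processes=2)
    try:
        results = [pool.apply_async(upload_one, (name,)) for name in file_names]
        # AsyncResult.get() re-raises the worker's exception in the parent,
        # and the timeout bounds how long the parent waits for each result.
        return [result.get(timeout=timeout_secs) for result in results]
    finally:
        # Always tear the pool down, even when a worker failed.
        pool.terminate()
        pool.join()
```

Calling `upload_all(["a.bson.gz", "bad.bson.gz"])` raises `UploadError` in the caller rather than blocking, which is the behaviour the parent needs in order to log the failure and exit.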

This PR adds:

  1. Fixes to make 'bucket_explicit_key' optional, as it was intended.
  2. Allow non-archive backups to upload. This required code to "walk" the backup path, etc.
  3. Allow non-multipart AND multipart uploads. This required a hybrid upload pool to be created: S3/S3UploadPool.py; this controls creating threads, handling success of multiparts, etc. Files smaller than the multipart minimum size are uploaded without multiparts.
  4. Exceptions are properly raised from child (S3UploadThread.py) to parent (S3UploadPool.py).
  5. .close() methods now function properly, and S3 Key objects are now closed.
  6. Unauthorized errors are logged in a helpful way.
  7. A progress bar for S3 Uploads was added.
  8. Add support for setting the 'release' version of RPM.
  9. Input sanitisation of variables and required variables was added.
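
The hybrid behaviour in item 3 can be sketched roughly as follows. This is an illustration only, not the logic in S3/S3UploadPool.py: `plan_upload`, `MULTIPART_MIN_BYTES` and the 50 MiB default chunk size are assumptions made for the example, and the 5 MB limit is taken to mean 5 MiB.

```python
MULTIPART_MIN_BYTES = 5 * 1024 * 1024  # assumed multipart minimum (5 MiB)


def plan_upload(file_size, chunk_bytes=50 * 1024 * 1024):
    """Return ("single", []) or ("multipart", [(part_num, offset, length), ...])."""
    if chunk_bytes < MULTIPART_MIN_BYTES:
        raise ValueError("chunk_bytes must be at least %d" % MULTIPART_MIN_BYTES)
    # Files below the minimum are uploaded in one request, without multiparts.
    if file_size < MULTIPART_MIN_BYTES:
        return "single", []
    parts = []
    offset = 0
    part_num = 1  # part numbers start at 1
    while offset < file_size:
        # Only the final part may be shorter than chunk_bytes.
        length = min(chunk_bytes, file_size - offset)
        parts.append((part_num, offset, length))
        offset += length
        part_num += 1
    return "multipart", parts
```

Each `(part_num, offset, length)` tuple is the unit of work a pool would hand to one upload thread; a "single" plan means the whole file goes up in one request.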

Now exceptions are caught and any size of file is supported, with or without archiving enabled.
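
To make "with or without archiving" concrete, here is a minimal sketch of walking a backup path to collect the files to upload. It is not the code in this PR; `list_upload_files` and the idea of using the relative path as the object key are assumptions for illustration. A single archive file and a directory of dump files both go through the same function.

```python
import os


def list_upload_files(backup_path):
    """Return sorted (local path, relative key) pairs for a file or directory."""
    # Archive mode: the backup is a single file, e.g. a tar archive.
    if os.path.isfile(backup_path):
        return [(backup_path, os.path.basename(backup_path))]
    # No archiving: walk the directory tree and collect every file in it.
    found = []
    for root, _dirs, files in os.walk(backup_path):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, backup_path)
            # Use forward slashes so keys look the same on every platform.
            found.append((full, rel.replace(os.sep, "/")))
    return sorted(found)
```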

This PR resolves #187 (which I should have broken into 2-3 issues).