aces / Loris-MRI

The set of scripts that preprocess and insert MRI data into the database.

Add method to check and skip duplicate content uploads to S3 #1032

Closed cmadjar closed 7 months ago

cmadjar commented 7 months ago

Rebase of https://github.com/aces/Loris-MRI/pull/1015 to 24.1-release so HBCD can benefit from those changes when we do the next bug fix release of LORIS-MRI.

Description (from PR #1015)

These changes check whether the content of a file about to be uploaded to S3 has already been uploaded. Before attempting the upload, the code checks whether the hash of the file's content is already present at the target S3 object key. If it is, the upload is skipped.
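As a rough illustration of the check described above (not the code from this PR), here is a minimal sketch. It assumes a boto3-style S3 client exposing `head_object` and `upload_file`, and it assumes the object's ETag equals the MD5 hex digest of its content, which holds only for single-part uploads without SSE-KMS or SSE-C encryption. The names `file_md5`, `remote_etag`, and `upload_if_changed` are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remote_etag(s3_client, bucket, key):
    """Return the object's ETag without quotes, or None if the key is absent.

    Assumption: a missing key raises an exception carrying a botocore-style
    `response` dict whose error code is one of the "not found" codes below.
    """
    try:
        response = s3_client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code in ("404", "NoSuchKey", "NotFound"):
            return None
        raise  # anything else (permissions, network) is a real error
    return response["ETag"].strip('"')


def upload_if_changed(s3_client, local_path, bucket, key):
    """Upload local_path unless identical content already sits at bucket/key.

    Returns True if an upload happened, False if it was skipped.
    """
    if remote_etag(s3_client, bucket, key) == file_md5(local_path):
        return False
    s3_client.upload_file(local_path, bucket, key)
    return True
```

With a real client this would be called as `upload_if_changed(boto3.client("s3"), "/path/to/file", "my-bucket", "my/key")`, where the path, bucket, and key are placeholders. For multipart uploads the ETag is not a plain MD5, so a real implementation would need a different comparison, such as a hash stored in object metadata.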

This resolves an issue where the same content would sometimes be uploaded to an S3 bucket even though the file already existed there. Normally this is harmless, but in versioning-enabled buckets each re-upload creates a duplicate version of the file even though nothing has changed.

This does not appear to cause any breaking changes.

cmadjar commented 7 months ago

@breen0074 Thank you for submitting this fix! I rebased and tested the change. Works wonderfully :). Sorry for the delay in getting to it.