chriskuehl / lazy-build

Remotely cache build artifacts based on file hashes
MIT License
2 stars 1 forks source link

Have a story for corrupted multipart downloads from S3 #10

Open chriskuehl opened 7 years ago

chriskuehl commented 7 years ago

Ideally we would never replace artifacts and always use new names. One option could be to include a unique identifier and then list keys in the bucket instead, but the S3 consistency model doesn't guarantee that keys can be listed immediately after upload. (Maybe that's okay?)

chriskuehl commented 7 years ago

From talking to some people, list operations should really be avoided (they're expensive, slow, and have poor consistency guarantees).

A better alternative is to have a manifest file, e.g. for artifact with hash X, store a pointer to an uploaded artifact at X.manifest which contains the name of the artifact, which is of the form X.{uuid}. In theory the manifest might get overwritten due to races, but the file is tiny so multipart download corruption isn't a problem, and it's immediately available in S3 after upload (read-after-write for new uploads). The artifacts themselves are then immutable.