dotmesh-io / roadmap

Roadmap: where we take the product in the next 1-6 months in terms of features

Cryptographic commit hashes #45

Open alaric-dotmesh opened 6 years ago

alaric-dotmesh commented 6 years ago

To support data science reproducibility and other use cases where historical verifiability is crucial, dotmesh should support cryptographic commit hashes.

Suggested mechanism:

  1. Pick an algorithm for making a canonical hash of a filesystem state.
  2. When committing, after the snapshot is done, make a hash of its contents (including commit metadata).
  3. Store the hash alongside the commit ID (also put it in the ZFS commit metadata, although that field must be excluded from the input to the hash algorithm), and display it as "COMMIT ID:HASH" in the UI.
  4. Commands that take a commit ID (to roll back to, mount read-only, export as a tarball, and so on) will, when given the full "COMMIT ID:HASH" syntax, verify the hash of the contents of the given commit ID and print a message (somewhere) saying that the hash has passed, or fail with an error if not. Open question: should this verification be optional, and if so, on or off by default?
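
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not dotmesh code: SHA-256 stands in for the yet-to-be-chosen algorithm, the snapshot is assumed to be readable as an ordinary directory (for example a read-only mount), and `canonical_tree_hash` and `verify_commit_ref` are hypothetical names. The sketch ignores permissions, ownership, extended attributes and symlinked directories, all of which a real canonical form would need to decide on.

```python
import hashlib
import os


def _file_digest(path: str) -> bytes:
    """SHA-256 of one file, read in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()


def canonical_tree_hash(root: str, metadata: dict) -> str:
    """Hash commit metadata plus directory contents in a deterministic order.

    Assumption: `metadata` never contains the hash itself (step 3).
    """
    h = hashlib.sha256()

    def feed(tag: bytes, value: bytes) -> None:
        # Tag and length-prefix each field so adjacent fields cannot be confused.
        h.update(tag + len(value).to_bytes(8, "big") + value)

    # Metadata is fed sorted by key, so dict ordering does not affect the result.
    for key in sorted(metadata):
        feed(b"K", key.encode("utf-8"))
        feed(b"V", str(metadata[key]).encode("utf-8"))

    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort fixes the order in which os.walk descends
        feed(b"D", os.fsencode(os.path.relpath(dirpath, root)))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            feed(b"P", os.fsencode(os.path.relpath(path, root)))
            if os.path.islink(path):
                # Hash the link target, not whatever it points to.
                feed(b"L", os.fsencode(os.readlink(path)))
            else:
                feed(b"F", _file_digest(path))
    return h.hexdigest()


def verify_commit_ref(ref: str, root: str, metadata: dict) -> bool:
    """Check a "COMMIT ID:HASH" reference against the contents at `root` (step 4)."""
    commit_id, sep, expected = ref.rpartition(":")
    if not sep:
        raise ValueError("reference has no ':HASH' part, nothing to verify")
    return canonical_tree_hash(root, metadata) == expected.lower()
```

A command given `"<commit id>:<hash>"` would call something like `verify_commit_ref` before rolling back, mounting or exporting, and refuse to continue if it returns `False`. Hashing the whole tree costs time proportional to the size of the snapshot, which is one argument for making verification optional.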

lukemarsden commented 6 years ago

S3 supports handing back a server-side MD5; see https://github.com/aws/aws-sdk-android/blob/4de3a3146d66d9ab5684eb5e71d5a2cef9f4dec9/aws-android-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1302

Related: https://docs.google.com/document/d/1VFigteB-8QTmNpIIobfPYnu8R9pnKW_zHazA_NAdwM0/edit#