Open jjnesbitt opened 7 months ago
is there demand/use-case to target here? FWIW - md5 is chosen since it is the one used by AWS for ETag compute so we then
zarr access via manifests
approach to compute remote zarr checksum.
I'm not proposing to change the default behavior. I think that should stay as md5, to match S3's implementation (that was the initial reason for choosing md5). However, this conversation in the zarr-python repo highlighted someone's need for this tool with a different hashing algorithm.
Since this seems like it would be a common use case, and in that thread we got a pseudo-endorsement from one of the zarr-python contributors to use this package (since this functionality doesn't currently exist in zarr), I think it would be worthwhile to generalize the algorithm in a backwards-compatible way.
This will probably have to wait until higher-priority things have been addressed in DANDI, although I might take some of my own time to poke around at this, since it interests me.
Currently md5 is assumed to be the choice of checksum algorithm, but we should allow the user to supply their own algorithm if they so choose.
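As a rough illustration of the backwards-compatible shape this could take, here is a minimal sketch. It is not this package's current API: `hex_digest` and its `algorithm` parameter are hypothetical names, and the only real library used is Python's standard `hashlib`. The default stays md5, so existing callers would see no change.

```python
import hashlib
from typing import Iterable


def hex_digest(chunks: Iterable[bytes], algorithm: str = "md5") -> str:
    """Hash a stream of byte chunks with a caller-chosen algorithm.

    Hypothetical helper for illustration only. `algorithm` is any name
    accepted by hashlib.new(); it defaults to md5 to keep today's behavior.
    """
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    # Note: variable-length algorithms (shake_128, shake_256) need a length
    # passed to hexdigest() and are not handled by this sketch.
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


# Default call matches a plain md5 digest; other algorithms are opt-in.
default_digest = hex_digest([b"chunk-0", b"chunk-1"])
sha256_digest = hex_digest([b"chunk-0", b"chunk-1"], algorithm="sha256")
```

Taking the algorithm as a name rather than a hasher object keeps the argument easy to pass from a CLI flag or config value, though accepting a callable would be an equally reasonable design.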