dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org
13 stars 12 forks source link

​/zarr​/{zarr_id}​/upload​/: make `etag` (`md5` ) optional #912

Closed yarikoptic closed 2 years ago

yarikoptic commented 2 years ago

see https://github.com/dandi/dandi-cli/issues/915#issuecomment-1047037299 and there on.

dchiquito commented 2 years ago

@yarikoptic the md5 field is used to do integrity checking on the files being uploaded. It's not necessary for the final aggregated zarr checksum, but upload integrity seems important to me. Skipping the checksum calculations on the CLI side would make the upload faster, but is that worth losing the integrity check?

yarikoptic commented 2 years ago

IMHO -- I would prefer md5'ing, but @satra's goal is to investigate possibility to reach parity with s5cmd which doesn't do any check-summing AFAIK. Making it optional would allow client to "skip" digesting altogether, but given that we are doing digestion anyways to even assess if file has changed on remote end, not sure if really worth it ATM. Main bottleneck ATM is /complete delay per batch. So, let's moth-ball this issue until/if individual file md5ing becomes the bottleneck?

dchiquito commented 2 years ago

@waxlamp I'm leaving this issue open for now so it's more findable, we can close it eventually if this turns out to be unnecessary.

waxlamp commented 2 years ago

@yarikoptic can we close this, or is it still important to address on its own merits? Daniel's last comment implied that this was only left open for "findability" which I don't think is enough reason on its own to leave this open.

yarikoptic commented 2 years ago

well, we got other battles to fight since then to get zarr uploads going in a robust fashion. The question of disabling upload integrity checking didn't come up since then, let's close I guess. To ease findability of such issues later, added closed-without-resolution label (please use for cases like this).