Closed yarikoptic closed 2 years ago
@yarikoptic the md5
field is used to do integrity checking on the files being uploaded. It's not necessary for the final aggregated zarr checksum, but upload integrity seems important to me. Skipping the checksum calculations on the CLI side would make the upload faster, but is that worth losing the integrity check?
IMHO -- I would prefer md5'ing, but @satra's goal is to investigate possibility to reach parity with s5cmd which doesn't do any check-summing AFAIK. Making it optional would allow client to "skip" digesting altogether, but given that we are doing digestion anyways to even assess if file has changed on remote end, not sure if really worth it ATM. Main bottleneck ATM is /complete
delay per batch. So, let's moth-ball this issue until/if individual file md5ing becomes the bottleneck?
@waxlamp I'm leaving this issue open for now so it's more findable, we can close it eventually if this turns out to be unnecessary.
@yarikoptic can we close this, or is it still important to address on its own merits? Daniel's last comment implied that this was only left open for "findability" which I don't think is enough reason on its own to leave this open.
well, we got other battles to fight since then to get zarr uploads going in a robust fashion. The question of disabling upload integrity checking didn't come up since then, let's close I guess. To ease findability of such issues later, added closed-without-resolution
label (please use for cases like this).
see https://github.com/dandi/dandi-cli/issues/915#issuecomment-1047037299 and there on.