dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
21 stars 25 forks source link

Make file upload be robust to file size changes #1373

Closed yarikoptic closed 9 months ago

yarikoptic commented 9 months ago

Per request of @jwodder in https://github.com/dandi/dandi-cli/issues/1257#issuecomment-1850127241 filing it as a separate issue.

Background: As discovered in that issue we caught NFS (or some layer above on the way to Python) lying about file size to be 0 whenever it is not: the reported size gets changed within few seconds. We also spotted S3 failing some uploads due to "Content-MD5 you specified did not match what we received", an exact reason for which we do not know. So, apparently we cannot really rely on file system to provide us robust file size, and thus we should 1. provide mitigation to the 0-size file issue and to possibly help isolate any other incorrect file size reporting add a safe-guard.

I am filing it as a single issue since IMHO two aspects are very related and likely to be implemented in a concerted effort. I will provide only high level description of desired behavior instead of mandating any particular implementation.

jwodder commented 9 months ago

Upon initiation of upload do a filesize check: a file size reported to be 0, loop for up to 2 seconds (sleeping may be 100ms between) re-checking the file size. If it changes from 0 - can immediately proceed forward. If it remains 0 - proceed forward with 0 file size after those 2 seconds.

What exactly is the goal of re-checking zero-sized files? Are you assuming that, once NFS reports a non-zero size, it will report a non-zero size again when requests queries the file size? Or do you want the value from this check to be used as the Content-Length in the upload request? The latter would require always monkeypatching super_len().

jwodder commented 9 months ago

loop for up to 2 seconds (sleeping may be 100ms between)

To be clear: Do you mean that the total time spent looping should be no more than two seconds or that the individual sleeps should increase up to a maximum of two seconds each? Either way, how many iterations through the loop should there be?

yarikoptic commented 9 months ago

What exactly is the goal of re-checking zero-sized files? Are you assuming that, once NFS reports a non-zero size, it will report a non-zero size again when requests queries the file size?

yes, that is the assumption I have that once it is non-zero, it stays non-zero for the nearest future. I can be proven wrong.

Or do you want the value from this check to be used as the Content-Length in the upload request?

Per my assumption above, I was assuming that it would be the same value used in Content-Length.

The latter would require always monkeypatching super_len().

It is ok with me if it would be so (as I have mentioned before), we might just need to add a unittest to catch a moment when requests RFs their API so there is no use of that super_len (by may be making a test where we patch but raise exception there and then use requests to upload - if succeeds without that exception raised -- they changed behavior)

yarikoptic commented 9 months ago

loop for up to 2 seconds (sleeping may be 100ms between)

To be clear: Do you mean that the total time spent looping should be no more than two seconds or that the individual sleeps should increase up to a maximum of two seconds each? Either way, how many iterations through the loop should there be?

individual sleeps of e.g. 100ms (no need to increase delay IMHO), total to 2 sec, so 20 iterations.

github-actions[bot] commented 8 months ago

:rocket: Issue was released in 0.59.0 :rocket: