dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
22 stars 26 forks source link

download might fail with tons of `ValueError: Not all part hashes submitted` from dandietag #1500

Open yarikoptic opened 1 month ago

yarikoptic commented 1 month ago

reported by @Kevancic on slack for an attempt to re-download (with -e refresh) https://dandiarchive.org/dandiset/000026 which is respectful File Count 40994 Size 38.4 TB . On my quick attempt to also do download -- all worked just fine.

but looking at the log which reports that error for majority of logged there downloads -- it **feels like our loop for retries within _download_file manages to exit without any error while file is not yet fully downloaded: we looked at a sample failed file and it was not downloaded fully, but it did reach that state of the check for digest which we should reach ONLY if download finished successfully

❯ grep -e 2fb8dcd8-9ff3-47b8-b69a-11204ab2ea9e -e sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS20_stain-Nuclei_chunk-40_SPIM.ome.tif ~/2024.09.10-16.09.03Z-152515.log
2024-09-10T15:39:25-0400 [DEBUG   ] dandi 152515:140114245916480 GET https://api.dandiarchive.org/api/dandisets/000026/versions/draft/assets/2fb8dcd8-9ff3-47b8-b69a-11204ab2ea9e/
2024-09-10T15:39:25-0400 [DEBUG   ] urllib3.connectionpool 152515:140114245916480 https://api.dandiarchive.org:443 "GET /api/dandisets/000026/versions/draft/assets/2fb8dcd8-9ff3-47b8-b69a-11204ab2ea9e/ HTTP/1.1" 200 1280
2024-09-10T15:40:25-0400 [DEBUG   ] dandi 152515:140113753392896 Starting download from https://api.dandiarchive.org/api/assets/2fb8dcd8-9ff3-47b8-b69a-11204ab2ea9e/download/
2024-09-10T15:40:25-0400 [DEBUG   ] urllib3.connectionpool 152515:140113753392896 https://api.dandiarchive.org:443 "GET /api/assets/2fb8dcd8-9ff3-47b8-b69a-11204ab2ea9e/download/ HTTP/1.1" 302 0
2024-09-10T15:40:56-0400 [INFO    ] dandi 152515:140113753392896 Asset 2fb8dcd8-9ff3-47b8-b69a-11204ab2ea9e successfully downloaded
2024-09-10T15:40:57-0400 [ERROR   ] dandi 152515:140113753392896 Caught while downloading 000026/sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS20_stain-Nuclei_chunk-40_SPIM.ome.tif:

❯ grep -A 10 'Caught while downloading 000026/sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS20_stain-Nuclei_chunk-40_SPIM.ome.tif' ~/2024.09.10-16.09.03Z-152515.log
2024-09-10T15:40:57-0400 [ERROR   ] dandi 152515:140113753392896 Caught while downloading 000026/sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS20_stain-Nuclei_chunk-40_SPIM.ome.tif:
Traceback (most recent call last):
  File "/autofs/cluster/exvivo3/dandi_env/dandi_v0.51/lib/python3.9/site-packages/dandi/download.py", line 371, in _download_generator_guard
    yield from generator
  File "/autofs/cluster/exvivo3/dandi_env/dandi_v0.51/lib/python3.9/site-packages/dandi/download.py", line 764, in _download_file
    final_digest = downloaded_digest.hexdigest()  # we care only about hex
  File "/autofs/cluster/exvivo3/dandi_env/dandi_v0.51/lib/python3.9/site-packages/dandischema/digests/dandietag.py", line 207, in hexdigest
    return self.etagger.as_str()
  File "/autofs/cluster/exvivo3/dandi_env/dandi_v0.51/lib/python3.9/site-packages/dandischema/digests/dandietag.py", line 134, in as_str
    raise ValueError("Not all part hashes submitted")
ValueError: Not all part hashes submitted

on filesystem it is

  File: sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS20_stain-Nuclei_chunk-40_SPIM.ome.tif
  Size: 83140248    Blocks: 162384     IO Block: 1048576 regular file
Device: 47h/71d Inode: 14174564916967927168  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (4688076/   kae34)   Gid: ( 1046/   recon)
Context: system_u:object_r:nfs_t:s0
Access: 2024-09-12 15:56:16.675401095 -0400
Modify: 2024-09-10 15:40:57.084080980 -0400
Change: 2024-09-10 15:40:57.092095496 -0400
 Birth: -

so 83MB whenever actual file should have size of 303MB:

❯ dandi ls --metadata all -f json dandi://dandi/000026@draft/sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS20_stain-Nuclei_chunk-40_SPIM.ome.tif  | jq '.[].metadata | ( .id, .contentSize, .path, .digest ) '
2024-09-12 19:18:09,821 [    INFO] Logs saved in /home/yoh/.local/state/dandi-cli/log/2024.09.12-23.18.08Z-327041.log
"dandiasset:2fb8dcd8-9ff3-47b8-b69a-11204ab2ea9e"
303529943
"sub-I60/ses-SPIM/micr/sub-I60_ses-SPIM_sample-BrocaAreaS20_stain-Nuclei_chunk-40_SPIM.ome.tif"
{
  "dandi:dandi-etag": "26634f4ae3043b43e90e36dbf83ddf2f-5",
  "dandi:sha2-256": "926375681bd4c18259ee2266b449288fcca3e6745b883f8e2152c8fa953777fd"
}
yarikoptic commented 1 month ago

candidate "cause":

yarikoptic commented 1 month ago

FWIW, the DANDI_DEVEL_AGGRESSIVE_RETRY setting of