dandi / dandisets

737 Dandisets, 812.2 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

expose zarr checksum compute based on git-annex keys #287

Closed by yarikoptic 1 year ago

yarikoptic commented 1 year ago

At the moment this is not available, and AFAIK dandi-cli can't do it either:

(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/fc3f8c83-f4a9-48e4-a67c-aa24b530f82a$ dandi digest -d zarr-checksum .
2022-10-27 07:20:52,862 [   ERROR] Error scanning directory /mnt/backup/dandi/dandizarrs/fc3f8c83-f4a9-48e4-a67c-aa24b530f82a
Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/dandi/support/digests.py", line 121, in digest_file
    dgst = known[relpath]
KeyError: '.zattrs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/dandi/support/threaded_walk.py", line 63, in worker
    item = func(p) if func is not None else p
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/dandi/support/digests.py", line 123, in digest_file
    dgst = md5file_nocache(f)
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/dandi/support/digests.py", line 215, in md5file_nocache
    return Digester(["md5"])(filepath)["md5"]
  File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/dandi/support/digests.py", line 76, in __call__
    with open(fpath, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/backup/dandi/dandizarrs/fc3f8c83-f4a9-48e4-a67c-aa24b530f82a/.zattrs'
.: 1789f31044fec468447c971eaede0ccd-63--36347031
2022-10-27 07:20:52,968 [    INFO] Logs saved in /home/dandi/.cache/dandi-cli/log/20221027112052Z-4016488.log

(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs/fc3f8c83-f4a9-48e4-a67c-aa24b530f82a$ cd -
/mnt/backup/dandi/dandisets

(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets$ time python -m tools.backups2datalad --help
Usage: python -m tools.backups2datalad [OPTIONS] COMMAND [ARGS]...

Options:
  -B, --backup-root DIRECTORY
  -c, --config FILE
  -J, --jobs INTEGER              How many parallel jobs to use when
                                  downloading and pushing
  -l, --log-level [CRITICAL|ERROR|WARNING|INFO|DEBUG]
                                  Set logging level  [default: INFO]
  --pdb                           Drop into debugger if an error occurs
  --quiet-debug                   Log backups2datalad at DEBUG and all other
                                  loggers at INFO
  --help                          Show this message and exit.

Commands:
  backup-zarrs
  populate
  populate-zarrs
  release
  update-from-backup
  update-github-metadata  Update the homepages and descriptions for the...

The need: divergence in checksums, as reported in #286.
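
A minimal sketch of the idea, not the implementation used by backups2datalad or dandi-cli: the traceback above fails because `dandi digest` tries to open `.zattrs`, whose annexed content is not present locally. If the Zarr datasets use git-annex's MD5 or MD5E backend (an assumption here), the MD5 digest and size of each file are already encoded in the key that the symlink points to, so they can be read without the content. The helper names (`parse_annex_key`, `entry_from_symlink`, `parse_zarr_checksum`) are hypothetical, and the `<md5>-<file count>--<size>` layout is inferred from the checksum printed in the output above.

```python
import os
import re
from typing import NamedTuple, Optional

# git-annex keys look like BACKEND-sSIZE--NAME; for MD5E an extension follows.
# Assumption: only MD5/MD5E keys with a size field are of interest here.
_KEY_RE = re.compile(
    r"^MD5E?-s(?P<size>\d+)--(?P<md5>[0-9a-f]{32})(?:\.[^/]*)?$"
)

# Layout inferred from the `dandi digest -d zarr-checksum` output above.
_ZARR_RE = re.compile(r"^(?P<md5>[0-9a-f]{32})-(?P<count>\d+)--(?P<size>\d+)$")


class AnnexEntry(NamedTuple):
    md5: str
    size: int


class ZarrChecksum(NamedTuple):
    md5: str
    count: int
    size: int


def parse_annex_key(key: str) -> Optional[AnnexEntry]:
    """Return (md5, size) from an MD5/MD5E git-annex key, or None."""
    m = _KEY_RE.match(key)
    if m is None:
        return None
    return AnnexEntry(m["md5"], int(m["size"]))


def entry_from_symlink(path: str) -> Optional[AnnexEntry]:
    """Read the key from an annexed symlink without touching its content."""
    if not os.path.islink(path):
        return None
    # The key is the last component of the symlink target,
    # whether or not the target actually exists.
    return parse_annex_key(os.path.basename(os.readlink(path)))


def parse_zarr_checksum(value: str) -> Optional[ZarrChecksum]:
    """Split a Zarr checksum string into digest, file count, and total size."""
    m = _ZARR_RE.match(value)
    if m is None:
        return None
    return ZarrChecksum(m["md5"], int(m["count"]), int(m["size"]))
```

With per-file digests and sizes obtained this way, a tree checksum could in principle be computed from the git-annex keys alone and then compared against the value reported by the archive, which is what this issue asks to expose.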