datalad / datalad-fuse

DataLad extension to provide FUSE file system access

get stats (size) from the file/git without talking to remote location #84

Closed yarikoptic closed 1 year ago

yarikoptic commented 1 year ago

@jwodder pointed out in https://github.com/dandi/dandisets-healthstatus/issues/1#issuecomment-1322081261 that the major slowdown comes from opening a remote file for each stat call, i.e. it happens simply while traversing the dataset.

There is a TODO at https://github.com/datalad/datalad-fuse/blob/24727b80ebb6590e04f19b743dd293ac8b767170/datalad_fuse/fuse_.py#L149, added back in the original prototype in cdb14f52340dffe3f418a19c4fb7cbffa1d588a3, so we should indeed just get the size from the annex key and do the remote call only if the key lacks a size (relaxed mode).
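
A minimal sketch of reading the size straight out of an annex key, assuming the usual git-annex key layout `BACKEND-sSIZE--NAME` in which the `-sSIZE` field may be absent (as for relaxed-mode URL keys). The helper name `annex_key_size` is hypothetical and is not part of datalad-fuse.

```python
import re
from typing import Optional

# Assumed key layout: BACKEND[-sSIZE][-mMTIME][-SCHUNKSIZE][-CCHUNKNUM]--NAME
_KEY_RE = re.compile(r"^[A-Z0-9_]+(?:-s(?P<size>\d+))?(?:-[mSC]\d+)*--.+$")


def annex_key_size(key: str) -> Optional[int]:
    """Return the size recorded in an annex key, or None if it has none."""
    m = _KEY_RE.match(key)
    if m is None or m.group("size") is None:
        # Not a key at all, or a key without a size field:
        # the caller would have to fall back to the remote call.
        return None
    return int(m.group("size"))


key = "SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz"
print(annex_key_size(key))  # 4888071542
```

Only keys for which this returns `None` would still need the expensive remote open.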

jwodder commented 1 year ago

@yarikoptic Upon further inspection, it appears that the code was actually hitting this block, so I'm not sure why it was taking so long:

https://github.com/datalad/datalad-fuse/blob/24727b80ebb6590e04f19b743dd293ac8b767170/datalad_fuse/fuse_.py#L122-L125

yarikoptic commented 1 year ago

hm, how do you figure out which block is a bottleneck?

I have attached py-spy top -p 3433101 on drogon and sampled for a bit to see

Total Samples 8400
GIL: 12.00%, Active: 38.00%, Threads: 23

  %Own   %Total  OwnTime  TotalTime  Function (filename)                                                                                                                                                                                                                                                      
 12.00%  15.00%   10.55s    11.88s   read (datalad_fuse/fuse_.py)
  7.00%   7.00%    5.02s     5.40s   open (datalad_fuse/fuse_.py)
  4.00%   4.00%    2.02s     2.02s   read (ssl.py)
  2.00%   2.00%    1.25s     1.25s   _worker (concurrent/futures/thread.py)
  2.00%   3.00%    1.22s     1.37s   release (datalad_fuse/fuse_.py)
  2.00%   2.00%    1.19s     1.20s   _fetch (fsspec/caching.py)
  0.00%   7.00%   0.600s     4.44s   _read_ready__data_received (asyncio/selector_events.py)
  1.00%   2.00%   0.590s     1.22s   data_received (aiohttp/client_proto.py)

at the top so no getattr seems to be "involved".

jwodder commented 1 year ago

@yarikoptic I'm not saying that the code I linked above was the bottleneck; I'm saying that this code was not hit (instead the "Broken symlink" code was hit), so it can't be the bottleneck.

Note that just scanning Dandiset 26 took 9 and a half hours despite it not containing any NWBs, so it seems that some part of the directory traversal is taking too long.

yarikoptic commented 1 year ago

> Note that just scanning Dandiset 26 took 9 and a half hours despite it not containing any NWBs, so it seems that some part of the directory traversal is taking too long.

I will look into it - did you use --mode-transparent? (the process is gone on drogon ATM, can't look it up)

jwodder commented 1 year ago

@yarikoptic Yes, I used --mode-transparent, as I needed to query Git for the latest commit hash of each dataset.

yarikoptic commented 1 year ago

FWIW on my laptop calling find across entire mounted 000026 takes only ~20 sec (and under 1sec for the same on original mounted repo)

```shell
❯ time bash -c 'find /tmp/fusefs/ | nl | tail'
 54980  /tmp/fusefs/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-3_flip-1_VFA.nii.gz
 54981  /tmp/fusefs/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-3_flip-2_VFA.nii.gz
 54982  /tmp/fusefs/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-3_flip-3_VFA.nii.gz
 54983  /tmp/fusefs/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-3_flip-4_VFA.nii.gz
 54984  /tmp/fusefs/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-4_flip-1_VFA.nii.gz
 54985  /tmp/fusefs/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-4_flip-2_VFA.nii.gz
 54986  /tmp/fusefs/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-4_flip-3_VFA.nii.gz
 54987  /tmp/fusefs/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-4_flip-4_VFA.nii.gz
 54988  /tmp/fusefs/sub-KC001/sub-KC001_sessions.tsv
 54989  /tmp/fusefs/dandiset.yaml
bash -c 'find /tmp/fusefs/ | nl | tail'  0.58s user 2.21s system 13% cpu 20.937 total
❯ time bash -c 'find ~/proj/dandi/dandisets/000026/ | nl | tail'
 54980  /home/yoh/proj/dandi/dandisets/000026/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-3_flip-1_VFA.nii.gz
 54981  /home/yoh/proj/dandi/dandisets/000026/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-3_flip-2_VFA.nii.gz
 54982  /home/yoh/proj/dandi/dandisets/000026/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-3_flip-3_VFA.nii.gz
 54983  /home/yoh/proj/dandi/dandisets/000026/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-3_flip-4_VFA.nii.gz
 54984  /home/yoh/proj/dandi/dandisets/000026/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-4_flip-1_VFA.nii.gz
 54985  /home/yoh/proj/dandi/dandisets/000026/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-4_flip-2_VFA.nii.gz
 54986  /home/yoh/proj/dandi/dandisets/000026/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-4_flip-3_VFA.nii.gz
 54987  /home/yoh/proj/dandi/dandisets/000026/sub-KC001/ses-MRI/anat/sub-KC001_ses-MRI_echo-4_flip-4_VFA.nii.gz
 54988  /home/yoh/proj/dandi/dandisets/000026/sub-KC001/sub-KC001_sessions.tsv
 54989  /home/yoh/proj/dandi/dandisets/000026/dandiset.yaml
bash -c 'find ~/proj/dandi/dandisets/000026/ | nl | tail'  0.17s user 0.13s system 136% cpu 0.217 total
```

@jwodder what exactly did you mean by "just scanning Dandiset 26", which "took 9 and a half hours"?

jwodder commented 1 year ago

@yarikoptic In the files produced by the first full run of the script, the timestamps on 000025/status.yaml and 000026/status.yaml are 9 and a half hours apart (06:05 vs. 15:47). Dandiset 000026 contains no NWBs, and the script currently processes only one Dandiset at a time (with the NWBs within a Dandiset processed in parallel), so the only thing it could have been doing for those nine hours was traversing 000026. Perhaps it's better now that there's a cache present.

yarikoptic commented 1 year ago

Please identify how long for it to get through 000026 (now that drogon is not as busy) and what it is spending time on - plain listing, or stating files, or ... ?

jwodder commented 1 year ago

@yarikoptic From a preliminary examination of the FUSE logs, it appears that, even though the listing of files in each directory triggers the "Broken symlink" block for each file, the Python script then needs to know whether each node is a directory or not (so it can know whether to traverse into each node). Since we reused the results from lstat(2), each file in the mount appears to be a symlink, so Python follows each link and stat(2)'s the target to determine whether it's a directory, and that is what triggers the expensive "File not already open" block.

Doing this seems to take about 2 to 3 seconds for each annexed file, though about half the files in 000026 happen to be unannexed JSON, so the net effect is 1 file every 1 to 1.5 seconds. And there are 55,483 assets in the Dandiset...
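
As a back-of-the-envelope check, the per-file estimate above can be multiplied out over the asset count. The function name `estimate_hours` is only illustrative.

```python
ASSETS = 55_483  # asset count quoted above


def estimate_hours(seconds_per_file: float, n_files: int = ASSETS) -> float:
    """Total traversal time in hours for a given average cost per file."""
    return n_files * seconds_per_file / 3600


for rate in (1.0, 1.5):
    print(f"{rate:.1f} s/file -> {estimate_hours(rate):.1f} h")
# 1.0 s/file -> 15.4 h
# 1.5 s/file -> 23.1 h
```

That range of roughly 15 to 23 hours is the same order of magnitude as the multi-hour traversal times reported in this thread.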

yarikoptic commented 1 year ago

Thanks for the investigation, but I am still confused about what takes so long, since os.lstat seems to take just a few microseconds on those symlinks:

drogon:/mnt/backup/dandi/dandisets/000026
$> find sub-I38 -type l | xargs /usr/bin/time python3 -c 'import os, glob, time, sys; t0=time.time(); stats=[os.lstat(p) for p in sys.argv[1:]]; print(f"{(time.time()-t0)/len(stats)} per each among {len(stats)} files")'    
3.34116664246051e-06 per each among 1370 files
0.03user 0.01system 0:00.04elapsed 97%CPU (0avgtext+0avgdata 12924maxresident)k
0inputs+0outputs (0major+2214minor)pagefaults 0swaps
3.293495038490156e-06 per each among 1365 files
0.02user 0.01system 0:00.04elapsed 100%CPU (0avgtext+0avgdata 13008maxresident)k
0inputs+0outputs (0major+2214minor)pagefaults 0swaps
3.1270589297118423e-06 per each among 1364 files
0.03user 0.00system 0:00.04elapsed 100%CPU (0avgtext+0avgdata 12964maxresident)k
0inputs+0outputs (0major+2209minor)pagefaults 0swaps
2.986966732750023e-06 per each among 1363 files
0.02user 0.01system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 12836maxresident)k
0inputs+0outputs (0major+2202minor)pagefaults 0swaps
3.09035891578311e-06 per each among 1365 files
0.03user 0.00system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 13072maxresident)k
0inputs+0outputs (0major+2199minor)pagefaults 0swaps
3.0648577344286574e-06 per each among 1365 files
0.02user 0.01system 0:00.04elapsed 100%CPU (0avgtext+0avgdata 12956maxresident)k
0inputs+0outputs (0major+2207minor)pagefaults 0swaps
3.4453410234045176e-06 per each among 1362 files
0.02user 0.01system 0:00.04elapsed 97%CPU (0avgtext+0avgdata 12912maxresident)k
0inputs+0outputs (0major+2212minor)pagefaults 0swaps
2.963318784012754e-06 per each among 585 files
0.02user 0.01system 0:00.03elapsed 100%CPU (0avgtext+0avgdata 10892maxresident)k
0inputs+0outputs (0major+1568minor)pagefaults 0swaps

Maybe drogon was overloaded and I/O was slower, but I still doubt it was by a factor of thousands.

jwodder commented 1 year ago

@yarikoptic It's not just lstat. The key thing is that, because we return the results of lstating a symlink, each file appears to be a symlink. When the healthcheck script then calls is_dir() on a path, it resolves the symlink and calls stat(2) on the target, and that leads to datalad-fuse opening a file via fsspec.
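
The difference can be sketched on a plain local directory. The `classify` helper below is hypothetical (it is not the healthcheck script's code) and assumes that symlinks in the mount never need to be traversed as directories: it checks `is_symlink()` first, which relies on lstat(2) only, so the link target is never stat(2)-ed.

```python
import tempfile
from pathlib import Path


def classify(path: Path) -> str:
    """Classify a node without ever stat()-ing a symlink's target."""
    # is_symlink() uses lstat(2), so the link target is not touched.
    if path.is_symlink():
        return "symlink"
    # Only non-symlinks reach is_dir(), which follows links via stat(2).
    return "dir" if path.is_dir() else "file"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "plain.json").write_text("{}")
    # Dangling link shaped like an annexed file whose content is absent.
    (root / "data.nii.gz").symlink_to(".git/annex/objects/XX/YY/KEY/KEY")
    kinds = {p.name: classify(p) for p in sorted(root.iterdir())}

print(kinds)
```

`os.scandir()` offers the same control through `DirEntry.is_dir(follow_symlinks=False)`. Whether skipping symlinked directories is acceptable for the script is an assumption here.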

yarikoptic commented 1 year ago

gotcha -- confirming that `isdir` on a symlink goes to fsspec on the initial invocation, potentially taking multiple seconds; subsequent invocations are fast

```shell
❯ time python -c 'import os.path; print(os.path.isdir("/tmp/fusefs/sub-I38/ses-MRI/anat/sub-I38_ses-MRI_echo-4_flip-4_VFA.nii.gz"))'
False
python -c   0.08s user 0.01s system 7% cpu 1.239 total
❯ time python -c 'import os.path; print(os.path.isdir("/tmp/fusefs/sub-I38/ses-MRI/anat/sub-I38_ses-MRI_echo-4_flip-4_VFA.nii.gz"))'
False
python -c   0.07s user 0.01s system 91% cpu 0.092 total

2022-11-22 16:38:49,416 [DEBUG ] op=getattr for path=/sub-I38 with args (None,)
2022-11-22 16:38:49,417 [DEBUG ] op=getattr for path=/sub-I38/ses-MRI with args (None,)
2022-11-22 16:38:49,417 [DEBUG ] op=getattr for path=/sub-I38/ses-MRI/anat with args (None,)
2022-11-22 16:38:49,418 [DEBUG ] op=getattr for path=/sub-I38/ses-MRI/anat/sub-I38_ses-MRI_echo-4_flip-4_VFA.nii.gz with args (None,)
2022-11-22 16:38:49,418 [DEBUG ] op=readlink for path=/sub-I38/ses-MRI/anat/sub-I38_ses-MRI_echo-4_flip-4_VFA.nii.gz with args ()
2022-11-22 16:38:49,418 [DEBUG ] readlink(path='/home/yoh/proj/dandi/dandisets/000026/sub-I38/ses-MRI/anat/sub-I38_ses-MRI_echo-4_flip-4_VFA.nii.gz')
2022-11-22 16:38:49,419 [DEBUG ] op=getattr for path=/.git with args (None,)
2022-11-22 16:38:49,420 [DEBUG ] op=getattr for path=/.git/annex with args (None,)
2022-11-22 16:38:49,420 [DEBUG ] op=getattr for path=/.git/annex/objects with args (None,)
2022-11-22 16:38:49,421 [DEBUG ] op=getattr for path=/.git/annex/objects/VM with args (None,)
2022-11-22 16:38:49,421 [DEBUG ] getattr(path='/home/yoh/proj/dandi/dandisets/000026/.git/annex/objects/VM', fh=None)
2022-11-22 16:38:49,421 [DEBUG ] Returning {'st_atime': 1665084519.9968834, 'st_ctime': 1669139692.2436833, 'st_gid': 47522, 'st_mode': 16832, 'st_mtime': 1669139692.2436833, 'st_nlink': 1, 'st_size': 448, 'st_uid': 47521} for /home/yoh/proj/dandi/dandisets/000026/.git/annex/objects/VM
2022-11-22 16:38:49,422 [DEBUG ] op=getattr for path=/.git/annex/objects/VM/kg with args (None,)
2022-11-22 16:38:49,422 [DEBUG ] getattr(path='/home/yoh/proj/dandi/dandisets/000026/.git/annex/objects/VM/kg', fh=None)
2022-11-22 16:38:49,423 [DEBUG ] Returning {'st_atime': 1665084519.9968834, 'st_ctime': 1669139692.2436833, 'st_gid': 47522, 'st_mode': 16832, 'st_mtime': 1669139692.2436833, 'st_nlink': 1, 'st_size': 448, 'st_uid': 47521} for /home/yoh/proj/dandi/dandisets/000026/.git/annex/objects/VM/kg
2022-11-22 16:38:49,423 [DEBUG ] op=getattr for path=/.git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz with args (None,)
2022-11-22 16:38:49,423 [DEBUG ] getattr(path='/home/yoh/proj/dandi/dandisets/000026/.git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz', fh=None)
2022-11-22 16:38:49,424 [DEBUG ] Returning {'st_atime': 1665084519.9968834, 'st_ctime': 1669139692.2436833, 'st_gid': 47522, 'st_mode': 16832, 'st_mtime': 1669139692.2436833, 'st_nlink': 1, 'st_size': 448, 'st_uid': 47521} for /home/yoh/proj/dandi/dandisets/000026/.git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz
2022-11-22 16:38:49,424 [DEBUG ] op=getattr for path=/.git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz with args (None,)
2022-11-22 16:38:49,425 [DEBUG ] getattr(path='/home/yoh/proj/dandi/dandisets/000026/.git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz', fh=None)
2022-11-22 16:38:49,425 [DEBUG ] File not already open
2022-11-22 16:38:49,426 [DEBUG ] /home/yoh/proj/dandi/dandisets/000026/.git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz: path resolved to .git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz in dataset at /home/yoh/proj/dandi/dandisets/000026
2022-11-22 16:38:49,426 [DEBUG ] get_file_state: .git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz
2022-11-22 16:38:49,427 [DEBUG ] .git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz: under annex, does not have content
2022-11-22 16:38:49,427 [DEBUG ] .git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz: opening via fsspec
2022-11-22 16:38:49,428 [DEBUG ] Run ['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'whereis', '--key', 'SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true'] (protocol_class=AnnexJsonProtocol) (cwd=/home/yoh/proj/dandi/dandisets/000026)
2022-11-22 16:38:49,552 [DEBUG ] Caught Unterminated string starting at: line 1 column 959 (char 958) while trying to parse JSON line b'{"command":"whereis","note":"1 copy\\n\\t00000000-0000-0000-0000-000000000001 -- web\\n\\nThe following untrusted locations may also have copies:\\n\\t727f466f-60c3-4778-90b2-b2332856c2f8 -- dandi-dandisets-dropbox\\n\\nweb: https://api.dandiarchive.org/api/assets/10970c9d-3747-43c5-be13-f20401f0ee03/download/\\nweb: https://api.dandiarchive.org/api/assets/3d764fae-5be2-4690-9c7b-41102a867f7d/download/\\nweb: https://dandiarchive.s3.amazonaws.com/blobs/ebf/42e/ebf42ec8-2d07-42e5-a4b2-0174055ec9b7?versionId=Zbw1h11FyBs3.a2TB7rjKnhAAzs0rELS\\n","success":true,"input":[],"untrusted":[{"here":false,"uuid":"727f466f-60c3-4778-90b2-b2332856c2f8","urls":[],"description":"dandi-dandisets-dropbox"}],"key":"SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz","whereis":[{"here":false,"uuid":"00000000-0000-0000-0000-000000000001","urls":["https://api.dandiarchive.org/api/assets/10970c9d-3747-43c5-be13-f20401f0ee03/download/","https://api.dandiarchive.org/api/assets/3d764fae-5be2-4690-9c7b-4' which might be not yet a full line
2022-11-22 16:38:49,567 [DEBUG ] Finished ['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'whereis', '--key', 'SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true'] with status 0
2022-11-22 16:38:49,567 [DEBUG ] .git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz: Attempting to open via URL https://api.dandiarchive.org/api/assets/10970c9d-3747-43c5-be13-f20401f0ee03/download/
2022-11-22 16:38:50,555 [DEBUG ] File object is fsspec object
2022-11-22 16:38:50,557 [DEBUG ] Returning {'st_uid': 47521, 'st_gid': 47522, 'st_mode': 33188, 'st_size': 4888071542, 'st_blksize': 5242880, 'st_nlink': 1, 'st_atime': 1668114160.0, 'st_ctime': 1668114160.0, 'st_mtime': 1668114160.0} for /home/yoh/proj/dandi/dandisets/000026/.git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz
2022-11-22 16:38:53,832 [DEBUG ] op=getattr for path=/sub-I38 with args (None,)
2022-11-22 16:38:53,832 [DEBUG ] op=getattr for path=/sub-I38/ses-MRI with args (None,)
2022-11-22 16:38:53,833 [DEBUG ] op=getattr for path=/sub-I38/ses-MRI/anat with args (None,)
2022-11-22 16:38:53,834 [DEBUG ] op=getattr for path=/sub-I38/ses-MRI/anat/sub-I38_ses-MRI_echo-4_flip-4_VFA.nii.gz with args (None,)
2022-11-22 16:38:53,834 [DEBUG ] op=readlink for path=/sub-I38/ses-MRI/anat/sub-I38_ses-MRI_echo-4_flip-4_VFA.nii.gz with args ()
2022-11-22 16:38:53,834 [DEBUG ] readlink(path='/home/yoh/proj/dandi/dandisets/000026/sub-I38/ses-MRI/anat/sub-I38_ses-MRI_echo-4_flip-4_VFA.nii.gz')
2022-11-22 16:38:53,835 [DEBUG ] op=getattr for path=/.git with args (None,)
2022-11-22 16:38:53,835 [DEBUG ] op=getattr for path=/.git/annex with args (None,)
2022-11-22 16:38:53,836 [DEBUG ] op=getattr for path=/.git/annex/objects with args (None,)
2022-11-22 16:38:53,837 [DEBUG ] op=getattr for path=/.git/annex/objects/VM with args (None,)
2022-11-22 16:38:53,837 [DEBUG ] op=getattr for path=/.git/annex/objects/VM/kg with args (None,)
2022-11-22 16:38:53,838 [DEBUG ] op=getattr for path=/.git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz with args (None,)
2022-11-22 16:38:53,838 [DEBUG ] op=getattr for path=/.git/annex/objects/VM/kg/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz/SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz with args (None,)
```

yarikoptic commented 1 year ago

So -- let's address that

                # TODO: it is expensive to open each file just for `getattr`!
                # We should just fabricate stats from the key here or not even
                # bother???!

by

  1. extending `is_annex_dir_or_key` to return what it matched for the parsed key, including the size if present, and then, at `if dir_or_key == "key":` above that comment, manufacturing the `r` stat based on the size we get from the annex key. Then we would not reach that "TODO" while looking at key files under .git/annex/objects.
  2. at the place of the comment -- if the path is a symlink pointing into .git/annex/objects -- applying the same logic as in 1. to the resolved symlink/key.
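
A rough sketch of the second step, under stated assumptions: `fabricate_stat` is a hypothetical stand-in (it does not call the real `is_annex_dir_or_key`, whose signature is not shown here), the key layout is assumed to be `BACKEND-sSIZE--NAME`, and the fabricated mode mirrors the regular-file mode 33188 seen in the log above. Timestamps and ownership are simply borrowed from the symlink itself.

```python
import os
import re
import stat
import tempfile
from typing import Optional

# Assumed key layout: BACKEND-sSIZE[-mMTIME][-S..][-C..]--NAME
_SIZED_KEY = re.compile(r"^[A-Z0-9_]+-s(\d+)(?:-[mSC]\d+)*--")


def fabricate_stat(path: str) -> Optional[dict]:
    """Build a stat-like dict for an annex symlink without opening it.

    Returns None when the caller should fall back to the existing logic.
    """
    if not os.path.islink(path):
        return None
    target = os.readlink(path)
    if ".git/annex/objects/" not in target:
        return None
    m = _SIZED_KEY.match(os.path.basename(target))
    if m is None:
        return None  # key lacks a size: a remote call would still be needed
    link = os.lstat(path)  # cheap: never touches the link target
    return {
        "st_mode": stat.S_IFREG | 0o644,
        "st_size": int(m.group(1)),
        "st_nlink": 1,
        "st_uid": link.st_uid,
        "st_gid": link.st_gid,
        "st_atime": link.st_atime,
        "st_mtime": link.st_mtime,
        "st_ctime": link.st_ctime,
    }


KEY = "SHA256E-s4888071542--8d084564030738bb2faac314f3fb4ef54000058c9503c95ce26ba082a7b58ec0.nii.gz"
with tempfile.TemporaryDirectory() as tmp:
    link_path = os.path.join(tmp, "sub-I38_ses-MRI_echo-4_flip-4_VFA.nii.gz")
    os.symlink(f".git/annex/objects/VM/kg/{KEY}/{KEY}", link_path)
    demo = fabricate_stat(link_path)

print(demo["st_size"], oct(demo["st_mode"]))  # 4888071542 0o100644
```

With something along these lines, `getattr` would only reach the fsspec path for keys that carry no size.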

jwodder commented 1 year ago

@yarikoptic

> Please identify how long for it to get through 000026

Running the script again on just 000026 took 18 hours (from 15:02 yesterday to 8:54 today).

yarikoptic commented 1 year ago

Thank you! Definitely we should do better ;-)