Closed: @yarikoptic closed this issue 1 year ago
@yarikoptic Upon further inspection, it appears that the code was actually hitting this block, so I'm not sure why it was taking so long:
hm, how do you figure out which block is a bottleneck?
I attached py-spy (`py-spy top -p 3433101`) on drogon and sampled for a bit to see:
```
Total Samples 8400
GIL: 12.00%, Active: 38.00%, Threads: 23

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 12.00%  15.00%   10.55s     11.88s  read (datalad_fuse/fuse_.py)
  7.00%   7.00%    5.02s      5.40s  open (datalad_fuse/fuse_.py)
  4.00%   4.00%    2.02s      2.02s  read (ssl.py)
  2.00%   2.00%    1.25s      1.25s  _worker (concurrent/futures/thread.py)
  2.00%   3.00%    1.22s      1.37s  release (datalad_fuse/fuse_.py)
  2.00%   2.00%    1.19s      1.20s  _fetch (fsspec/caching.py)
  0.00%   7.00%   0.600s      4.44s  _read_ready__data_received (asyncio/selector_events.py)
  1.00%   2.00%   0.590s      1.22s  data_received (aiohttp/client_proto.py)
```
at the top, so no `getattr` seems to be "involved".
@yarikoptic I'm not saying that the code I linked above was the bottleneck; I'm saying that this code was not hit (instead the "Broken symlink" code was hit), so it can't be the bottleneck.
Note that just scanning Dandiset 26 took 9 and a half hours despite it not containing any NWBs, so it seems that some part of the directory traversal is taking too long.
> Note that just scanning Dandiset 26 took 9 and a half hours despite it not containing any NWBs, so it seems that some part of the directory traversal is taking too long.
I will look into it - did you use `--mode-transparent`? (the process is gone on drogon ATM, so I can't look it up)
@yarikoptic Yes, I used `--mode-transparent`, as I needed to query Git for the latest commit hash of each dataset.
What did you mean exactly by "just scanning Dandiset 26" which "took 9 and a half hours", @jwodder?
@yarikoptic In the files produced by the first full run of the script, the timestamps on `000025/status.yaml` and `000026/status.yaml` are 9 and a half hours apart (06:05 vs. 15:47). Dandiset 000026 contains no NWBs, and the script currently processes only one Dandiset at a time (with the NWBs within a Dandiset processed in parallel), so the only thing it could have been doing for those nine hours was traversing 000026. Perhaps it's better now that there's a cache present.
Please identify how long it takes to get through 000026 (now that drogon is not as busy) and what it is spending time on: plain listing, stat'ing files, or ...?
@yarikoptic From a preliminary examination of the FUSE logs, it appears that, even though the listing of files in each directory triggers the "Broken symlink" block for each file, the Python script then needs to know whether each node is a directory (so it can decide whether to traverse into it). Since we reuse the results from `lstat(2)`, each file in the mount appears to be a symlink, so Python follows each link and `stat(2)`'s the target to determine whether it's a directory, and that is what triggers the expensive "File not already open" block.
Doing this seems to take about 2 to 3 seconds for each annexed file, though about half the files in 000026 happen to be unannexed JSON, so the net effect is 1 file every 1 to 1.5 seconds. And there are 55,483 assets in the Dandiset...
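The distinction can be seen with plain Python: `os.lstat()` inspects the link itself, while `Path.is_dir()` follows the link and issues a `stat(2)` on the target, which under datalad-fuse is what opens the remote file. A minimal sketch using a broken symlink in a temporary directory (the file names are illustrative, not taken from the actual mount):

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for an annex symlink whose target is not present locally
    target = Path(tmp, ".git", "annex", "objects", "MD5E-s1--0.nwb")
    link = Path(tmp, "data.nwb")
    link.symlink_to(target)       # target does not exist: a "broken" symlink

    os.lstat(link)                # cheap: stats the link itself, never the target
    print(os.path.islink(link))   # True
    print(link.is_dir())          # False -- but answering this follows the link
                                  # and stat(2)'s the (missing) target
```

So a traversal that calls `is_dir()` on every entry hits the target of every annex symlink, even when the listing itself only needed `lstat`.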
Thanks for the investigation, but I am still confused about what takes so long, since `os.lstat` seems to take just a few microseconds on those symlinks!:
```console
drogon:/mnt/backup/dandi/dandisets/000026
$> find sub-I38 -type l | xargs /usr/bin/time python3 -c 'import os, glob, time, sys; t0=time.time(); stats=[os.lstat(p) for p in sys.argv[1:]]; print(f"{(time.time()-t0)/len(stats)} per each among {len(stats)} files")'
3.34116664246051e-06 per each among 1370 files
0.03user 0.01system 0:00.04elapsed 97%CPU (0avgtext+0avgdata 12924maxresident)k
0inputs+0outputs (0major+2214minor)pagefaults 0swaps
3.293495038490156e-06 per each among 1365 files
0.02user 0.01system 0:00.04elapsed 100%CPU (0avgtext+0avgdata 13008maxresident)k
0inputs+0outputs (0major+2214minor)pagefaults 0swaps
3.1270589297118423e-06 per each among 1364 files
0.03user 0.00system 0:00.04elapsed 100%CPU (0avgtext+0avgdata 12964maxresident)k
0inputs+0outputs (0major+2209minor)pagefaults 0swaps
2.986966732750023e-06 per each among 1363 files
0.02user 0.01system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 12836maxresident)k
0inputs+0outputs (0major+2202minor)pagefaults 0swaps
3.09035891578311e-06 per each among 1365 files
0.03user 0.00system 0:00.03elapsed 97%CPU (0avgtext+0avgdata 13072maxresident)k
0inputs+0outputs (0major+2199minor)pagefaults 0swaps
3.0648577344286574e-06 per each among 1365 files
0.02user 0.01system 0:00.04elapsed 100%CPU (0avgtext+0avgdata 12956maxresident)k
0inputs+0outputs (0major+2207minor)pagefaults 0swaps
3.4453410234045176e-06 per each among 1362 files
0.02user 0.01system 0:00.04elapsed 97%CPU (0avgtext+0avgdata 12912maxresident)k
0inputs+0outputs (0major+2212minor)pagefaults 0swaps
2.963318784012754e-06 per each among 585 files
0.02user 0.01system 0:00.03elapsed 100%CPU (0avgtext+0avgdata 10892maxresident)k
0inputs+0outputs (0major+1568minor)pagefaults 0swaps
```
Maybe drogon was overloaded and I/O was slower, but I still doubt that accounts for a factor of thousands.
@yarikoptic It's not just `lstat`. The key thing is that, because we return the results of `lstat`'ing a symlink, each file appears to be a symlink. When the healthcheck script then calls `is_dir()` on a path, it resolves the symlink and calls `stat(2)` on the target, and that leads to datalad-fuse opening a file via fsspec.
So -- let's address that

```python
# TODO: it is expensive to open each file just for `getattr`!
# We should just fabricate stats from the key here or not even
# bother???!
```

by

1. making `is_annex_dir_or_key` return what it matches for the parsed key, including size if present, and thus at `if dir_or_key == "key":` above that comment manufacture the returned stat based on the size we get from the annex key. Then we would not get to that "TODO" while looking at key files under `.git/annex/objects`.
2. if `path` is a symlink pointing to `.git/annex/objects` -- do the same logic as in 1. on the resolved symlink/key.

@yarikoptic
> Please identify how long it takes to get through 000026
Running the script again on just 000026 took 18 hours (from 15:02 yesterday to 8:54 today).
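For scale, that duration is about what the earlier per-file estimate predicts; a quick back-of-the-envelope check (plain arithmetic, nothing datalad-specific):

```python
# At the observed ~1 to 1.5 s per asset, traversing all 55,483 assets
# in 000026 should take roughly 15 to 23 hours, which brackets the
# observed 18-hour run.
assets = 55483
for per_file in (1.0, 1.5):
    print(f"{assets * per_file / 3600:.1f} hours")  # -> 15.4 hours, 23.1 hours
```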
Thank you! Definitely we should do better ;-)
@jwodder pointed out in https://github.com/dandi/dandisets-healthstatus/issues/1#issuecomment-1322081261 that the major slowdown is opening the remote file for each `stat` call, i.e. simply while traversing the dataset. There is a TODO https://github.com/datalad/datalad-fuse/blob/24727b80ebb6590e04f19b743dd293ac8b767170/datalad_fuse/fuse_.py#L149 added in original prototype times in cdb14f52340dffe3f418a19c4fb7cbffa1d588a3, so we should indeed just get that size from the annex key, and only if the key lacks a size (relaxed mode) -- do the remote call.