Closed: jwodder closed this 1 year ago
I assume you have tried it -- does it pass on any dandiset files ATM?
@yarikoptic I have not tried it, as I was waiting to complete a run of the pynwb tests first.
ok, let's then just merge and fix up what is needed
@yarikoptic After adding the MatNWB test, the script has been running for over 20 hours and is still on the first Dandiset (000003, which previously took only a couple of minutes).
heh, that is indeed non-scalable. But it is unlikely due to locking -- how long does it take for a single pynwb and matnwb test to run on some sample file in that dandiset, with the full content available locally or via fuse? I want to assess the impact of each factor (library, fuse) to see what needs to be improved.
@yarikoptic Do you want me to perform those measurements while the healthcheck script is running and Drogon is overloaded beyond 60, or should I kill the script first?
yeah, good point -- I killed /mnt/backup/dandi/dandisets/tools/backups2datalad-populate-zarrs-cron
(I think its processes did complete), and there is now only the populate on regular assets running -- you might want to wait for it or not. Indeed better to kill the currently running healthcheck, since we are not going to live long enough to see its end.
FWIW, I also py-spy recorded a sample period within that datalad fuse-fs, and both times it looked like this (and that was seconds!!) -- which is odd: why is every one of those steps taking so long, e.g. release and even read, if that (whatever it was) file was accessed before (during the pynwb run) and thus cached locally? Can the number of files in the local cache / opened by the process (>100) have an impact on that?
FWIW, note that most of the cache files are opened twice by the fuse process:
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000108$ ls -l /proc/3433101/fd | awk '/0003.*cache/{print $11}' | sort | uniq -c | sort -n | tail
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/c70bb5cb067563c34bc64f9fb30ac1e7539261f5fa63ebfcccdd6caeed0b6b6e
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/cd6360468b592ab3e0ad3decda0ead2f86d95256959f8d5c254e97df5ec4ffa4
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/d9c1bf061935ba6755dd6022ce75ed4d9439870643d3b2de35e014f31957149d
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/df19691e335b9eb36a8a08c80c45660534735da087819cba85ec431a47b75a2b
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/e59d4b30a58794e5e78fa62b727003b05da5030a49eb313904dd72b6d4848ddb
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/ed9a71ec7b699e115aa6edcb876fbdca4a7f743ae9be5d7c64e0d69b1c196e12
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/f05497f2f9a0336caed9fdc44aa77ddaa72ba204fa56d8778dfe5aa41a27aceb
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/f3a675d01534a76ac95a69070f0cf16cae144fca478a6ff63cbb69fd95227b57
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/f6cd7c1c35eb3e7fed5342373d3e9852389b51a64e78ce1adf20970098461b39
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/fe5d2fbc846ac6ea56d052905cb6851c9c176e44ca829ea9c2d73a960d835283
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000108$ ls -l /proc/3433101/fd | awk '/0003.*cache/{print $11}' | sort | uniq -c | sort -n | head
1 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/6ba1b2e907320757a6c11f0053e5b8de7ba6ef121ecb6f5ef3377cf78aa695fe
1 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/74c5dcd30d9268279a91d99b6c59203fe1ae9b54cdb5895d5f43cfa636971a31
1 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/ec8a3fd2ea860860c9624310cda941c538436d73cfe6ec4b7d8c48fbe0f416a8
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/0152ee14905402995c9f82ed31dbc134e5a4f15ce2930172817232ac521d4a9a
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/05f594410a7327f244e7653ea4f6b398c0b51cacdb9a5f74c53ed75b4cd3811c
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/0a1fd7f90f55831317cdbdb3ec36de43791888d7ed8a0c7017a8e613e19cb7fb
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/0abd98522a8c2b281cdb993c823ab46cbeab7ade4c9957c0a72271f19a3eda65
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/0e721289b622d0a9d39aad55d7812b7ea8178096e0f1afa5516cf6b48caf7f6f
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/119ac0b0d1c1057a6ebef6a36f9d5af0207a9caa305fac25c60d02479b0f4c9c
2 /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/.git/datalad/cache/fsspec/14162645a31ccefe2ad1515b8781470e1a0868c0fd92483fd055247868ec8379
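If useful, the same duplicate-fd check can be done from Python by reading /proc/&lt;pid&gt;/fd directly (Linux-only; the PID in the comment is just the one from the listing above, used illustratively):

```python
import os
from collections import Counter

def count_open_paths(pid):
    """Return a Counter mapping each path to the number of file
    descriptors the given process currently holds open on it."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for fd in os.listdir(fd_dir):
        try:
            # each entry in /proc/<pid>/fd is a symlink to the open path
            counts[os.readlink(os.path.join(fd_dir, fd))] += 1
        except OSError:
            continue  # the fd may have been closed since listdir()
    return counts

# e.g. cache paths the fuse process has open more than once:
# dupes = {p: n for p, n in count_open_paths(3433101).items() if n > 1}
```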
@yarikoptic Running the tests directly on a download of sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb
in Dandiset 000003 (size: 11,694,713,998 bytes, chosen simply because it was the first asset) took 3.284s for the PyNWB test and 51.135s for the MatNWB test. Running the test again immediately afterwards took 2.567s for PyNWB and 16.746s for MatNWB.
ok, and what about from the fuse-mounted location for that file?
edit: so -- the matlab test is inherently >10 times slower than the Python one. If it took 3 days for the python one to get through all dandisets, it would take a month for matlab :-/
btw, where do you have that sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb on drogon?
@yarikoptic Running the tests on that file via FUSE took 3.521s for PyNWB and 42.548s for MatNWB. Running the tests on a different FUSEd file that hadn't been downloaded to the dataset (size: 15,090,018,364 bytes) took 6.229s for PyNWB and ... the MatNWB test is still running after more than ten minutes.
The file is in the Dandisets clone, at /mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb, fetched with datalad get.
@yarikoptic Final time for the second MatNWB test was 19m25.628s.
@yarikoptic Final time for the second MatNWB test was 19m25.628s.
and upon rerun, which would not need fetching the data -- same-ish under-a-minute timing?
overall suggesting that
FYI, I'm running into issues installing h5py but after I figure that out I will do an A2A read on that file using pynwb for comparison. So far with MatNWB I'm getting ~4-5 second reads from MATLAB startup (so no JIT optimizations). Without A2A I wouldn't read too deeply into this, however.
ignorant me: what is A2A? ;-) Let us know if you need any help with h5py/pynwb installation. The easiest should be conda distribution. If you have any python3, we could give you a 1 liner using https://github.com/datalad/datalad-installer/blob/master/src/datalad_installer.py which must ( ;-) ) work!
A2A being an apples-to-apples comparison. It's possible that my workstation is simply absurdly fast and pynwb will run in 0.3 seconds, so it wouldn't be useful to compare with the other times posted in this thread.
So yeah, it takes 0.11 seconds for the equivalent python script to run. That fits with what has been observed with a 10x slowdown.
I believe the major difference is that MatNWB actually writes a lot more to disk, since it's generating the class files (generateSpec is >50% of the processing time). Running with the ignorecache flag lowers my processing time from 4-5 seconds to ~2 seconds.
If file writing is removed, the slowest part of MatNWB is the type validation which requires occasionally sampling from the file.
@yarikoptic Final time for the second MatNWB test was 19m25.628s.
and upon rerun, which would not need fetching the data -- same-ish under-a-minute timing?
No. I ran the tests twice again, and the MatNWB times were 11m23.130s and 10m44.171s.
If file writing is removed,
How can we do that?
@yarikoptic
If file writing is removed,
How can we do that?
If you use the ignorecache argument (i.e. nwb = nwbRead('file.nwb', 'ignorecache')), MatNWB will not generate class files from the embedded spec in the NWB file. Of course, this will render the file virtually unreadable if the file contains extensions of any sort or a version of the schema that's incompatible with the currently generated one.
Of course this will render the file virtually unreadable if the file contains extensions of any sort or a version of the schema that's incompatible with the currently generated one.
@jwodder -- let's keep the current "writing" version but use /mnt/fast/dandi/dandisets-healthstatus as the location where that writing would happen -- maybe the main storage spindles are indeed slowing it down a little.
It's possible that my workstation is simply absurdly fast and pynwb will run in 0.3 seconds so it wouldn't be useful comparing with the other times posted in this thread. So yeah, it takes 0.11 seconds for the equivalent python script to run.
@lawrence-mbf you were timing on sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb
in Dandiset 000003 (size: 11,694,713,998 bytes) correct? what was your script that it was that fast? ;-)
It showed on drogon (an aging beast but still potent, not busy ATM), on that file, that out of a total time of 1.902s indeed only 0.232s was the actual load + str/repr printing. And that is on a "warm" run.
On typhon (a new box, IO not occupied, data on ssd for this, 1TB of RAM, conda installation of hdmf/pynwb) I got 1.234s total script run time, with 0.106s for the actual load/str/repr. So indeed your workstation is fast!
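For reference, a minimal sketch of that kind of split timing (the pynwb call in the comment is illustrative, not the actual script that was run here):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn(*args, **kwargs), print its wall-clock time,
    and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# The actual measurement would wrap the pynwb read and repr, roughly:
#   from pynwb import NWBHDF5IO
#   with NWBHDF5IO("file.nwb", "r") as io:
#       nwbfile, _ = timed("load", io.read)
#       timed("repr", repr, nwbfile)
```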
So I did timing of the matlab loading cycle, with savedir in /tmp (on the same ssd iirc; should move into RAM):
(dev3) yoh@typhon:/tmp$ MATLABPATH=/home/yoh/proj/dandi/matnwb /usr/bin/time matlab -nodesktop -batch "tic; nwb = nwbRead('/home/yoh/proj/dandi/dandisets/000003/sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb', 'savedir', '/tmp/'); toc"
Elapsed time is 4.351224 seconds.
10.36user 1.11system 0:10.14elapsed 113%CPU (0avgtext+0avgdata 1038872maxresident)k
0inputs+1392outputs (9major+226900minor)pagefaults 0swaps
(dev3) yoh@typhon:/tmp$ MATLABPATH=/home/yoh/proj/dandi/matnwb /usr/bin/time matlab -nodesktop -batch "tic; nwb = nwbRead('/home/yoh/proj/dandi/dandisets/000003/sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb', 'ignorecache'); toc"
Elapsed time is 1.980811 seconds.
6.67user 1.21system 0:07.80elapsed 100%CPU (0avgtext+0avgdata 928380maxresident)k
0inputs+112outputs (14major+180020minor)pagefaults 0swaps
which confirms your timings! And on drogon it is indeed quite a bit slower:
(pynwb-recent) dandi@drogon:~/proj/dandi$ MATLABPATH=/home/dandi/proj/dandi/matnwb /usr/bin/time matlab -nodesktop -batch "tic; nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb', 'savedir', '/tmp/'); toc"
Elapsed time is 14.932822 seconds.
16.91user 0.88system 0:22.98elapsed 77%CPU (0avgtext+0avgdata 951240maxresident)k
cold; a subsequent warm run gets faster but is still twice as slow as on typhon:
(pynwb-recent) dandi@drogon:~/proj/dandi$ MATLABPATH=/home/dandi/proj/dandi/matnwb /usr/bin/time matlab -nodesktop -batch "tic; nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb', 'savedir', '/tmp/'); toc"
Elapsed time is 8.116742 seconds.
16.68user 0.82system 0:16.02elapsed 109%CPU (0avgtext+0avgdata 940284maxresident)k
0inputs+1392outputs (8major+166313minor)pagefaults 0swaps
with ignorecache it takes half of that time:
(pynwb-recent) dandi@drogon:~/proj/dandi$ MATLABPATH=/home/dandi/proj/dandi/matnwb /usr/bin/time matlab -nodesktop -batch "tic; nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets/000003/sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb', 'ignorecache'); toc"
Elapsed time is 3.457994 seconds.
10.20user 0.74system 0:10.90elapsed 100%CPU (0avgtext+0avgdata 879004maxresident)k
@yarikoptic
@jwodder -- let's keep current "writing" version but use
/mnt/fast/dandi/dandisets-healthstatus
as the location where that writing would happen -- may be main storage spindles are indeed slowing it down a little.
Do you just mean that the savedir for the MatNWB tests should be changed to that folder instead of using a temporary directory?
@yarikoptic
you were timing on sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb in Dandiset 000003 > (size: 11,694,713,998 bytes) correct? what was your script that it was that fast? ;-)
Assuming you meant pynwb, it's similar to your script but with time.perf_counter instead. Not sure if that's a valid benchmark timer.
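FWIW, time.perf_counter is a valid choice for this: per the Python docs it is the clock with the highest available resolution for measuring short durations, and it is monotonic, so it is unaffected by system clock adjustments (unlike time.time). A quick illustration:

```python
import time

t0 = time.perf_counter()
time.sleep(0.05)
t1 = time.perf_counter()

# monotonic: t1 is guaranteed to be >= t0, so the difference is a
# meaningful elapsed time (roughly 0.05s plus scheduling overhead)
print(f"elapsed: {t1 - t0:.3f}s")
```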
Closes #2.