Closed jwodder closed 3 months ago
Attention: Patch coverage is 64.25703% with 89 lines in your changes missing coverage. Please review.
Project coverage is 60.71%. Comparing base (d71a25d) to head (e331c4c). Report is 78 commits behind head on main.
since s3 can have different latencies at different times of the day, let's also make sure we have some estimate of s3 latency during each benchmark if these benchmarks take some time to run. if they run in mins, i would be less worried about latencies, and in such a scenario we should just get multiple estimates to create some error bars.
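A minimal sketch of the "multiple estimates with error bars" idea, assuming each benchmark (or an S3 latency probe) can be wrapped in a zero-argument callable. The helper names `time_repeatedly` and `summarize` are hypothetical and not part of dandisets-healthstatus:

```python
import statistics
import time
from typing import Callable


def time_repeatedly(fn: Callable[[], object], repeats: int = 5) -> list[float]:
    """Run fn `repeats` times and return the wall-clock duration of each run."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) for a list of timings."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, sem
```

The standard error could be reported as the error bar next to each mean; if runs are spread over the day, the spread would also absorb some of the time-of-day variation in S3 latency.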
@satra
> make sure we have some estimate of s3 latency
How?
this is old, but something like this: https://github.com/dvassallo/s3-benchmark
@satra This seems like something that should be done separately from dandisets-healthstatus. Trying to integrate it into this PR doesn't seem sensible.
@yarikoptic Problem: dandisets-healthstatus requires Pydantic 2.0, yet this PR adds an extra dependency on dandi (and dandidav, which requires dandi), which still requires Pydantic 1.x.
since it is all in motion, I think it would be ok to point to that branch you have for dandi-cli with pydantic 2.0 compat
Please provide results of running such benchmarking across possible solutions e.g. on typhon. (should be less busy ATM)
scratch that about typhon, I forgot that we rely on having dandisets around. Please do it on drogon.
if benchmarks rely on a full clone of the dandisets/ hierarchy, probably best to just run on drogon. If you want to replicate the hierarchy then indeed can do on smaug or typhon. Choose the host you deem most appropriate for this.
@yarikoptic I need permission to sudo-run the following commands on smaug:
/usr/bin/mount -t webdavfs -o allow_other https://webdav.dandiarchive.org /tmp/dandisets-fuse
/usr/bin/mount -t davfs https://webdav.dandiarchive.org /tmp/dandisets-fuse
Note that the colons in the URLs need to be escaped when adding them to the sudoers file.
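A hedged sketch of the escaping mentioned above. Per sudoers(5), the characters `,`, `:`, `=` and `\` must be backslash-escaped when they appear in command arguments. The helper name and the `ALL=(root) NOPASSWD:` form of the rule are assumptions for illustration, not the entry actually installed on smaug:

```python
import re

# Characters that sudoers(5) requires to be backslash-escaped in command arguments.
_SUDOERS_SPECIAL = re.compile(r"([\\,:=])")


def sudoers_escape(arg: str) -> str:
    """Backslash-escape the characters that are special in sudoers command arguments."""
    return _SUDOERS_SPECIAL.sub(r"\\\1", arg)


def sudoers_line(user: str, argv: list[str]) -> str:
    """Build a hypothetical NOPASSWD rule allowing `user` to run exactly `argv` as root."""
    args = " ".join(sudoers_escape(a) for a in argv[1:])
    return f"{user} ALL=(root) NOPASSWD: {argv[0]} {args}"


print(sudoers_line(
    "jwodder",
    ["/usr/bin/mount", "-t", "davfs",
     "https://webdav.dandiarchive.org", "/tmp/dandisets-fuse"],
))
```

Any generated line should still be checked with `visudo -c` before being relied on.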
Also, follow_redirect in /etc/davfs2/davfs2.conf needs to be set to 1.
done
@yarikoptic matlab needs to be installed on smaug so that I can benchmark the associated test.
@yarikoptic Ping.
done now -- the same 2022b version is installed systemwide
@yarikoptic When I try to run a matlab test on smaug, it fails with:
License checkout failed.
License Manager Error -1
The license file cannot be found.
Troubleshoot this issue by visiting:
https://www.mathworks.com/support/lme/R2022b/1
Diagnostic Information:
Feature: MATLAB
License path: /home/jwodder/.matlab/R2022b_licenses:/usr/local/MATLAB/R2022b/licenses/license.dat:/usr/local/MATLAB/R2022b/licenses
Licensing error: -1,359. System Error: 2
Note that there is no /usr/local/MATLAB/R2022b/licenses folder on the server.
could you please give me the full matlab invocation so I can make sure it works correctly? on smaug, do you run it under your account or some other (like datalad etc)?
@yarikoptic
matlab -nodesktop -batch 'nwb = nwbRead('"'"'/tmp/dandisets-fuse/000016/sub-mouse1-fni16/sub-mouse1-fni16_ses-161228151100.nwb'"'"')'
where /tmp/dandisets-fuse is a FUSE mount and there is a copy of matnwb in matnwb/ in the current directory (and the envvar MATLABPATH points to this matnwb/). The command is run under my account.
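For reference, the same invocation can be built as an argument list in Python, which avoids the `'"'"'` shell quoting entirely. This is only a sketch: `build_matlab_cmd` and `matlab_env` are hypothetical helpers, and the assumption is that `matnwb/` sits in the current directory as described above.

```python
import os
import shlex


def build_matlab_cmd(asset_path: str) -> list[str]:
    """Return the argv for reading one NWB file with matnwb's nwbRead."""
    return ["matlab", "-nodesktop", "-batch", f"nwb = nwbRead('{asset_path}')"]


def matlab_env(matnwb_dir: str = "matnwb") -> dict[str, str]:
    """Copy the current environment and point MATLABPATH at the matnwb checkout."""
    return {**os.environ, "MATLABPATH": os.path.abspath(matnwb_dir)}


cmd = build_matlab_cmd(
    "/tmp/dandisets-fuse/000016/sub-mouse1-fni16/"
    "sub-mouse1-fni16_ses-161228151100.nwb"
)
# shlex.join shows the equivalent shell command line, quoting included.
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, env=matlab_env(), check=True)
```

Passing the list straight to `subprocess.run` means no shell is involved, so the single quotes inside the MATLAB expression need no escaping.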
command didn't run under my account on drogon, but worked (errored but past the license check) under dandi, so it is user specific somewhere... strace pointed to /home/dandi/.matlab/R2022b_licenses/ ... but it can't just be copied since "Your username does not match the username in the license file." .. started matlab's initiator script under VNC on smaug under my login but provided jwodder as the target login, changed permissions for the license... now works for the jwodder account (none else, bleh)
@yarikoptic The benchmarking is failing because the matlab test on FUSE is exceeding the 1-hour timeout. I tried increasing the timeout to 2 hours, but it exceeded that as well. Should I try increasing the timeout to something incredibly high or take another approach?
how long would it run on that file if downloaded in full? if it is just generally very slow (half an hour) -- might be something to relay to matnwb.
@yarikoptic 42 seconds
hm. Any ideas on why fuse solution takes that long? how long it takes with datalad-fuse?
@yarikoptic I don't know why it's so slow with FUSE, and I don't know how long it would take with FUSE, as the benchmark code kills the process at the 2-hour timeout.
please make time out 5 hours and run against both fuse solutions -- datalad-fuse and dandidav + davfs2
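A rough sketch of timing a test command under a hard time limit, assuming the benchmark simply wraps a subprocess. `timed_run` is a hypothetical name; the actual benchmark code in this PR may be structured differently.

```python
import subprocess
import sys
import time
from typing import Optional, Sequence


def timed_run(cmd: Sequence[str], timeout: float) -> Optional[float]:
    """Run cmd and return its wall-clock time in seconds, or None on timeout.

    subprocess.run kills the child process when the timeout expires and
    raises TimeoutExpired, which is reported here as None.
    """
    start = time.perf_counter()
    try:
        subprocess.run(cmd, check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start


# Example with a trivial child process; a five-hour limit would be 5 * 60 * 60.
elapsed = timed_run([sys.executable, "-c", "pass"], timeout=30)
print("timed out" if elapsed is None else f"{elapsed:.3f} s")
```

Note that only the direct child is killed on timeout; any processes it spawned would need separate handling.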
ideally: profile datalad-fuse while running the test to see where it spends time.
@yarikoptic The matnwb test on datalad-fuse exceeded the five-hour time limit as well.
How exactly should I profile it? Just use py-spy?
First - py-spy would not hurt indeed.
Then I would have probably added log lines at DEBUG level within datalad-fuse to see what is actually taking time there if py-spy was not conclusive.
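One way such DEBUG-level timing lines could look, sketched with the standard `logging` module. The logger name `fuse-timing` and the helper `log_duration` are made up for illustration and are not datalad-fuse's own:

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name; datalad-fuse uses its own logger.
lgr = logging.getLogger("fuse-timing")


@contextmanager
def log_duration(label: str, logger: logging.Logger = lgr):
    """Log at DEBUG level how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.debug("%s took %.3f s", label, time.perf_counter() - start)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(message)s")
    with log_duration("dummy read"):
        time.sleep(0.1)
```

Wrapping the read, open, and remote-fetch paths this way would show which step dominates when the sampling profile from py-spy is inconclusive.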
@yarikoptic Is there a way to get datalad's logs to include timestamps?
yes, there is also a number of other possibly helpful options (available through env vars or even git config since defined in common_cfg) for augmenting logging behavior:
❯ pwd
/home/yoh/proj/datalad/datalad-maint
❯ grep DATALAD_LOG CONTRIBUTING.md
- *DATALAD_LOG_LEVEL*:
- *DATALAD_LOG_NAME*:
- *DATALAD_LOG_OUTPUTS*:
- *DATALAD_LOG_PID*
- *DATALAD_LOG_TARGET*
- *DATALAD_LOG_TIMESTAMP*:
- *DATALAD_LOG_TRACEBACK*:
- *DATALAD_LOG_VMEM*:
EDIT: I realized that I wasn't setting the environment for the datalad fusefs command correctly, so PATH et alii were being wiped out.
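A sketch of the pitfall described in the EDIT: passing `env=` to `subprocess` replaces the whole environment, so extra variables should be layered on top of a copy of `os.environ`. The values `"DEBUG"` and `"1"` are assumptions about how these datalad variables are typically set, and `logging_env` is a hypothetical helper.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def logging_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with datalad logging variables added."""
    env = dict(os.environ if base is None else base)
    env["DATALAD_LOG_LEVEL"] = "DEBUG"    # assumed value
    env["DATALAD_LOG_TIMESTAMP"] = "1"    # assumed value
    return env


# Wrong: env={"DATALAD_LOG_LEVEL": "DEBUG"} alone would drop PATH and friends.
# Right: start from the inherited environment and add to it.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DATALAD_LOG_LEVEL'])"],
    env=logging_env(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The child here is a stand-in Python process; the same `env` would be passed when launching the real `datalad fusefs` command.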
@yarikoptic I believe sub-mouse1-fni16/sub-mouse1-fni16_ses-161228151100.nwb in 000016 was a poor choice of asset to test, as it has seemingly always timed out in the normal fusefs tests, and it continues to time out when testing out benchmarking. (Thus, as the timing-out is not specific to the benchmarking, if you want me to investigate it, you should file a separate issue.) Please choose another asset to test the benchmarking on, one that isn't currently marked as timing out.
Let's try on sub-mouse1-fni16/sub-mouse1-fni16_ses-170808184141.nwb in the same dandiset.. if I read the yaml correctly it is ok for pynwb and errors out on matnwb (but does not time out). In general -- feel welcome to choose any asset you deem appropriate and not too "easy" (fast)
@yarikoptic I finally got a run that didn't time out by using a 44 MB asset from Dandiset 000005. Here's the output, converted to a table:
Mount Type | Dandiset | Asset | Test | Time (s) |
---|---|---|---|---|
fusefs | 000005 | sub-anm236462/sub-anm236462_ses-20140210_behavior+icephys.nwb | pynwb_open_load_ns | 11.972222546115518 |
fusefs | 000005 | sub-anm236462/sub-anm236462_ses-20140210_behavior+icephys.nwb | matnwb_nwbRead | 447.08360775373876 |
fusefs | 000005 | sub-anm236462/sub-anm236462_ses-20140210_behavior+icephys.nwb | dandi_ls | 21.88728654384613 |
davfs2 | 000005 | sub-anm236462/sub-anm236462_ses-20140210_behavior+icephys.nwb | pynwb_open_load_ns | 4.709459913894534 |
davfs2 | 000005 | sub-anm236462/sub-anm236462_ses-20140210_behavior+icephys.nwb | matnwb_nwbRead | 18.62446365132928 |
davfs2 | 000005 | sub-anm236462/sub-anm236462_ses-20140210_behavior+icephys.nwb | dandi_ls | 3.7525609582662582 |
So davfs2 is much more promising. What are the timings for datalad-fuse for the same file? (since that is what we use ATM)
@yarikoptic Those are the "fusefs" entries.
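To make the comparison in the table above concrete, the per-test ratio of fusefs time to davfs2 time can be computed directly from the reported numbers. This is only arithmetic on the values already posted; the dictionary layout and `speedups` helper are illustrative.

```python
# Timings in seconds, copied from the table above, keyed by (mount type, test).
timings = {
    ("fusefs", "pynwb_open_load_ns"): 11.972222546115518,
    ("fusefs", "matnwb_nwbRead"): 447.08360775373876,
    ("fusefs", "dandi_ls"): 21.88728654384613,
    ("davfs2", "pynwb_open_load_ns"): 4.709459913894534,
    ("davfs2", "matnwb_nwbRead"): 18.62446365132928,
    ("davfs2", "dandi_ls"): 3.7525609582662582,
}


def speedups(data: dict[tuple[str, str], float]) -> dict[str, float]:
    """Return, per test, how many times faster davfs2 was than fusefs."""
    tests = sorted({test for _, test in data})
    return {t: data[("fusefs", t)] / data[("davfs2", t)] for t in tests}


for test, ratio in speedups(timings).items():
    print(f"{test}: davfs2 is {ratio:.1f}x faster")
```

On this single asset and single run, davfs2 comes out roughly 2.5x faster for pynwb_open_load_ns, about 24x for matnwb_nwbRead, and about 5.8x for dandi_ls; with one sample per cell there are no error bars on these figures.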
Closes #66.
To do:
- pynwb_open_load_ns
- matnwb_nwbRead
- DANDI_CACHE=ignore dandi ls (to load metadata) on a single local asset

sub-mouse1-fni16/sub-mouse1-fni16_ses-161228151100.nwb in 000016 is suggested as a possible candidate.