Closed: jwodder closed this 3 months ago
Attention: Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 60.62%. Comparing base (c51d691) to head (2edf8e0). Report is 2 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| code/src/healthstatus/mounts.py | 33.33% | 4 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
@yarikoptic `ask_auth` needs to be set to 0 in `/etc/davfs2/davfs2.conf`, and `dandi@drogon` does not have permission to run the appropriate `mount` & `umount` commands via passwordless `sudo`. At the very least, `sudo -ln` complains that a password is required. Passwordless `sudo` is configured for `jwodder`, but that's not the user we're running `dandisets-healthstatus` under. The sudoers entries needed are:

```
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /tmp/dandisets-fuse
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs https\://webdav.dandiarchive.org /tmp/dandisets-fuse
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse
dandi ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs https\://webdav.dandiarchive.org /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse
```
@yarikoptic The setup should be complete now. Should I just run it in one of my screen sessions or leave that up to you?
please run it under your screen so we can check if all is good, etc.
@yarikoptic Problem: The code previously recorded the hash of the current commit of each Dandiset repository, but with davfs2, we don't have access to that information.
that's interesting! We need some kind of a notion of a version, e.g. a last-modified date. Could you query it from the dandi-archive API for each dandiset? We are going through drafts only, avoiding releases, right?
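Such a query could be sketched as below; the endpoint path and the `draft_version`/`modified` field names are assumptions to be verified against the live dandi-archive API:

```python
# Sketch: fetch a Dandiset's draft "modified" timestamp from the dandi-archive
# API. The endpoint path and the "draft_version"/"modified" field names are
# assumptions, not confirmed against the actual API.
import json
from urllib.request import urlopen

API_URL = "https://api.dandiarchive.org/api"


def extract_draft_modified(metadata: dict) -> str:
    # Pull the draft version's "modified" timestamp out of a dandiset record
    return metadata["draft_version"]["modified"]


def draft_modified(dandiset_id: str) -> str:
    with urlopen(f"{API_URL}/dandisets/{dandiset_id}/") as resp:
        return extract_draft_modified(json.load(resp))
```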
hold on -- you expose that in webdav: does davfs2 expose it as the mtime of the `draft/` folder? If so -- take that and be done?
@yarikoptic The script seems to be running fine now, but just traversing Dandiset file hierarchies seems slower than before.
how much slower? My "diagnostic" is the fact that there are barely any Python or MATLAB CPU-busy tasks :-/ It might relate to what a `tail -f` of the current log file on `drogon:/mnt/backup/dandi/heroku-logs/dandidav` shows: it "pauses" at `heroku[web.1]: Error R14 (Memory quota exceeded)`, although that could be a red herring.

@yarikoptic I had to restart the script in order to apply a fix. Looking at the script's output now, just the listing of the `dandisets/` directory took 30 seconds to get from 000003 to 000026, at which point it got stuck for three minutes; then four more entries were read in 6 seconds, and it now seems to alternate between stalling for several minutes and retrieving small numbers of entries. I do not have any timings for datalad-fuse to compare against.
Before you propose any explanations, let me point out that requests to `dandidav` to list the entries of `dandisets/` are answered with the complete directory listing, so this is presumably davfs2's fault for not caching appropriately. It seems v1.7.0 of davfs2 added caching, but it ended up making things slower.
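If tuning is worth a try before giving up on davfs2, the cache-related knobs live in `/etc/davfs2/davfs2.conf`; the option names below should be verified against `davfs2.conf(5)`:

```
# Hypothetical davfs2.conf tweaks -- verify names/units in davfs2.conf(5)
cache_size   256   # disk cache size in MiBytes
dir_refresh  60    # seconds before a cached directory listing is re-validated
```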
@yarikoptic I'm almost done rewriting the code to traverse Dandisets via the Archive API instead of a mounted WebDAV (as discussed in the meeting last week). However, there are issues with updating the `test-files` subcommand (for running a test on a local file and optionally saving the test results to the Dandiset's `status.yaml`). Do we want to keep this subcommand? What about keeping its ability to save to `status.yaml`? If yes to both, should the command assume that the given file paths are always in a WebDAV mount? If not, how should the Dandiset ID and asset path be determined?
Do we have `dandiset.yaml` at the top of the dandiset? If yes -- just traverse up until hitting it, and that would give you the ID and the top path from which to deduce the asset path. Cache the results of such a helper function per folder value so we do not re-traverse without need.
Can't check now since I believe davfs2 is simply hanging, hence the two hanging jobs I mentioned last week:

```
dandi 4133314 0.0 0.0 0 0 ? D Jun29 0:02 [python]
dandi 4188696 0.0 0.0 422664 4408 ? D Jun29 0:10 /home/dandi/cronlib/dandisets-healthstatus/venv/bin/python /home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/healthstatus/pynwb_open_load_ns.py /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/dandisets/000045/draft/sub-01be78e7-8741-4b40-bd64-79ed745431b5/sub-01be78e7-8741-4b40-bd64-79ed745431b5_ses-f6f55a42-6105-4cc9-9c8c-3d3264828b8b_behavior+image.nwb
dandi 4188715 0.0 0.0 414872 4420 ? D Jun29 0:02 /home/dandi/cronlib/dandisets-healthstatus/venv/bin/python /home/dandi/cronlib/dandisets-healthstatus/venv/lib/python3.8/site-packages/healthstatus/pynwb_open_load_ns.py /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/dandisets/000059/draft/sub-MS12/sub-MS12_ses-Peter-MS12-170719-095305-concat_desc-raw_ecephys.nwb
dandi 4189021 0.0 0.0 0 0 ? Zl Jun29 0:02 [python] <defunct>
dandi 4189095 0.0 0.0 0 0 ? Zl Jun29 0:02 [python] <defunct>
```
those should be killed, davfs2 restarted.
@yarikoptic

> Do we have `dandiset.yaml` in the top of the dandiset?
If `test-files` is run on a clone, copy, or WebDAV mount of a Dandiset, yes. However, if it's run on a davfs2 mount, experience suggests that traversing upwards looking for a `dandiset.yaml` will be slow.
Moreover, if updating `status.yaml`, `test-files` currently needs to traverse the whole of the Dandiset it's operating on in order to get all asset paths so that it can prune assets from `status.yaml` that have since been deleted. The options for handling this are:

1. … `status.yaml` …
2. …
3. …
4. Drop `test-files`'s behavior of pruning deleted assets.

Which one should I go with?
> those should be killed, davfs2 restarted.
I ran `kill -9` on the `pynwb_open_load_ns.py` processes last week, but they're still stuck. I don't know what those zombie processes are. I sent you a message in Slack asking you to use `sudo` to kill the davfs2 process. I tried unmounting the davfs2 mount when I started writing this comment, but the process has just been hanging since.
> However, if it's run on a davfs2 mount, experience suggests that traversing upwards looking for a `dandiset.yaml` will be slow.
we do not even need to list the directory -- just rely on `os.path.lexists` or the like while lru-caching the call, so that for a given folder we never check twice during the runtime of that process. Moreover, it could be optimized so that if any of the parents is already known to have `dandiset.yaml`, there is no need to ask for any child folder. This way it would literally be one upward traversal per dandiset.
@yarikoptic Please address the rest of my comment as well.
Re `test-files` -- couldn't we just completely disable dealing with the files which are not given to the `test-files` invocation, i.e. pruning them etc.? Pruning deleted paths in general should probably be the job of a "dandiset-wide" operation (even if we randomly select only some to go through).
@yarikoptic Yes, that is option number 4.
Cool, let's go with it
@yarikoptic Another problem: when running `test-files` on a Dandiset, what should be done about updating the `draft_modified` timestamp in `status.yaml`? Should it be updated and, if so, to what value / how should that value be obtained? What if `status.yaml` does not exist yet or has not yet been updated to use `draft_modified`?
For `test-files` -- i.e. running tests on specific files only -- I do not think we should update `draft_modified`. It is mostly an interface to troubleshoot errors and timeouts while also "not wasting" that work, by possibly updating those files' records. As such, it can avoid concerning itself with any dandiset-wide operations (removal of gone entries, `draft_modified`, etc.).
Closes #76.

After this is merged, `~dandi/cronlib/dandisets-healthstatus` on drogon will have to be updated to the new code (manually?).