Compare performance of webdav+davfs2, webdav+webdavfs, and datalad-fuse

jwodder commented 9 months ago

(Copied from https://github.com/dandi/dandi-infrastructure/pull/164#issuecomment-1875681628 et sequentes)

A script should be written to run & time the following tests:

- pynwb_open_load_ns from dandisets-healthstatus
- matnwb_nwbRead from dandisets-healthstatus
- dandi ls (to load metadata) on a single local asset

These should be run with DANDI_CACHE=ignore set in order to avoid any possible caching side effects from fscacher.
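A minimal sketch of the timing step, assuming each test can be invoked as an external command. The helper name `time_command` is hypothetical; only the `DANDI_CACHE=ignore` setting comes from the description above.

```python
import os
import subprocess
import time


def time_command(argv: list[str]) -> float:
    """Run ``argv`` with DANDI_CACHE=ignore and return elapsed wall-clock seconds."""
    # Copy the current environment and disable caching for the child process only.
    env = {**os.environ, "DANDI_CACHE": "ignore"}
    start = time.perf_counter()
    # check=True makes a non-zero exit status raise CalledProcessError.
    subprocess.run(argv, env=env, check=True)
    return time.perf_counter() - start
```

For the third test this could be called as `time_command(["dandi", "ls", path_to_asset])`, where `path_to_asset` is a location under whichever mount is being measured.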

The tests should be run on assets mounted using each of the following methods:

- datalad-fuse
- dandi-webdav + davfs2
- dandi-webdav + webdavfs

The assets to test should be one (or more?) sample assets of some "typical" size (a few GBs). sub-mouse1-fni16/sub-mouse1-fni16_ses-161228151100.nwb in 000016 is suggested as a possible candidate.

Testing should be run on smaug.

Webdavfs has been installed on smaug at /opt/webdavfs/webdavfs.
- Unmounting seems to require running umount as root.
- sudo /usr/local/sbin/unmount-tmp-fuse can be run to forcibly unmount /tmp/dandisets-fuse.
davfs2 is currently installed on smaug both system wide and (for a more recent version) at /opt/davfs2/DESTDIR/usr/local/sbin/ (?), but @yarikoptic reports issues with getting it to work.

@yarikoptic Question: Should the script be standalone or implemented as one or more subcommands of dandisets-healthstatus?

If implementing as dandisets-healthstatus subcommands:
- What subcommands? Should there just be one subcommand that does all the benchmarking at once (mount mounts, run & time tests)? Do we need (as suggested in the original issue) a run_benchmarks command that just runs & times the tests? Should there be dedicated subcommands for mounting each of the three mount types and unmounting once the user hits Ctrl-C? Perhaps one subcommand that mounts a single mount type specified on the command line, runs & times the tests, and then unmounts?
- If the benchmarking is to be implemented as part of dandisets-healthstatus, this issue should be moved to that repository.
If implementing as a separate script, the script will need to either use dandisets-healthstatus as a dependency or else copy essential parts of its code.
- If dandisets-healthstatus is used as a dependency, then since the benchmarking script will be separate from it, this comes with the risk that any future change to dandisets-healthstatus will break the script. One option to address this would be to include a Git commit hash in the benchmarking script's requirements specifier for dandisets-healthstatus, but then the benchmarking script won't get any benefits that may come from future updates to dandisets-healthstatus.
- If we do this, I assume the script should be saved in this repository?

jwodder commented 9 months ago

@yarikoptic Please answer my questions above.

yarikoptic commented 9 months ago

In general I wouldn't mind you choosing the approach, but let me weigh in on the first option:

If implementing as dandisets-healthstatus subcommands:

What subcommands? Should there just be one subcommand that does all the benchmarking at once (mount mounts, run & time tests)?

yes, let's call it run_benchmarks_across_mounts (or choose a better one)

Do we need (as suggested in the original issue) a run_benchmarks command that just runs & times the tests?

Yes, I think so. It should take a path to operate on; this way we could test against custom-mounted filesystems etc.

Should there be dedicated subcommands for mounting each of the three mount types and unmounting once the user hits Ctrl-C?

I guess it might come in handy for troubleshooting etc., but I didn't envision it... So something like shell_under_mount?

Perhaps one subcommand that mounts a single mount type specified on the command line, runs & times the tests, and then unmounts?

Could be (run_benchmarks_on_mount), or it could just be an option --mounts TYPE1,TYPE2,... for run_benchmarks_across_mounts that limits the run to one or more mount types.
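A rough sketch of how a comma-separated `--mounts` option could be parsed. It uses `argparse` from the standard library purely for illustration (this is not necessarily the CLI framework dandisets-healthstatus uses), and the mount type names in `MOUNT_TYPES` are assumed labels for the three methods.

```python
import argparse

# Assumed labels for the three mount methods under comparison.
MOUNT_TYPES = ("fusefs", "davfs2", "webdavfs")


def parse_mounts(value: str) -> list[str]:
    """Split a comma-separated list of mount types, rejecting unknown names."""
    mounts = [m.strip() for m in value.split(",") if m.strip()]
    if not mounts:
        raise argparse.ArgumentTypeError("no mount types given")
    unknown = [m for m in mounts if m not in MOUNT_TYPES]
    if unknown:
        raise argparse.ArgumentTypeError(
            f"unknown mount type(s): {', '.join(unknown)}"
        )
    return mounts


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --mounts option defaults to every mount type."""
    parser = argparse.ArgumentParser(prog="run_benchmarks_across_mounts")
    parser.add_argument(
        "--mounts",
        type=parse_mounts,
        default=list(MOUNT_TYPES),
        help="comma-separated mount types to benchmark (default: all)",
    )
    return parser
```

With this shape, omitting `--mounts` benchmarks everything, while `--mounts davfs2` gives the single-mount behaviour without a separate subcommand.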

jwodder commented 9 months ago

@yarikoptic I still can't decide whether this should be subcommands of dandisets-healthstatus, a separate script that depends on dandisets-healthstatus, or a separate script that copies code from dandisets-healthstatus. If I go with one of the latter two options, would you still want the script to have all of the subcommands that you described in your last comment? Would you still want some subcommands added to dandisets-healthstatus?

Implementing as dandisets-healthstatus subcommands:
- Pros:
  - Integrates with & extends existing code, including adding subcommands that can be used as building blocks for future/more flexible benchmarking
  - No code duplication
- Cons:
  - Causes dandisets-healthstatus to depend (in the non-packaging sense, at least) on dandi-webdav
    - Should dandi-webdav be added as a packaging dependency of dandisets-healthstatus? I don't think its API can be considered sufficiently stable (cf. dandi/dandi-webdav#4).
    - Should the user of dandisets-healthstatus be required to manually install dandi-webdav separately and pass its path and/or other details to dandisets-healthstatus?
  - I suspect that, once the benchmarking is done, we'll want to drop support for using dandisets-healthstatus with anything other than the fastest option, so we'd be adding code just to remove much of it later.
Implementing as an independent script:
- Pros:
  - If this script is stored in the dandi-webdav repository, it can invoke dandidav without needing to declare it as a dependency, and the odds of the dandidav API getting out of sync with the invocations in the benchmarking script are lowered
  - dandisets-healthstatus isn't burdened with any solely-experimental code
- Cons:
  - No functionality added to dandisets-healthstatus
  - Storing this script in the dandi-webdav repository as-is would be awkward, as the repository is already a Python package for different code. Should the dandidav package be moved into a subdirectory? Should the benchmark script be a package as well?
  - If the script uses dandisets-healthstatus as a (packaging) dependency: Any future change to dandisets-healthstatus will likely break the script. One option to address this would be to include a Git commit hash in the benchmarking script's requirements specifier for dandisets-healthstatus, but then the benchmarking script won't get any benefits that may come from future updates to dandisets-healthstatus.
  - If the script copies all relevant code from dandisets-healthstatus: Code duplication

Other concerns:

Some details — like the location of the davfs2 & webdavfs binaries, the command to run to unmount webdavfs, and the directories at which to mount things — could be hardcoded into the script, but that's bad practice.
- One option would be to have these details passed on the command line instead, but there are so many that we'd probably want a dedicated shell script for running everything with the details hardcoded in there (à la run.sh in the dandisets-healthstatus repository), but is that really much better than hardcoding them into the script?
- Alternatively, support for a config file could be added, but would that be overkill?

yarikoptic commented 9 months ago

Cons:

Causes dandisets-healthstatus to depend (in the non-packaging sense, at least) on dandi-webdav

Should dandi-webdav be added as a packaging dependency of dandisets-healthstatus?

Only as an optional extra, e.g. [benchmark-backends] or the like, not as a general dependency, since it is not generally needed.

I don't think its API can be considered sufficiently stable (cf. Support configuration via the command line #4).

Should the user of dandisets-healthstatus be required to manually install dandi-webdav separately and pass its path and/or other details to dandisets-healthstatus?

I think that is OK for now, but it could just as well be listed in a [benchmark-backends] extra.

overall -- I do not see major cons stated for this one.

As for an independent script, I would imagine (if coded to expose some general interface) the pros would be that it could be used by others to test other operations, not necessarily health checks but maybe some "real" analysis functions on data from the archive, e.g. run-benchmarks-sweep -d /tmp/fuse-mount-here my_analysis_script /tmp/fuse-mount-here/000003/myfile.nwb, to see which backend would be the best fit.

Some details — like the location of the davfs2 & webdavfs binaries, the command to run to unmount webdavfs, and the directories at which to mount things — could be hardcoded into the script, but that's bad practice.

can't we just rely on them being in the PATH, and then use `which` to get the absolute path for sudo (if that is needed or a concern)?
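The PATH-based lookup suggested here could look roughly like the following, using `shutil.which` as the Python counterpart of the `which` command. The function name `resolve_binary` is hypothetical.

```python
import os
import shutil
from typing import Optional


def resolve_binary(name: str, path: Optional[str] = None) -> str:
    """Return the absolute path of ``name`` as found on ``path`` (default: $PATH)."""
    found = shutil.which(name, path=path)
    if found is None:
        raise FileNotFoundError(f"{name!r} not found on PATH")
    # PATH entries may be relative, so normalise before handing the path to sudo.
    return os.path.abspath(found)
```

A caller would then build its sudo command from `resolve_binary("webdavfs")` rather than from a hardcoded location.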

jwodder commented 9 months ago

@yarikoptic

can't we just rely on them being in the PATH, and then use `which` to get the absolute path for sudo (if that is needed or a concern)?

You stated earlier that you had installed webdavfs on smaug at /opt/webdavfs/webdavfs. Are you going to add the binary to some directory in the default PATH, or will users of the script on smaug have to adjust their PATH before running the benchmarks?
You also stated that unmounting of webdavfs should be done via sudo /usr/local/sbin/unmount-tmp-fuse. My assumption was that you only granted my smaug account sudo permissions for that one script and that I can't do sudo umount <whatever>, so either the benchmarking script has to be hardcoded to use unmount-tmp-fuse or it needs a CLI or config option to tell it whether to use umount or unmount-tmp-fuse.

yarikoptic commented 8 months ago

You stated earlier that you had installed webdavfs on smaug at /opt/webdavfs/webdavfs. Are you going to add the binary to some directory in the default PATH, or will users of the script on smaug have to adjust their PATH before running the benchmarks?

Added a symlink to it under /usr/local/bin just now.

You also stated that unmounting of webdavfs should be done via sudo /usr/local/sbin/unmount-tmp-fuse. My assumption was that you only granted my smaug account sudo permissions for that one script and that I can't do sudo umount <whatever>, so either the benchmarking script has to be hardcoded to use unmount-tmp-fuse or it needs a CLI or config option to tell it whether to use umount or unmount-tmp-fuse.

Yeah, it seems to me that we might unfortunately need a custom unmount operation per setup/filesystem type.
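One way to express a per-mount-type unmount operation is a small override table with a plain `umount` fallback. This is only a sketch: the function name, the shape of the table, and the assumption that webdavfs is the only type needing the helper on smaug are all illustrative.

```python
from typing import Mapping, Optional, Sequence

# Assumption: on smaug only webdavfs needs the dedicated helper script.
SMAUG_UNMOUNT_OVERRIDES = {
    "webdavfs": ("sudo", "/usr/local/sbin/unmount-tmp-fuse"),
}


def unmount_argv(
    mount_type: str,
    mount_point: str,
    overrides: Optional[Mapping[str, Sequence[str]]] = None,
) -> list[str]:
    """Return the unmount command for ``mount_type``.

    Falls back to ``umount <mount_point>`` when no override is configured.
    """
    if overrides and mount_type in overrides:
        return list(overrides[mount_type])
    return ["umount", mount_point]
```

The override table could come from a CLI option or a config file, which keeps site-specific details such as unmount-tmp-fuse out of the benchmarking code itself.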

jwodder commented 8 months ago

@yarikoptic I've decided to implement this as dandisets-healthstatus subcommands. Please move this issue to the dandi/dandisets-healthstatus repository.

jwodder commented 8 months ago

@yarikoptic How exactly should mounting & unmounting with webdavfs work? Based on its README, the recommended way to use webdavfs is to install it at /sbin/mount.webdavfs and run sudo mount -t webdavfs $URL $MOUNT_POINT, but webdavfs on smaug is currently installed at /usr/local/bin/webdavfs and run directly. (This discrepancy may be why running umount after stopping the program is currently necessary.)

yarikoptic commented 8 months ago

webdavfs mounted for me without sudo via ./webdavfs http://localhost:8080 /tmp/dandiarchive-fuse, which was great. It is only for unmounting that I found no way to do it without sudo, hence the helper.

Just for consistency, I did symlink it as /usr/local/sbin/mount.webdavfs so you could use it with mount, but you would need sudo for that, whereas it works fine without sudo if just used as a binary:

$> mount -t webdav http://localhost:8080 /tmp/dandiarchive-fuse
mount: /tmp/dandiarchive-fuse: must be superuser to use mount.

$> webdavfs http://localhost:8080 /tmp/dandiarchive-fuse
http://localhost:8080: no PUT Range support, mounting read-only

jwodder commented 8 months ago

@yarikoptic You didn't answer my question: Which commands should I use for mounting & unmounting webdavfs?

I would argue that invoking the webdavfs binary directly is the wrong thing to do.
I'm quite certain that you can give me permission to run sudo solely for a specific command with specific arguments.

yarikoptic commented 8 months ago

I would argue that invoking the webdavfs binary directly is the wrong thing to do.

why? FWIW it is exactly the same binary used.

I'm quite certain that you can give me permission to run sudo solely for a specific command with specific arguments.

It would need to be mount command I guess... and umount. Anything else? I know that you are sane and I can trust you, so can do it but it would still raise some level of paranoia in me regardless ;-)

yarikoptic commented 8 months ago

I know that you are sane and I can trust you, so can do it but it would still raise some level of paranoia in me regardless ;-)

I even immediately came up with a recipe for disaster:

- Prepare an ext3/4 partition in a file, containing a malicious root-owned setuid file doing evil.
- Mount that partition and run the bad payload....

I didn't try it, but I wonder if something like that could happen with a FUSE filesystem, i.e. could it expose root-setuid content?

jwodder commented 8 months ago

@yarikoptic When I said "you can give me permission to run sudo solely for a specific command with specific arguments," I meant that you can give me permission to run, say, sudo mount -t webdavfs http://127.0.0.1:8080 /tmp/dandisets-fuse but not permission to run sudo mount <anything else>.

yarikoptic commented 8 months ago

via wrapper scripts I guess -- yeah, we could do that similarly to that unmount command, no problem.

jwodder commented 8 months ago

@yarikoptic No, not via wrapper scripts (That would just obscure what's going on to readers of the dandisets-healthstatus code). I mean that you can add the following to the sudoers file (SYNTAX NOT CONFIRMED; CONFIRM BEFORE USING):

jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t webdavfs http\://127.0.0.1\:8080 /tmp/dandisets-fuse
jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /tmp/dandisets-fuse

and then I, via dandisets-healthstatus, can run the exact commands given there via sudo, but I won't be able to run anything else via sudo.

yarikoptic commented 8 months ago

cool, I didn't know I can specify full command invocations in sudoers.

Verified that it works with /bin/ls locally:

```shell
❯ sudo grep bin/ls /etc/sudoers
[sudo] password for yoh:
yoh ALL=(ALL:ALL) NOPASSWD: /bin/ls --color=auto
❯ sudo -k
❯ sudo /bin/ls
[sudo] password for yoh:
sudo: a password is required
❯ sudo /bin/ls --color=auto
ab Maps ...
❯ sudo /bin/ls --color=auto -l
[sudo] password for yoh:
sudo: a password is required
```

Added those two now for you to try out.

jwodder commented 8 months ago

@yarikoptic

When running the subcommand for timing each test under each mount type, how should the asset(s) to test be passed on the command line? They can't be passed as file paths, as the fusefs mount and the WebDAV mounts use different path structures ({dandiset_id}/{asset_path} vs. {dandiset_id}/draft/{asset_path}). One idea would be for the command-line syntax to be <dandiset-id> <asset-path1> <asset-path2> ..., but this wouldn't let you test assets from multiple Dandisets at once.
How should timing results be reported? Should they just be given in log messages after each test, or should there be a summary in some format after everything is run?

yarikoptic commented 8 months ago

let's assume that paths provided are <dandiset-id>/<asset-path> -- split on first / and have the pair.
I wonder what the underlying data structure of pytest-benchmark (https://pypi.org/project/pytest-benchmark/) looks like... but we should return list of records (or a dict) with each record having an id composed of the benchmark (healthcheck) and the path on which it operated, plus the timing for it. Then "visualization" summary based on those to tell the winner(s) among FUSE solutions.
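The "split on first /" rule, combined with the two path layouts mentioned in the question, can be sketched as follows. The function names are hypothetical; the layouts `{dandiset_id}/{asset_path}` (fusefs) and `{dandiset_id}/draft/{asset_path}` (WebDAV) are taken from the comment above.

```python
from pathlib import PurePosixPath


def split_asset_spec(spec: str) -> tuple[str, str]:
    """Split '<dandiset-id>/<asset-path>' on the first slash."""
    dandiset_id, sep, asset_path = spec.partition("/")
    if not sep or not dandiset_id or not asset_path:
        raise ValueError(f"expected <dandiset-id>/<asset-path>, got {spec!r}")
    return dandiset_id, asset_path


def asset_location(mount_point: str, spec: str, webdav: bool) -> PurePosixPath:
    """Map an asset spec to its location under a mount point.

    fusefs mounts use {dandiset_id}/{asset_path}; WebDAV mounts insert a
    "draft" component after the Dandiset ID.
    """
    dandiset_id, asset_path = split_asset_spec(spec)
    base = PurePosixPath(mount_point) / dandiset_id
    if webdav:
        base = base / "draft"
    return base / asset_path
```

Because each spec carries its own Dandiset ID, assets from several Dandisets can be passed in one invocation.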

jwodder commented 8 months ago

@yarikoptic

but we should return list of records (or a dict)

The benchmarking is invoked as a CLI command, not a Python function. Commands can't return lists.

Then "visualization" summary based on those to tell the winner(s) among FUSE solutions.

What sort of visualization?

yarikoptic commented 8 months ago

but we should return list of records (or a dict)

The benchmarking is invoked as a CLI command, not a Python function. Commands can't return lists.

I meant internally.

Then "visualization" summary based on those to tell the winner(s) among FUSE solutions.

What sort of visualization?

I mean a text summary displayed by that CLI command at the end. Overall, on the above two points: just follow the classic MVC design pattern, with a model (structure of results), a view (CLI summary), and a controller (benchmarking code). This way we can later more easily change the rendering or add another usage/visualization (e.g. store results and summarize over different runs).
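A possible shape for the model and view parts, kept deliberately small. The class and function names are hypothetical, and the summary layout is just one option; the controller (the code that mounts and runs the tests) is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRecord:
    """Model: one timing result for one test on one asset under one mount."""

    mount: str
    test: str
    asset: str
    seconds: float

    @property
    def id(self) -> str:
        # The id combines the benchmark (healthcheck) name and the asset path.
        return f"{self.test}:{self.asset}"


def render_summary(records: list[BenchmarkRecord]) -> str:
    """View: group records by id and list mounts from fastest to slowest."""
    by_id: dict[str, list[BenchmarkRecord]] = defaultdict(list)
    for record in records:
        by_id[record.id].append(record)
    lines = []
    for bench_id in sorted(by_id):
        lines.append(bench_id)
        ranked = sorted(by_id[bench_id], key=lambda r: r.seconds)
        for position, record in enumerate(ranked):
            marker = " (fastest)" if position == 0 else ""
            lines.append(f"  {record.mount:<10} {record.seconds:8.2f}s{marker}")
    return "\n".join(lines)
```

Since the records are plain data, a later change could serialize them to disk and summarize over several runs without touching the benchmarking code.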

jwodder commented 8 months ago

@yarikoptic If a test fails, should it be included in the visualization? What if a test is killed due to exceeding the one-hour timeout?

yarikoptic commented 8 months ago

Hm... I think any failure should be treated as an error in the case of benchmarking, and we would need to resolve it first.

jwodder commented 8 months ago

@yarikoptic What about timeouts?

yarikoptic commented 8 months ago

Error out if a timeout happens, I think.
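The policy agreed on here (both failures and timeouts abort the benchmark) might be sketched like this. `BenchmarkError` and `timed_run` are hypothetical names; the one-hour default mirrors the timeout mentioned in the question.

```python
import subprocess
import time

TIMEOUT_SECONDS = 3600  # the one-hour limit mentioned above


class BenchmarkError(RuntimeError):
    """Raised when a test fails or times out, aborting the benchmark run."""


def timed_run(argv: list[str], timeout: float = TIMEOUT_SECONDS) -> float:
    """Run ``argv`` and return elapsed seconds; any failure or timeout is an error."""
    start = time.perf_counter()
    try:
        subprocess.run(argv, check=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        raise BenchmarkError(f"{argv!r} timed out after {timeout}s") from exc
    except subprocess.CalledProcessError as exc:
        raise BenchmarkError(
            f"{argv!r} exited with status {exc.returncode}"
        ) from exc
    return time.perf_counter() - start
```

Raising instead of recording a partial result keeps failed or truncated runs out of the summary entirely.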

jwodder commented 8 months ago

@yarikoptic Is davfs2 currently set up so that I can do sudo mount -t davfs2 http://localhost:8080 /tmp/dandiset-fuse, or is the proper command something else?

yarikoptic commented 8 months ago

It is now, but apparently the type is `-t davfs`:

```shell
(base) smaug:~$ sudo /usr/bin/mount -t davfs http\://127.0.0.1\:8080 /tmp/dandisets-fuse
Please enter the username to authenticate with server
http://127.0.0.1:8080 or hit enter for none.
  Username:
Please enter the password to authenticate user with server
http://127.0.0.1:8080 or hit enter for none.
  Password:
/sbin/mount.davfs: connection timed out two times;
                   trying one last time
/sbin/mount.davfs: server temporarily unreachable;
                   mounting anyway
(base) smaug:~$ sudo umount /tmp/dandisets-fuse
/sbin/umount.davfs: waiting for mount.davfs (pid 535461) to terminate gracefully .. OK
```

We have 1.6.1-1 installed; upstream has 1.7.0. I filed a request for an update (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1060078) but might just NMU it later, although, judging from the changelog, I do not expect any performance changes there.

jwodder commented 8 months ago

@yarikoptic Please append the following lines to /etc/davfs2/davfs2.conf so that mount doesn't prompt for a username & password:

[/tmp/dandisets-fuse]
ask_auth 0

jwodder commented 8 months ago

@yarikoptic Also, did you install webdavfs as mount.webdavfs anywhere? When I tried running sudo mount -t webdavfs http://127.0.0.1:8080 /tmp/dandisets-fuse on smaug, I got:

mount: /tmp/dandisets-fuse: unknown filesystem type 'webdavfs'.
       dmesg(1) may have more information after failed mount system call.

EDIT: According to the mount(8) manpage, -t binaries are only looked up in /sbin, but you've installed it at /usr/local/sbin/mount.webdavfs.
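Going by the reading of mount(8) given in the edit above (helpers for `-t <type>` are looked up as /sbin/mount.<type>), a pre-flight check could catch this situation before any `mount` call is attempted. The function names are hypothetical, and the check only models the /sbin lookup described here.

```python
import os
from pathlib import Path


def mount_helper(fstype: str, sbin: Path = Path("/sbin")) -> Path:
    """Return the helper path that ``mount -t <fstype>`` is expected to use."""
    return sbin / f"mount.{fstype}"


def helper_installed(fstype: str, sbin: Path = Path("/sbin")) -> bool:
    """Report whether the mount helper exists and is executable."""
    helper = mount_helper(fstype, sbin)
    return helper.exists() and os.access(helper, os.X_OK)
```

With the symlink under /usr/local/sbin only, `helper_installed("webdavfs")` would return False, which matches the "unknown filesystem type" error shown above.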

yarikoptic commented 8 months ago

@yarikoptic Please append the following lines to /etc/davfs2/davfs2.conf so that mount doesn't prompt for a username & password:
[/tmp/dandisets-fuse]
ask_auth 0

Uncommented the existing setting and changed it to 0, but didn't add that path limiter... Try it.

@yarikoptic Also, did you install webdavfs as mount.webdavfs anywhere? When I tried running sudo mount -t webdavfs http://127.0.0.1:8080 /tmp/dandisets-fuse on smaug, I got:
mount: /tmp/dandisets-fuse: unknown filesystem type 'webdavfs'.
       dmesg(1) may have more information after failed mount system call.

it is there

smaug:/mnt/btrfs/scrap
$> ls -l /usr/local/sbin/
total 20
-rwx------ 1 root staff 1647 Jul  2  2015 btrfsQuota*
-rwxr-xr-- 1 root adm     86 Jan 31  2015 flush-caches*
-rwx------ 1 root root    88 Dec 12  2018 flush_caches_kyle*
lrwxrwxrwx 1 yoh  staff   23 Jan 17 13:16 mount.webdavfs -> /usr/local/bin/webdavfs*
-rwsr-xr-x 1 root root    43 Jan  5 12:11 unmount-tmp-fuse*
-rwxr-xr-x 1 root root  3636 Dec 18  2014 zfs-monitor.pl*

$> ls -l /usr/local/bin/webdavfs
lrwxrwxrwx 1 yoh staff 22 Jan 10 09:15 /usr/local/bin/webdavfs -> /opt/webdavfs/webdavfs*

$> ls -l /opt/webdavfs/webdavfs
-rwxr-xr-x 1 yoh yoh 8021561 Jan  5 12:02 /opt/webdavfs/webdavfs*

And it is indeed odd, since the shell does find it:

smaug:/mnt/btrfs/scrap
$> sudo mount -t webdavfs http://127.0.0.1:8080 /tmp/dandisets-fuse2
mount: /tmp/dandisets-fuse2: unknown filesystem type 'webdavfs'.
       dmesg(1) may have more information after failed mount system call.

$> sudo which mount.webdavfs
/usr/local/sbin/mount.webdavfs

Dunno... try to figure it out; if not, there is mount.webdavfs http://127.0.0.1:8080 /tmp/dandisets-fuse2

yarikoptic commented 8 months ago

EDIT: According to the mount(8) manpage, -t binaries are only looked up in /sbin, but you've installed it at /usr/local/sbin/mount.webdavfs.

Oh... OK... I hate to do that for local installs, but will do it for uniformity (will need to package the damn thing anyway if it ends up being the winner ;-) )

jwodder commented 8 months ago

@yarikoptic I can get mount -t webdavfs ... to run successfully now, but I get a permissions error when trying to look inside /tmp/dandisets-fuse (and even when just doing ls -l /tmp). If you pass -o allow_other to the mount command, are you able to traverse the mount directory without being root? If so, please add that option to the allowed sudo command.

yarikoptic commented 8 months ago

we have now

smaug# cat /etc/sudoers.d/fuse 
# For benchmarking FUSE
jwodder ALL=(ALL:ALL) NOPASSWD: /usr/local/sbin/unmount-tmp-fuse
jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t webdavfs http\://127.0.0.1\:8080 /tmp/dandisets-fuse
jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs http\://127.0.0.1\:8080 /tmp/dandisets-fuse
jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t webdavfs -o allow_other http\://127.0.0.1\:8080 /tmp/dandisets-fuse
jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs -o allow_other http\://127.0.0.1\:8080 /tmp/dandisets-fuse
jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /tmp/dandisets-fuse
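Because sudoers entries like these match the full command line, the benchmarking code would have to issue exactly these invocations. The sketch below builds them as argument lists and wraps mount/unmount in a context manager; the names `mount_argv`, `unmount_argv`, and `mounted` are hypothetical, and the `run` parameter exists only so the commands can be inspected without actually calling sudo.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator

# Values copied from the sudoers entries above. The backslashes before the
# colons are sudoers escaping and are not part of the actual arguments.
URL = "http://127.0.0.1:8080"
MOUNT_POINT = "/tmp/dandisets-fuse"
ALLOWED_FSTYPES = ("webdavfs", "davfs")


def mount_argv(fstype: str) -> list[str]:
    """Build the `sudo mount` invocation permitted for ``fstype``."""
    if fstype not in ALLOWED_FSTYPES:
        raise ValueError(f"no sudoers entry for filesystem type {fstype!r}")
    return ["sudo", "/usr/bin/mount", "-t", fstype, "-o", "allow_other",
            URL, MOUNT_POINT]


def unmount_argv() -> list[str]:
    """Build the permitted `sudo umount` invocation."""
    return ["sudo", "/usr/bin/umount", MOUNT_POINT]


@contextmanager
def mounted(fstype: str, run: Callable = subprocess.run) -> Iterator[str]:
    """Mount on entry and always unmount on exit, even if the body raises."""
    run(mount_argv(fstype), check=True)
    try:
        yield MOUNT_POINT
    finally:
        run(unmount_argv(), check=True)
```

Usage would be along the lines of `with mounted("davfs") as mount_point: ...`, with the timed tests running inside the block.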

yarikoptic commented 7 months ago

FWIW -- tried webdavfs on drogon but failed to get the content of .zattrs:

dandi@drogon:/mnt/backup/dandi$ webdavfs -o ro -D http://dandi.centerforopenneuroscience.org dandidav-webdavfs
...
dandi@drogon:/mnt/backup/dandi/dandidav-webdavfs/zarrs$ cat 001/e3b/001e3b6d-26fb-463f-af28-520a25680ab4/326273bcc8730474323a66ea4e3daa49-113328--97037755426.zarr/.zattrs
cat: 001/e3b/001e3b6d-26fb-463f-af28-520a25680ab4/326273bcc8730474323a66ea4e3daa49-113328--97037755426.zarr/.zattrs: Input/output error
...
dandi@drogon:/mnt/backup/dandi/dandidav-webdavfs/dandisets/000108/draft$ head -n 2 dataset_description.json 
head: error reading 'dataset_description.json': Input/output error

so not sure if that works at all now :-/

yarikoptic commented 7 months ago

The same for davfs2... maybe neither of them supports redirects? It seems to be OK for dandiset.yaml, though:

dandi@drogon:/mnt/backup/dandi$ cat /mnt/backup/dandi/dandidav-davfs2/dandisets/000108/draft/samples.tsv 
cat: /mnt/backup/dandi/dandidav-davfs2/dandisets/000108/draft/samples.tsv: Input/output error
dandi@drogon:/mnt/backup/dandi$ head /mnt/backup/dandi/dandidav-davfs2/dandisets/000108/draft/dandiset.yaml 
id: DANDI:000108/draft
doi: 10.80507/dandi.123456/0.123456.1234
url: https://dandiarchive.org/dandiset/000108/draft
name: Light sheet imaging of the human brain

dang..

jwodder commented 7 months ago

@yarikoptic davfs2 does support redirects, but it has to be enabled by adding follow_redirect 1 to /etc/davfs2/davfs2.conf. With this, I can access files under /zarrs/ but still not under /dandisets/ (maybe it doesn't support double-redirects?).

Also, I found this davfs2 issue that may be relevant to what we're doing: Version 1.7.0 much slower than 1.6.1 (a hundred times slower)
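To check the double-redirect hypothesis independently of either FUSE client, one could count the redirect hops the server issues for a given URL. This is a diagnostic sketch using only `urllib` from the standard library; `redirect_chain` is a hypothetical helper, and it says nothing about how davfs2 or webdavfs themselves handle the hops.

```python
import urllib.request


class _RecordRedirects(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers every hop it follows."""

    def __init__(self) -> None:
        self.hops: list[tuple[int, str]] = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Record the status code and target, then defer to the default logic.
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def redirect_chain(url: str, timeout: float = 30.0) -> list[tuple[int, str]]:
    """Fetch ``url`` and return the (status, target) pairs of all redirects."""
    recorder = _RecordRedirects()
    opener = urllib.request.build_opener(recorder)
    with opener.open(url, timeout=timeout):
        pass  # only the redirect history matters, not the body
    return recorder.hops
```

If the list returned for a URL under /dandisets/ has two entries while one under /zarrs/ has a single entry, that would be consistent with the double-redirect explanation.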

jwodder commented 7 months ago

@yarikoptic I filed a bug report with davfs2 about lack of double-redirect support, but it doesn't look like the maintainers are actively handling bugs lately.

jwodder commented 7 months ago

@yarikoptic webdavfs doesn't support redirects at all; I filed an issue with it requesting support: https://github.com/miquels/webdavfs/issues/30

yarikoptic commented 6 months ago

Per a brief discussion during our CON meetup today, just a note here (ping @jwodder) that we need to include our datalad-fuse solution (described in the original post) in the comparison, so that we make an informed decision on which backend to use for healthstatus (currently datalad-fuse is used).

dandi / dandisets-healthstatus

Compare performance of webdav+davfs2, webdav+webdavfs, and datalad-fuse #66