dandi / dandi-infrastructure

A repository to collect docs/issues on DANDI project infrastructure
Apache License 2.0

FOI: httpfs2 - proof that it might be feasible... #12

Closed yarikoptic closed 2 years ago

yarikoptic commented 4 years ago

We have touched upon this topic a few times: shouldn't it be possible to access HDF5 files over the network (i.e. over HTTP)? HTTP does support range requests, and for basic querying we would need just a few binary blobs from here and there; even for access to specific datasets it should be feasible to fetch only the needed ranges. So I apt-get installed httpfs2, which is probably a descendant of the even older httpfs, which showed its last (or maybe also its first? ;)) vital signs in 2006.
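To make the range-request idea concrete, here is a minimal Python sketch using only the standard library. It is not what httpfs2 does internally; `range_header` and `read_range` are hypothetical helper names, and the sketch assumes the server answers a `Range` header with `206 Partial Content`.

```python
import urllib.request

# First 8 bytes of an HDF5 superblock (the format signature).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def range_header(offset, length):
    """Return the Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + length - 1}"


def read_range(url, offset, length, timeout=30):
    """Fetch only the requested bytes of `url` (hypothetical helper)."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206 Partial Content: the server honoured the range.
        # 200 OK would mean it ignored the header and is sending everything.
        if response.status != 206:
            raise RuntimeError(f"no range support? got HTTP {response.status}")
        return response.read()
```

For an HDF5 file whose superblock sits at offset 0 (the common case), `read_range(url, 0, 8) == HDF5_SIGNATURE` would be expected to hold, at the cost of 8 bytes of payload rather than the whole file.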

So, here is a brief attempt:

$> ifconfig wlp82s0 > wlp82s0.before; httpfs2 -f http://datasets.datalad.org/labs/churchland/najafi-2018-nwb/data/FN_dataSharing/nwb/mouse1_fni16_150819_001_ch2-PnevPanResults-170815-163235.nwb /mnt/httpfs ; ifconfig wlp82s0 > wlp82s0.after
file name:  mouse1_fni16_150819_001_ch2-PnevPanResults-170815-163235.nwb
host name:  datasets.datalad.org
port number:    80
protocol:   http
request path:   /labs/churchland/najafi-2018-nwb/data/FN_dataSharing/nwb/mouse1_fni16_150819_001_ch2-PnevPanResults-170815-163235.nwb
auth data:  (null)
file size:  449712752
httpfs2: /dev/console: 13 Permission denied.
httpfs2: read: 0 Success.
httpfs2: exchange: did not receive a reply, retrying: 0 Success.

ifconfig was there just to provide a crude assessment of how much traffic there was (nsntrace failed to help) while I was running dandi ls on that mounted file. It took a while:

$> time dandi ls /mnt/httpfs/mouse1_fni16_150819_001_ch2-PnevPanResults-170815-163235.nwb
- experimenter: Farzaneh Najafi
  identifier: 150819_001_ch2-PnevPanResults-170815-163235
  institution: Cold Spring Harbor Laboratory
  nwb_version: 2.0.2
  path: /mnt/httpfs/mouse1_fni16_150819_001_ch2-PnevPanResults-170815-163235.nwb
  related_publications: https://doi.org/10.1101/354340
  session_description: 150819_001_ch2-PnevPanResults-170815-163235
  session_id: '170815163235'
  session_start_time: 2015-08-18 20:00:00-04:00
  size: 449712752
  subject_id: mouse1_fni16
dandi ls   8.13s user 0.30s system 12% cpu 1:08.89 total

which seems to have caused no more than 21MB of traffic (comparing the ifconfig outputs ;)), i.e. under 5% of the file, whose full size is > 400MB. Sure thing, it is slower than running on a local copy (dandi ls 7.32s user 0.14s system 100% cpu 7.452 total), but it caused less traffic than getting a full copy (which took 1:08.03 total ;) ).
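The before/after ifconfig comparison can also be scripted. The sketch below is an assumption-laden alternative, not what was run here: it reads the Linux per-interface counter `/sys/class/net/<iface>/statistics/rx_bytes` around an arbitrary action. `rx_bytes` and `traffic_during` are hypothetical helper names. Like ifconfig, the counter covers all traffic received on the interface, so the result is only an upper bound on what the action itself caused.

```python
from pathlib import Path


def rx_bytes(iface, root="/sys/class/net"):
    """Return the received-bytes counter of network interface `iface`."""
    counter = Path(root, iface, "statistics", "rx_bytes")
    return int(counter.read_text())


def traffic_during(action, iface, root="/sys/class/net"):
    """Run `action()` and return (its result, bytes received meanwhile)."""
    before = rx_bytes(iface, root)
    result = action()
    after = rx_bytes(iface, root)
    return result, after - before
```

Something like `traffic_during(lambda: subprocess.run(["dandi", "ls", path]), "wlp82s0")` would then replace the two ifconfig dumps, with `path` pointing at the mounted file.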

Well, we already know that pynwb is a bit "too non-lazy", which possibly causes some traffic that could have been avoided in this particular case. But it also hints at the possibility of using access over regular http for at least some use cases with .nwb files.

Unfortunately httpfs2 had some indigestion when I tried it on https:// URLs to buzsaki's data, which is GBs in size.

yarikoptic commented 4 years ago

On https://unix.stackexchange.com/questions/67568/mount-http-server-as-file-system I found mentions of other FUSE-based solutions.

NB: will edit here later when I get to try avfs.

yarikoptic commented 4 years ago

@bendichter do I remember correctly that you mentioned having made some progress in establishing "online" access to HDF5 files? If so, could you please reference here anything from those efforts?

bendichter commented 4 years ago

@yarikoptic Yes, check out HDF5Zarr. I'd be happy to walk you through it

yarikoptic commented 2 years ago

yep yep, range requests work: they are used by HDF5 ROS3, fsspec, datalad-fuse (which uses fsspec), etc. This FOI can be closed.
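For reference, the common idea behind such tools can be sketched as a seekable, read-only file object that turns every read into a range fetch. This is an illustration only, not the implementation of ROS3, fsspec or datalad-fuse: `RangeFile` is a hypothetical name, `fetch(offset, length)` stands for any function that returns those bytes (for instance via an HTTP request with a `Range` header), and real implementations add caching and read-ahead that are omitted here.

```python
import io


class RangeFile(io.RawIOBase):
    """Read-only, seekable file whose bytes come from a range-fetching callable."""

    def __init__(self, fetch, size):
        super().__init__()
        self._fetch = fetch     # callable(offset, length) -> bytes
        self._size = size       # total file size, e.g. from Content-Length
        self._pos = 0
        self.bytes_fetched = 0  # crude payload counter, headers not included

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buf):
        # Never ask for bytes past the end of the file.
        length = min(len(buf), self._size - self._pos)
        if length <= 0:
            return 0
        # Truncate in case the fetcher returns more than was asked for.
        data = self._fetch(self._pos, length)[:length]
        n = len(data)
        buf[:n] = data
        self._pos += n
        self.bytes_fetched += n
        return n
```

Only the ranges a reader actually touches are transferred, and `bytes_fetched` gives a rough count of them.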