datalad-handbook / book

Sources for the DataLad handbook
http://handbook.datalad.org

Section highlighting datalad-fuse and dandisets? #899

Open yarikoptic opened 1 year ago

yarikoptic commented 1 year ago

@asmacdo showed interest in participating in the ongoing handbook hackathon, and I thought it might be great to have a use-case showcase for dandisets (the superdataset at https://github.com/dandi/dandisets, the individual dandisets at https://github.com/dandisets, and the asyncio code that updates them from the archive in tools/ of dandisets) and for the https://github.com/datalad/datalad-fuse/ extension. Dandisets are "special" in that their files are typically large, yet accessing metadata and the like requires reading only a small portion of the data. In datalad-fuse we use https://github.com/fsspec/filesystem_spec/ with local caching to provide efficient sparse access to remote annexed files that have an http* URL associated with them.
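To make "sparse access with local caching" concrete, here is a toy, stdlib-only sketch of the idea. It is not datalad-fuse's code and does not use fsspec; fsspec provides the real mechanism through its HTTP filesystem combined with caching layers such as blockcache or filecache. The names BlockCachedFile and fetch_range are hypothetical: fetch_range(start, end) stands in for an HTTP Range request against the annexed file's http* URL, and an in-memory dict stands in for the on-disk cache.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: fetch_range(start, end) returns bytes [start, end) of the remote file,
# e.g. via an HTTP request with "Range: bytes=start-(end-1)".
FetchRange = Callable[[int, int], bytes]


class BlockCachedFile:
    """Toy read-only file that fetches a remote object lazily, one block at a time."""

    def __init__(self, fetch_range: FetchRange, size: int, block_size: int = 1024):
        self.fetch_range = fetch_range
        self.size = size
        self.block_size = block_size
        self.cache: Dict[int, bytes] = {}  # block index -> block bytes (stand-in for a local cache)
        self.pos = 0

    def _block(self, index: int) -> bytes:
        # Fetch a block only the first time it is needed; later reads are served from the cache.
        if index not in self.cache:
            start = index * self.block_size
            end = min(start + self.block_size, self.size)
            self.cache[index] = self.fetch_range(start, end)
        return self.cache[index]

    def seek(self, offset: int) -> None:
        self.pos = max(0, min(offset, self.size))

    def read(self, n: int) -> bytes:
        end = min(self.pos + n, self.size)
        out = bytearray()
        while self.pos < end:
            index, offset = divmod(self.pos, self.block_size)
            chunk = self._block(index)[offset:offset + (end - self.pos)]
            out += chunk
            self.pos += len(chunk)
        return bytes(out)


# Usage: a 1 MiB in-memory stand-in for a large annexed file, with a fetcher that logs requests.
remote = bytes(range(256)) * 4096
fetched: List[Tuple[int, int]] = []


def fake_fetch(start: int, end: int) -> bytes:
    fetched.append((start, end))
    return remote[start:end]


f = BlockCachedFile(fake_fetch, len(remote), block_size=64 * 1024)
f.seek(0)
header = f.read(100)            # e.g. a file header holding metadata
f.seek(len(remote) - 10)
tail = f.read(10)               # e.g. a trailer at the end of the file
f.seek(0)
again = f.read(100)             # served from the cache, no new fetch
```

Reading the 100-byte header and the last 10 bytes touches only two 64 KiB blocks out of sixteen, and rereading the header triggers no further request; that is the access pattern that makes metadata extraction from large dandiset files cheap.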

In datalad core we had a request for streaming (https://github.com/datalad/datalad/issues/4003), so it might be useful to highlight how streaming could be implemented, either via the fsspec interface within datalad-fuse or directly via its FUSE filesystem.
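As a rough illustration of what streaming means here, the following stdlib-only sketch yields a remote file chunk by chunk via HTTP Range requests, so the whole file is never downloaded or held at once. It is one generic approach, not what the linked issue, datalad, or datalad-fuse prescribe; the helper names http_range_fetcher and stream are hypothetical, and the file size is assumed to be known in advance (for example from the git-annex key or a Content-Length header).

```python
import urllib.request
from typing import Callable, Iterator

# Hypothetical: fetch_range(start, end) returns bytes [start, end) of the remote file.
FetchRange = Callable[[int, int], bytes]


def http_range_fetcher(url: str) -> FetchRange:
    """Return a fetch_range(start, end) that requests bytes [start, end) of url."""

    def fetch(start: int, end: int) -> bytes:
        # HTTP byte ranges are inclusive on both ends, hence end - 1.
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end - 1}"})
        with urllib.request.urlopen(req) as resp:
            # 206 Partial Content means the server honored the range; a 200 would be the whole file.
            if resp.status != 206:
                raise OSError(f"server ignored Range request (HTTP {resp.status})")
            return resp.read()

    return fetch


def stream(fetch_range: FetchRange, size: int, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the remote content in order, holding at most one chunk in memory."""
    for start in range(0, size, chunk_size):
        yield fetch_range(start, min(start + chunk_size, size))


# Usage with an in-memory stand-in instead of a real URL:
data = b"x" * 2500 + b"y" * 10
chunks = list(stream(lambda s, e: data[s:e], len(data), chunk_size=1000))
```

With a FUSE mount (as datalad-fuse offers) no such code is needed at all: any program can open the file on the mount and read it sequentially, and the filesystem layer fetches only what is read.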

WDYT datalad-handbook folks about such a section? (attn @adswa @mih)

adswa commented 1 year ago

Sure, sounds like a cool use case to me. There is a rough structure that use cases usually follow: http://handbook.datalad.org/en/latest/contributing.html#use-cases

yarikoptic commented 1 year ago

A more specific target for the use case could be

adswa commented 1 year ago

Just ping me if you need any info. You should add a new file in docs/usecases and place it somewhere in the toctree of docs/usecases/intro.rst. Use cases do not need code that is executed and captured, so you can use .. code-block:: directives instead of .. runrecord:: directives. Looking forward to it!