Open yarikoptic opened 1 year ago
Sure, sounds like a cool use case to me. There is a rough structure that use cases usually follow: http://handbook.datalad.org/en/latest/contributing.html#use-cases
More specific targets for the use case could be:
- dandi validate: validate one of the dandisets and show how much is downloaded (du -scm .git/datalad/cache, or whatever that path is) out of the total size of the files which are validated.
- datalad-fuse: target as "we can run any external command, not just Python code, which accesses those files. As we are doing in https://github.com/dandi/dandisets-healthstatus for the MATLAB matnwb library".
Just ping me if you need any info. You should add a new file in docs/usecases and place it somewhere in the docs/usecases/intro.rst toctree. Use cases do not need to have code that is executed and captured, so you can go with .. code-block:: directives instead of .. runrecord:: directives. Looking forward to it!
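The "how much is downloaded out of the total size" comparison suggested above could be scripted roughly as follows. This is only a sketch: the helper names `dir_size_bytes` and `fetched_fraction` are made up for illustration, the actual cache location is not confirmed in this thread, and the code sums apparent file sizes, which can differ from the block-based numbers that du reports.

```python
import os
import tempfile


def dir_size_bytes(root):
    """Sum the apparent sizes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that pointers to annexed content are not followed.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fetched_fraction(cache_dir, total_bytes):
    """Return the share of total_bytes that is present in cache_dir."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return dir_size_bytes(cache_dir) / total_bytes


if __name__ == "__main__":
    # Throwaway directory standing in for the (hypothetical) cache path.
    with tempfile.TemporaryDirectory() as cache:
        with open(os.path.join(cache, "block0"), "wb") as f:
            f.write(b"\0" * 1024)
        # Pretend the validated files add up to 1 MiB in total.
        share = fetched_fraction(cache, 1024 * 1024)
        print(f"{share:.2%} of the data was fetched")
```

In a real run, `cache_dir` would be whatever directory the cache actually lives in, and `total_bytes` the summed size of the files that were validated.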
@asmacdo showed interest in participating in the ongoing handbook hackathon, and I thought it would be great to have a use case showcasing dandisets (superdataset at https://github.com/dandi/dandisets, individual datasets at https://github.com/dandisets, asyncio code to update those from the archive within the tools/ directory of dandisets) and the https://github.com/datalad/datalad-fuse/ extension. Dandisets are "special" in that typical files there are large, but for access to metadata etc. only a small portion of the data needs to be accessed. In datalad-fuse we use https://github.com/fsspec/filesystem_spec/ with local caching to provide efficient sparse access to remote annexed files which have an http* URL associated with them.
In datalad core we had a request for streaming, https://github.com/datalad/datalad/issues/4003, so it might be useful to highlight how streaming could be implemented, via the fsspec interface within datalad-fuse or directly via the FUSE filesystem of that one.
WDYT datalad-handbook folks about such a section? (attn @adswa @mih)
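To make the idea of sparse access with local caching concrete for such a section, here is a minimal, standard-library-only sketch. It is not how datalad-fuse or fsspec implement it; the class `BlockCachedReader` and its `fetch_range` callback are hypothetical names, and in a real setup the callback would presumably issue HTTP range requests against the file's URL.

```python
class BlockCachedReader:
    """Read byte ranges of a large file, fetching and caching fixed-size blocks."""

    def __init__(self, fetch_range, size, block_size=4096):
        # fetch_range(start, end) -> bytes for [start, end); hypothetical callback.
        self.fetch_range = fetch_range
        self.size = size
        self.block_size = block_size
        self.cache = {}  # block index -> bytes already fetched
        self.bytes_fetched = 0

    def _block(self, index):
        # Fetch a block only on first use; later reads are served from the cache.
        if index not in self.cache:
            start = index * self.block_size
            end = min(start + self.block_size, self.size)
            data = self.fetch_range(start, end)
            self.cache[index] = data
            self.bytes_fetched += len(data)
        return self.cache[index]

    def read(self, offset, length):
        end = min(offset + length, self.size)
        if offset >= end:
            return b""
        first = offset // self.block_size
        last = (end - 1) // self.block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        skip = offset - first * self.block_size
        return data[skip:skip + (end - offset)]


if __name__ == "__main__":
    # An in-memory bytes object stands in for a large remote file.
    remote = bytes(range(256)) * 4096
    reader = BlockCachedReader(lambda s, e: remote[s:e], len(remote))
    reader.read(0, 100)  # e.g. a metadata header at the start of the file
    reader.read(0, 100)  # the repeated read hits the cache
    print(reader.bytes_fetched, "of", len(remote), "bytes fetched")
```

The point of the sketch is only that reading a small header touches a few blocks rather than the whole file, which is the property that makes metadata access to large dandiset files cheap.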