con / nwb2bids

Reorganize NWB files into a BIDS directory layout.

Test on some bigger DANDI datasets using datalad-fuse #6

Open yarikoptic opened 4 months ago

yarikoptic commented 4 months ago

Ultimately, this tool should work on any DANDI dataset with NWB files. To start, though, we can concentrate on those that have (only?) extracellular electrophysiology (ecephys) data.

@bendichter could recommend some specific ones. Meanwhile could give a start point to try on.

yarikoptic commented 4 months ago


datalad install -r -R 1
datalad fuse-mount dandisets /tmp/dandisets-fuse

and then you have access to dandisets-fuse/

example of use -- so you could pick the code there:

TheChymera commented 4 months ago

The first command fails with:

[deco]/mnt/data ❱ datalad install -r -R 1
Cloning: 100%|████████████████████| 2.00/2.00 [00:00<00:00, 8.51 candidates/s]
Username for '': TheChymera
Password for '':
install(error): /mnt/data/dandisets (dataset) [Failed to clone from any candidate source URL. Encountered errors per each url were:
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress /mnt/data/dandisets' failed with exitcode 128 [err: 'Cloning into '/mnt/data/dandisets'...
remote: Not Found
fatal: repository '' not found']
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress /mnt/data/dandisets' failed with exitcode 128 [err: 'Cloning into '/mnt/data/dandisets'...
remote: Repository not found.
fatal: repository '' not found']]
[ERROR  ] NoDatasetFound(No installed dataset found at /mnt/data/dandisets) (NoDatasetFound)
usage: datalad install [-h] [-s URL-OR-PATH] [-d DATASET] [-g] [-D DESCRIPTION] [-r] [-R LEVELS] [--reckless [auto|ephemeral|shared-...]] [-J NJOBS] [--branch BRANCH] [--version] [URL-OR-PATH ...]

However, this worked: datalad install -r -R 1

Following that, I think the fuse-mount extension wasn't properly installed:

(mydev) [deco]/mnt/data/datalad ❱ datalad fuse-mount 000628/ /tmp/000628
datalad: Unknown command 'fuse-mount'.  See 'datalad --help'.

(mydev) [deco]/mnt/data/datalad ❱ datalad --help | rg fuse -C 5
      Aggregate metadata of one or more datasets for later query

*DataLad FUSE command suite*

      FUSE File system providing transparent access to files under DataLad
      Show leading lines/bytes of an annexed file by fetching its data from a
      Clear fsspec cache

Any idea how I can check inside the Python console whether it's installed? I tried installing it via both the package manager and pip, and neither seems to work.
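One way to answer the question above from a Python console is sketched below. It is only an illustration under stated assumptions: that the extension's import name is `datalad_fuse`, that its distribution name is `datalad-fuse`, and that DataLad discovers extensions through the `datalad.extensions` entry-point group. The helper name `extension_status` is hypothetical. Note that the `datalad --help` output quoted above already lists a "DataLad FUSE command suite", which suggests the extension may be registered and that only the subcommand name differs.

```python
import importlib.util
from importlib import metadata


def extension_status(module_name="datalad_fuse", dist_name="datalad-fuse"):
    """Report whether an extension is importable, installed, and registered.

    The default names are assumptions about how datalad-fuse is packaged.
    """
    # True if the top-level module can be found on the current sys.path.
    status = {"importable": importlib.util.find_spec(module_name) is not None}

    # Installed distribution version, or None if pip/the package manager
    # did not install it into this interpreter's environment.
    try:
        status["version"] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        status["version"] = None

    # Names registered in the (assumed) "datalad.extensions" entry-point group.
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        group = eps.select(group="datalad.extensions")
    else:  # Python 3.8 / 3.9 return a plain dict
        group = eps.get("datalad.extensions", [])
    status["entry_points"] = sorted(ep.name for ep in group)
    return status


if __name__ == "__main__":
    print(extension_status())
```

If `importable` is false while the package manager reports the package as installed, the console is probably running a different interpreter or virtual environment than the one the package was installed into.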

better move that part of the discussion to issue tracker as you suggested →

yarikoptic commented 4 months ago

Use datalad install -r -R 1 -- that is the superdataset for all dandisets -- and then datalad fusefs -d dandisets dandisets-fuse or similar (check datalad fusefs --help).

TheChymera commented 4 months ago

oh, ok, got the first one as well now.

TheChymera commented 4 months ago

@yarikoptic even if I use the ssh clone URI, I get prompts asking me for my GitHub password. Even if I enter it, they fail. I assume these are embargoed datasets? In any case, is there any way to skip them?

datalad install -r -R 1

TheChymera commented 4 months ago

Here's an example:

[INFO   ] Remote origin not usable by git-annex; setting annex-ignore
[INFO   ] download failed: Not Found
[INFO   ] access to 2 dataset siblings dandi-dandisets-dropbox, dandiapi not auto-enabled, enable with:
|       datalad siblings -d "/mnt/data/datalad/dandisets/000222" enable -s SIBLING
[INFO   ] Remote origin not usable by git-annex; setting annex-ignore
[INFO   ] download failed: Not Found
[INFO   ] access to 2 dataset siblings dandiapi, dandi-dandisets-dropbox not auto-enabled, enable with:
|       datalad siblings -d "/mnt/data/datalad/dandisets/000223" enable -s SIBLING
Installing:  25%|████████████████████▋                                                              | 155/621 [06:09<10:37, 1.37s/ datasets
Username for '': TheChymera
| 0.00/3.00 [00:00<?, ? candidates/s]
Password for '':
  [146 similar messages have been suppressed; disable with datalad.ui.suppress-similar-results=off]
install(error): /mnt/data/datalad/dandisets/000224 (dataset) [Failed to clone from any candidate source URL. Encountered errors per each url were:
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress /mnt/data/datalad/dandisets/000224' failed with exitcode 128 [err: 'Cloning into '/mnt/data/datalad/dandisets/000224'...
remote: Support for password authentication was removed on August 13, 2021.
remote: Please see for information on currently recommended modes of authentication.
fatal: Authentication failed for ''']
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress /mnt/data/datalad/dandisets/000224' failed with exitcode 128 [err: 'Cloning into '/mnt/data/datalad/dandisets/000224'...
remote: Not Found
fatal: repository '' not found']
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress /mnt/data/datalad/dandisets/000224' failed with exitcode 128 [err: 'Cloning into '/mnt/data/datalad/dandisets/000224'...
fatal: remote error:
 dandi/dandisets.git/000224 is not a valid repository name
Visit for help
CommandError: 'ssh -o ControlPath=/home/chymera/.cache/datalad/sockets/7b668231 -o SendEnv=GIT_PROTOCOL 'git-upload-pack '"'"'dandi/dandisets.git/000224'"'"''' failed with exitcode 1']]

yarikoptic commented 4 months ago

just datalad install -r -R 1

Those subdatasets that fail to clone can just be ignored; they must be private because they are embargoed.

TheChymera commented 4 months ago

@yarikoptic even with the https link it still waits on every dataset I can't download. Is there any auto-skip feature? I looked in the help, nothing stood out 🤔
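A possible workaround for the waiting, sketched under assumptions: the "Username for" prompts in the logs above appear to come from git rather than from DataLad, and git stops prompting on the terminal when the environment variable `GIT_TERMINAL_PROMPT` is set to `0`, so clones that need credentials fail immediately instead. This is a git setting, not a DataLad auto-skip feature, and it has not been tested against the dandisets superdataset. The helper names `no_prompt_env` and `install_without_prompts` are hypothetical, and `source` stands for the superdataset URL used in this thread.

```python
import os
import subprocess


def no_prompt_env(base=None):
    """Copy an environment and tell git to fail instead of asking for credentials."""
    env = dict(os.environ if base is None else base)
    # git setting: do not prompt on the terminal; clones needing auth error out.
    env["GIT_TERMINAL_PROMPT"] = "0"
    return env


def install_without_prompts(source, levels=1):
    """Run `datalad install -r -R <levels> <source>` with git prompts disabled.

    `source` must be supplied by the caller.
    """
    cmd = ["datalad", "install", "-r", "-R", str(levels), source]
    # check=False: failed clones of private subdatasets are meant to be ignored.
    return subprocess.run(cmd, env=no_prompt_env(), check=False)
```

The same effect can be had directly in a shell by prefixing the install command with `GIT_TERMINAL_PROMPT=0`.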