NeurodataWithoutBorders / matnwb

A Matlab interface for reading and writing NWB files
BSD 2-Clause "Simplified" License

[Bug]: MatNWB fails to read certain DANDIsets #481

Closed yarikoptic closed 1 year ago

yarikoptic commented 1 year ago

What happened?

While preparing health-checking of dandisets, we ran into matnwb jobs that never complete (https://github.com/dandi/dandisets-healthstatus/issues/8) and that, when run directly, error out with `Unable to resolve the name 'types.ndx_dandi_icephys.DandiIcephysMetadata'.` (reported by @jwodder), e.g. when run on sub-mouse-AWOAY/sub-mouse-AWOAY_ses-20190829-sample-1_slice-20190829-slice-1_cell-20190829-sample-1_icephys.nwb from DANDI 000008.

Steps to Reproduce

Here are the complete invocations, run against the FUSE-mounted dandisets:

```
dandi     732837 93.4  0.5 6286668 383544 pts/15 Sl+  Dec05 9481:36           /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000008/sub-mouse-AWOAY/sub-mouse-AWOAY_ses-20190829-sample-1_slice-20190829-slice-1_cell-20190829-sample-1_icephys.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
dandi     832307 93.4  0.5 6293812 337156 pts/15 Sl+  Dec05 9480:59           /mnt/backup/apps/MATLAB/R2022b/bin/glnxa64/MATLAB -batch nwb = nwbRead('/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse/000008/sub-mouse-BVPYH/sub-mouse-BVPYH_ses-20181121-sample-6_slice-20181121-slice-3_cell-20181121-sample-6_icephys.nwb', 'savedir', '/mnt/fast/dandi/dandisets-healthstatus') -nodesktop -prefersoftwareopengl
```


### Error Message

_No response_

### Operating System

Linux

### Matlab Version

R2022b

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/neurodatawithoutborders/matnwb/blob/master/.github/CODE_OF_CONDUCT.md)
- [X] Have you ensured this bug was not already [reported](https://github.com/NeurodataWithoutBorders/matnwb/issues)?
lawrence-mbf commented 1 year ago

Testing both files, they appear to work on the current master branch. Perhaps a recent fix resolved this issue?

lawrence-mbf commented 1 year ago

I will note, though, that AWOAY_ses-20190829-sample-1 did take a little longer to read, although no error was thrown there either.

yarikoptic commented 1 year ago

FTR, we have finally finished a complete run through all dandisets to check which we can open or not: from https://github.com/dandi/dandisets-healthstatus/blob/main/000008/status.yaml#L2671

Overall, with that matnwb we managed to read 103 datasets / 7087 files, and failed to read 86 datasets / 17816 files (so, unfortunately, more failed than succeeded in terms of number of files). See https://github.com/dandi/dandisets-healthstatus#readme for more details. It would be great to see how well the current master version works whenever it gets released (the code we use to install matnwb is here).

lawrence-mbf commented 1 year ago

Hi @yarikoptic, is there anything that would block using the tip of master for MatNWB?

yarikoptic commented 1 year ago

Looking at the results of the most recent rerun on 000008 (https://github.com/dandi/dandisets-healthstatus/blob/main/results/000008/status.yaml#L1337), it seems we did have that recent release (matnwb v2.6.0.0) but got all files failing as before, e.g.:

```
Asset: sub-mouse-AAYYT/sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2_cell-20180420-sample-2_icephys.nwb
Output:
    Unable to resolve the name 'types.ndx_dandi_icephys.DandiIcephysMetadata'.

    Error in io.parseGroup (line 85)
        parsed = eval([Type.typename '(kwargs{:})']);

    Error in io.parseGroup (line 38)
        subg = io.parseGroup(filename, group, Blacklist);

    Error in io.parseGroup (line 38)
        subg = io.parseGroup(filename, group, Blacklist);

    Error in nwbRead (line 59)
    nwb = io.parseGroup(filename, h5info(filename), Blacklist);
```
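For context on the error: MatNWB resolves type names such as `types.ndx_dandi_icephys.DandiIcephysMetadata` as MATLAB package classes generated on disk under `+types/+<namespace>/`. A minimal sketch of that layout (paths and the placeholder file are illustrative, not the actual generated content) shows why the class cannot resolve unless the directory holding it is on the MATLAB path:

```shell
# Sketch, assuming MatNWB's generated-class layout (+types/+<namespace>/<Type>.m).
# The directory passed as 'savedir' to nwbRead receives generated extension
# classes; MATLAB can only resolve 'types.ndx_dandi_icephys.DandiIcephysMetadata'
# if this directory is on its search path.
savedir=$(mktemp -d)                                # stand-in for the 'savedir' argument
mkdir -p "$savedir/+types/+ndx_dandi_icephys"
touch "$savedir/+types/+ndx_dandi_icephys/DandiIcephysMetadata.m"   # placeholder
ls "$savedir/+types/+ndx_dandi_icephys"
# -> DandiIcephysMetadata.m
```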

So I decided to reproduce it "directly". I wrote this small script to test against multiple versions of matnwb on a file from DANDI (I used git-annex, although I could just download directly from the archive, I guess):

```shell
#!/bin/bash

set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/matnwb-XXXXXXX)"
pwd

ds="$1"
f="$2"

shift
shift

git clone https://github.com/dandisets/$ds
( cd $ds;
  echo "Version of the dandiset: ";
  git describe --always
  git annex get "$f"
)

mkdir out

git clone https://github.com/NeurodataWithoutBorders/matnwb
cd matnwb

set -x
for v in "$@"; do
    git checkout $v;
    echo "Running using: "; git describe --tags;
    matlab -nodesktop -sd $PWD -batch 'generateCore()';
    MATLABPATH=$PWD time matlab -nodesktop -batch "nwb = nwbRead('../$ds/$f', 'savedir', '../out')" || echo "Exited with $?"
done

pwd
```
It resulted in failures when run on either the released or the master version of matnwb:
```
(base) dandi@drogon:~/proj$ ./checknwb.sh 000008 sub-mouse-ZZVQT/sub-mouse-ZZVQT_ses-20181003-sample-9_slice-20181003-slice-9_cell-20181003-sample-9_icephys.nwb v2.6.0.0 master
/tmp/matnwb-0gIBFpU
Cloning into '000008'...
remote: Enumerating objects: 92335, done.
remote: Counting objects: 100% (47585/47585), done.
remote: Compressing objects: 100% (26044/26044), done.
remote: Total 92335 (delta 16734), reused 41732 (delta 16130), pack-reused 44750
Receiving objects: 100% (92335/92335), 12.28 MiB | 21.17 MiB/s, done.
Resolving deltas: 100% (31598/31598), done.
Version of the dandiset: 
0.211014.0809-7-gcbbc79d2b

  Remote origin not usable by git-annex; setting annex-ignore

  https://github.com/dandisets/000008/config download failed: Not Found
get sub-mouse-ZZVQT/sub-mouse-ZZVQT_ses-20181003-sample-9_slice-20181003-slice-9_cell-20181003-sample-9_icephys.nwb (from web...) 
(scanning for annexed files...)      
ok
(recording state in git...)
Cloning into 'matnwb'...
remote: Enumerating objects: 8841, done.
remote: Counting objects: 100% (2004/2004), done.
remote: Compressing objects: 100% (811/811), done.
remote: Total 8841 (delta 1327), reused 1623 (delta 1189), pack-reused 6837
Receiving objects: 100% (8841/8841), 27.43 MiB | 22.00 MiB/s, done.
Resolving deltas: 100% (5702/5702), done.
+ for v in "$@"
+ git checkout v2.6.0.0
Note: switching to 'v2.6.0.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 4bb8ef2 Merge pull request #482 from NeurodataWithoutBorders/schema-260-release
+ echo 'Running using: '
Running using: 
+ git describe --tags
v2.6.0.0
+ matlab -nodesktop -sd /tmp/matnwb-0gIBFpU/matnwb -batch 'generateCore()'
+ MATLABPATH=/tmp/matnwb-0gIBFpU/matnwb
+ time matlab -nodesktop -batch 'nwb = nwbRead('\''../000008/sub-mouse-ZZVQT/sub-mouse-ZZVQT_ses-20181003-sample-9_slice-20181003-slice-9_cell-20181003-sample-9_icephys.nwb'\'', '\''savedir'\'', '\''../out'\'')'
Unable to resolve the name 'types.ndx_dandi_icephys.DandiIcephysMetadata'.

Error in io.parseGroup (line 85)
    parsed = eval([Type.typename '(kwargs{:})']);

Error in io.parseGroup (line 38)
    subg = io.parseGroup(filename, group, Blacklist);

Error in io.parseGroup (line 38)
    subg = io.parseGroup(filename, group, Blacklist);

Error in nwbRead (line 59)
nwb = io.parseGroup(filename, h5info(filename), Blacklist);

Command exited with non-zero status 1
23.69user 1.27system 0:21.11elapsed 118%CPU (0avgtext+0avgdata 1128812maxresident)k
392inputs+1808outputs (18major+249145minor)pagefaults 0swaps
+ echo 'Exited with 1'
Exited with 1
+ for v in "$@"
+ git checkout master
Previous HEAD position was 4bb8ef2 Merge pull request #482 from NeurodataWithoutBorders/schema-260-release
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
+ echo 'Running using: '
Running using: 
+ git describe --tags
v2.6.0.0-3-g16a5411
+ matlab -nodesktop -sd /tmp/matnwb-0gIBFpU/matnwb -batch 'generateCore()'
+ MATLABPATH=/tmp/matnwb-0gIBFpU/matnwb
+ time matlab -nodesktop -batch 'nwb = nwbRead('\''../000008/sub-mouse-ZZVQT/sub-mouse-ZZVQT_ses-20181003-sample-9_slice-20181003-slice-9_cell-20181003-sample-9_icephys.nwb'\'', '\''savedir'\'', '\''../out'\'')'
Unable to resolve the name 'types.ndx_dandi_icephys.DandiIcephysMetadata'.

Error in io.parseGroup (line 85)
    parsed = eval([Type.typename '(kwargs{:})']);

Error in io.parseGroup (line 38)
    subg = io.parseGroup(filename, group, Blacklist);

Error in io.parseGroup (line 38)
    subg = io.parseGroup(filename, group, Blacklist);

Error in nwbRead (line 59)
nwb = io.parseGroup(filename, h5info(filename), Blacklist);

Command exited with non-zero status 1
23.28user 1.18system 0:20.83elapsed 117%CPU (0avgtext+0avgdata 1111008maxresident)k
4040inputs+1784outputs (37major+246652minor)pagefaults 0swaps
+ echo 'Exited with 1'
Exited with 1
+ pwd
/tmp/matnwb-0gIBFpU/matnwb
```

Please advise, since the original issue seems to remain, at least for this randomly selected file from 000008.

lawrence-mbf commented 1 year ago

The read works for me, but I am doing so without the custom savedir. I'm guessing that since we're using `../out` as the class directory, it should also be on the MATLAB path.
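In script form, that suggestion amounts to appending the savedir to `MATLABPATH` before launching MATLAB. A minimal sketch, with illustrative directory names (`/tmp/matnwb` is a hypothetical checkout location, not a path from this thread):

```shell
# Sketch of the fix: the custom savedir must be on MATLABPATH so the classes
# generated there (e.g. types.ndx_dandi_icephys.DandiIcephysMetadata) resolve.
matnwb_dir=/tmp/matnwb          # hypothetical MatNWB checkout
savedir=$matnwb_dir/../out      # directory passed as 'savedir' to nwbRead
export MATLABPATH="$matnwb_dir:$savedir"
echo "$MATLABPATH"
# -> /tmp/matnwb:/tmp/matnwb/../out
# matlab -nodesktop -batch "nwb = nwbRead('file.nwb', 'savedir', '$savedir')"
```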

yarikoptic commented 1 year ago

Confirming that fixing up the script (now at https://github.com/dandi/dandisets-healthstatus/blob/main/code/checknwb.sh) results in a clean run for both the released and the unreleased version, looking like:

```shell
+ matlab -nodesktop -sd /tmp/matnwb-KhiKdQa/matnwb -batch 'generateCore()'
+ MATLABPATH=/tmp/matnwb-KhiKdQa/matnwb:/tmp/matnwb-KhiKdQa/matnwb/../out
+ time matlab -nodesktop -batch 'nwb = nwbRead('\''../000008/sub-mouse-ZZVQT/sub-mouse-ZZVQT_ses-20181003-sample-9_slice-20181003-slice-9_cell-20181003-sample-9_icephys.nwb'\'', '\''savedir'\'', '\''../out'\'')'
nwb = 

  NwbFile with properties:

                                           nwb_version: '2.2.5'
                                      file_create_date: [1x1 types.untyped.DataStub]
                       general_source_script_file_name: []
                                            identifier: 'cb09e8b3-d26b-4fb9-b402-f1c9371f0030'
                                   session_description: 'Current clamp square'
                                    session_start_time: 2018-10-03T00:00:00.000000-04:00
                              timestamps_reference_time: 2018-10-03T00:00:00.000000-04:00
                                           acquisition: [80x1 types.untyped.Set]
                                              analysis: [0x1 types.untyped.Set]
                                               general: [1x1 types.untyped.Set]
                               general_data_collection: []
                                       general_devices: [1x1 types.untyped.Set]
                        general_experiment_description: 'Current clamp square'
                                  general_experimenter: [1x1 types.untyped.DataStub]
                           general_extracellular_ephys: [0x1 types.untyped.Set]
                general_extracellular_ephys_electrodes: []
                                   general_institution: 'Baylor College of Medicine'
                           general_intracellular_ephys: [1x1 types.untyped.Set]
   general_intracellular_ephys_experimental_conditions: []
                 general_intracellular_ephys_filtering: []
   general_intracellular_ephys_intracellular_recordings: []
               general_intracellular_ephys_repetitions: []
     general_intracellular_ephys_sequential_recordings: []
   general_intracellular_ephys_simultaneous_recordings: []
               general_intracellular_ephys_sweep_table: [1x1 types.core.SweepTable]
                                      general_keywords: []
                                           general_lab: 'Tolias'
                                         general_notes: []
                                  general_optogenetics: [0x1 types.untyped.Set]
                                general_optophysiology: [0x1 types.untyped.Set]
                                  general_pharmacology: []
                                      general_protocol: []
                          general_related_publications: []
                                    general_session_id: '20181003_sample_9'
                                        general_slices: []
                                 general_source_script: []
                                      general_stimulus: []
                                       general_subject: [1x1 types.ndx_dandi_icephys.CreSubject]
                                       general_surgery: []
                                         general_virus: []
                                             intervals: [0x1 types.untyped.Set]
                                      intervals_epochs: []
                               intervals_invalid_times: []
                                      intervals_trials: []
                                            processing: [0x1 types.untyped.Set]
                                               scratch: [0x1 types.untyped.Set]
                                 stimulus_presentation: [79x1 types.untyped.Set]
                                    stimulus_templates: [0x1 types.untyped.Set]
                                                 units: []

20.11user 1.06system 0:19.25elapsed 109%CPU (0avgtext+0avgdata 941552maxresident)k
152inputs+1488outputs (9major+200975minor)pagefaults 0swaps
```

So let's consider the issue addressed for now; we will fix things up on our side and do a full pass to give a more detailed report (if anything fails) later on, in a follow-up issue. Thank you @lawrence-mbf for the support!