NeurodataWithoutBorders / matnwb

A Matlab interface for reading and writing NWB files
BSD 2-Clause "Simplified" License

[Bug]: Cannot define property .. in class .. because the property has already been defined in the superclass .. #491

Closed · yarikoptic closed this issue 1 year ago

yarikoptic commented 1 year ago

What happened?

originally mentioned in https://github.com/NeurodataWithoutBorders/matnwb/issues/490#issuecomment-1422662198

in a fresh run of https://github.com/dandi/dandisets-healthstatus I saw in the logs

the error:

```shell
Asset: sub-744912845/sub-744912845_ses-766640955.nwb
Output: {
Cannot define property 'strain' in class 'types.ndx_aibs_ecephys.EcephysSpecimen' because the property has already been defined in the superclass 'types.core.Subject'.
Error in io.parseGroup (line 85) parsed = eval([Type.typename '(kwargs{:})']);
Error in io.parseGroup (line 38) subg = io.parseGroup(filename, group, Blacklist);
Error in io.parseGroup (line 38) subg = io.parseGroup(filename, group, Blacklist);
Error in nwbRead (line 59) nwb = io.parseGroup(filename, h5info(filename), Blacklist);
}
```

which then reproduced against master on that sample file

```shell
(dandisets-2) dandi@drogon:~/cronlib/dandisets-healthstatus/code$ ./checknwb.sh 000022 sub-744912845/sub-744912845_ses-766640955.nwb master
/tmp/matnwb-TJq1ZBO
Cloning into '000022'...
remote: Enumerating objects: 8581, done.
remote: Counting objects: 100% (8581/8581), done.
remote: Compressing objects: 100% (3139/3139), done.
remote: Total 8581 (delta 4437), reused 7798 (delta 3654), pack-reused 0
Receiving objects: 100% (8581/8581), 790.29 KiB | 3.24 MiB/s, done.
Resolving deltas: 100% (4437/4437), done.
Version of the dandiset:
06ef421

  Remote origin not usable by git-annex; setting annex-ignore

  https://github.com/dandisets/000022/config download failed: Not Found
get sub-744912845/sub-744912845_ses-766640955.nwb (from web...)
ok
(recording state in git...)
Cloning into 'matnwb'...
remote: Enumerating objects: 8841, done.
remote: Counting objects: 100% (2004/2004), done.
remote: Compressing objects: 100% (811/811), done.
remote: Total 8841 (delta 1327), reused 1623 (delta 1189), pack-reused 6837
Receiving objects: 100% (8841/8841), 27.43 MiB | 7.06 MiB/s, done.
Resolving deltas: 100% (5702/5702), done.
+ for v in "$@"
+ git checkout master
Already on 'master'
Your branch is up to date with 'origin/master'.
+ echo 'Running using: '
Running using: 
+ git describe --tags
v2.6.0.0-3-g16a5411
+ matlab -nodesktop -sd /tmp/matnwb-TJq1ZBO/matnwb -batch 'generateCore()'
+ MATLABPATH=/tmp/matnwb-TJq1ZBO/matnwb:/tmp/matnwb-TJq1ZBO/matnwb/../out
+ time matlab -nodesktop -batch 'f='\''../000022/sub-744912845/sub-744912845_ses-766640955.nwb'\''; disp(util.getSchemaVersion(f)); nwb = nwbRead(f, '\''savedir'\'', '\''../out'\'')'
2.2.2
Cannot define property 'strain' in class
'types.ndx_aibs_ecephys.EcephysSpecimen' because the property has already been
defined in the superclass 'types.core.Subject'.

Error in io.parseGroup (line 85)
    parsed = eval([Type.typename '(kwargs{:})']);

Error in io.parseGroup (line 38)
    subg = io.parseGroup(filename, group, Blacklist);

Error in io.parseGroup (line 38)
    subg = io.parseGroup(filename, group, Blacklist);

Error in nwbRead (line 59)
nwb = io.parseGroup(filename, h5info(filename), Blacklist);

Command exited with non-zero status 1
17.10user 1.03system 0:17.62elapsed 102%CPU (0avgtext+0avgdata 898684maxresident)k
10752inputs+1520outputs (18major+194751minor)pagefaults 0swaps
+ echo 'Exited with 1'
Exited with 1
+ pwd
/tmp/matnwb-TJq1ZBO/matnwb
```

```shell
(dandisets-2) dandi@drogon:~/cronlib/dandisets-healthstatus$ grep -l 'because the property has already been' results/*/2023.02.0[2-9]*.log
results/000021/2023.02.05.11.32.02_matnwb_nwbRead_errors.log
results/000022/2023.02.05.19.19.37_matnwb_nwbRead_errors.log
(dandisets-2) dandi@drogon:~/cronlib/dandisets-healthstatus$ grep -h 'because the property has already been' results/*/2023.02.0[2-9]*.log | sort | uniq -c
      8     'types.ndx_aibs_ecephys.EcephysProbe' because the property has already been
    371     'types.ndx_aibs_ecephys.EcephysSpecimen' because the property has already been
```

Steps to Reproduce

see above

Error Message

No response

Operating System

Linux

Matlab Version

R2022b Update 1 (9.13.0.2080170) 64-bit (glnxa64)

lawrence-mbf commented 1 year ago

Ironically I cannot reproduce this.

I get a completely different error:

```matlab
Error using assert
Unexpected properties {unit}.

Your schema version may be incompatible with the file. Consider checking the
schema version of the file with `util.getSchemaVersion(filename)` and comparing
with the YAML namespace version present in nwb-schema/core/nwb.namespace.yaml

Error in types.util.checkUnset (line 13)
assert(isempty(dropped),...

Error in types.hdmf_common.VectorData (line 26)
types.util.checkUnset(obj, unique(varargin(1:2:end)));

Error in io.parseDataset (line 81)
parsed = eval([Type.typename '(kwargs{:})']);

Error in io.parseGroup (line 22)
dataset = io.parseDataset(filename, datasetInfo, fullPath, Blacklist);

Error in io.parseGroup (line 38)
subg = io.parseGroup(filename, group, Blacklist);

Error in nwbRead (line 59)
nwb = io.parseGroup(filename, h5info(filename), Blacklist);
```

The error that you have may indicate another path issue. Would it be possible to check that generateCore is simply never called before reading an NWB file? For your test case, could you also ensure that the +types module in the MatNWB installation directory contains only +types/+util and +types/+untyped?

The error I got speaks to a possible bug in the data generation.

yarikoptic commented 1 year ago

It has both:

edit 1: a more correct grep for the same statement

```shell
❯ grep -e '+types/+util' matnwb-TJq1ZBO-ls.txt | head -n 1
   391704       4 drwxr-xr-x   3 dandi    dandi          4096 Feb 10 17:36 ./matnwb/+types/+util
❯ grep -e '+types/+untyped' matnwb-TJq1ZBO-ls.txt | head -n 1
   391678       4 drwxr-xr-x   3 dandi    dandi          4096 Feb 10 17:36 ./matnwb/+types/+untyped
```
lawrence-mbf commented 1 year ago

> was it using my reproducer?

No, this is also on a Windows machine anyway. To sum up what I did:

> does matnwb de-reference paths and do something path related later on?

MatNWB only adds to the path using addpath at most: once to load the YAML parsing JAR (via javaaddpath) and once to use the ISO8601 external MathWorks package (via addpath). These do not appear to be relevant to your issue, though.

Looking at the file listing, the following directories should be deleted in ./matnwb/+types:

yarikoptic commented 1 year ago

are you suggesting deleting them manually, or just stating that something in generateCore should be fixed?

yarikoptic commented 1 year ago

FWIW, theoretically the reproducer should work even on Windows if you install git-annex and datalad in an msys2 env... will try some time later to see if there are any hidden gotchas

lawrence-mbf commented 1 year ago

> are you suggesting to delete them manually...

Yes. This issue only came about because generateCore was called without the custom ./out directory. I wonder if I should just add a script that clears the namespace for the user. I imagine it's not comfortable for people to modify their installation directories like that.

yarikoptic commented 1 year ago

> FWIW

With that in mind I have tried without savedir and got the same error as you:

```shell
(dandisets-2) dandi@drogon:/tmp/matnwb-TJq1ZBO/matnwb$ MATLABPATH=$PWD:$PWD/../out time matlab -nodesktop -batch "f='../000022/sub-744912845/sub-744912845_ses-766640955.nwb'; disp(util.getSchemaVersion(f)); nwb = nwbRead(f)"
2.2.2
Error using assert
Unexpected properties {unit}.

Your schema version may be incompatible with the file.  Consider checking the
schema version of the file with `util.getSchemaVersion(filename)` and comparing
with the YAML namespace version present in nwb-schema/core/nwb.namespace.yaml

Error in types.util.checkUnset (line 13)
assert(isempty(dropped),...

Error in types.hdmf_common.VectorData (line 26)
            types.util.checkUnset(obj, unique(varargin(1:2:end)));

Error in io.parseDataset (line 81)
    parsed = eval([Type.typename '(kwargs{:})']);

Error in io.parseGroup (line 22)
    dataset = io.parseDataset(filename, datasetInfo, fullPath, Blacklist);

Error in io.parseGroup (line 38)
    subg = io.parseGroup(filename, group, Blacklist);

Error in nwbRead (line 59)
nwb = io.parseGroup(filename, h5info(filename), Blacklist);

Command exited with non-zero status 1
```

so I guess you got a different error because you called it differently.

as for this error, it seems other Windows users encounter it too: https://github.com/NeurodataWithoutBorders/matnwb/issues/492 and I got it in https://github.com/NeurodataWithoutBorders/matnwb/issues/490 .

Do you see the same error, and can you reproduce it, if you do specify savedir?

lawrence-mbf commented 1 year ago

> ideally users should not call any script manually

Correct, you do not need generateCore if you're reading files. This issue only arises from calling generateCore and then using savedir when calling nwbRead. To fix this issue requires either running a script or manually deleting +types/+core, +types/+hdmf_*, or other extensions which are erroneously generated.
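The manual deletion described above can be sketched as a small shell loop. This is only an illustration, not an official MatNWB tool: the script below builds a mock checkout layout under a temp directory purely for demonstration; in practice you would point `MATNWB_DIR` at a real MatNWB installation instead.

```shell
# Sketch of the manual cleanup: remove generated class folders under
# +types, keeping only the hand-written +types/+util and +types/+untyped.
# MATNWB_DIR here is a mock checkout created for illustration.
MATNWB_DIR="$(mktemp -d)"
mkdir -p "$MATNWB_DIR"/+types/+util "$MATNWB_DIR"/+types/+untyped \
         "$MATNWB_DIR"/+types/+core "$MATNWB_DIR"/+types/+hdmf_common

for d in "$MATNWB_DIR"/+types/+*/; do
  case "$(basename "$d")" in
    +util|+untyped) ;;          # hand-written helpers: keep
    *) rm -rf "$d" ;;           # generated +core, +hdmf_*, extensions: delete
  esac
done
ls "$MATNWB_DIR"/+types
```

After the loop, only the hand-written package folders remain, which matches the "clean installation" state lawrence-mbf describes.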

> AFAIK we have specified savedir/out folder only because we were instructed to do so...

That's fine. In fact, for this specific purpose of testing multiple files with a single installation, you should continue using savedir and point it to a directory that gets cleaned up after every read. This ensures that the installation directory is not polluted with extraneous or inaccurate files and properly emulates a fresh clone of the repository.
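That per-read cleanup pattern could look something like the following shell wrapper. The file path and matlab invocation mirror the ones used earlier in this thread and are placeholders; the matlab call only runs if matlab is on PATH.

```shell
# Sketch: give each nwbRead its own throwaway 'savedir' and remove it
# afterwards, so generated classes never leak into the next read.
# FILE is a placeholder; substitute a real NWB file.
FILE="../000022/sub-744912845/sub-744912845_ses-766640955.nwb"
SAVEDIR="$(mktemp -d)"

if command -v matlab >/dev/null 2>&1; then
  matlab -nodesktop -batch \
    "nwb = nwbRead('$FILE', 'savedir', '$SAVEDIR')"
fi

rm -rf "$SAVEDIR"   # clean up generated classes after every read
```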

yarikoptic commented 1 year ago

> Correct, you do not need generateCore if you're reading files.

well -- the documentation doesn't say that it is "optional" -- https://github.com/NeurodataWithoutBorders/matnwb#step-2-generate-the-api pretty much states to do that. (The next step, for extensions, is listed as optional, so we skip it.)

> That's fine. In fact, for this specific purpose of testing multiple files with a single installation, you should continue using savedir and point to a directory that gets cleaned up after every read.

ok, we can keep running generateCore + "continue using savedir" and ensure cleaning it (savedir) up after each run. But I am still confused about the

> To fix this issue requires either a script to run or to manually delete +types/+core, +types/+hdmf_*, or other extensions which are erroneously generated.

what "script to run"? shouldn't it be fixed within matnwb? Note that, as the reproducer shows, we ran against that file in a completely clean, fresh installation, so there is no "pollution" of any kind.

lawrence-mbf commented 1 year ago

> the documentation doesn't say that it is "optional"

Yeah, that should be clarified. You only need to run generateCore and generateExtension if you're writing files. For reading files it's unnecessary anyway.

> ok, we can keep running generateCore + "continue using savedir"

I advise not doing that, frankly, as generateCore is what "pollutes" the namespace with NWB schema 2.6.0, which might not be the version the file was written with. That's the primary cause of the issue, since that's just how MATLAB path priority works.

lawrence-mbf commented 1 year ago

Sorry, I need to complete the thought and actually offer something constructive:

Just don't use generateCore if you're going to use nwbRead(..., 'savedir', ...).

yarikoptic commented 1 year ago

Sorry for being a pain, I am just trying to arrive at some logical conclusion of how to instruct users "generally" on how to use matnwb to load or write .nwb files.

> Just don't use generateCore if you're going to use nwbRead(..., 'savedir', ...).

That would boil down to an overall distinction between two "modes" (with or without generateCore) of how matnwb's nwbRead should be installed/used (not using / using savedir), correct?

lawrence-mbf commented 1 year ago

Most users should be fine with just nwbRead without savedir, which works just fine with or without generateCore (assuming a properly embedded schema).

I'd suggest only using savedir for advanced automation over a lot of NWB files with different schema versions (i.e. our test suite CI). I guess the fundamental question is "where do you want your class files to be?". generateCore and nwbRead both write class files, and if you're not consistent about the location you'll run into these kinds of MATLAB path issues. If you don't care, don't use savedir.

In the case of your CI, I think my concern was about false negatives where the install directory doesn't really represent a clean git clone MatNWB installation.

yarikoptic commented 1 year ago

> In the case of your CI, I think my concern was about false negatives where the install directory doesn't really represent a clean git clone MatNWB installation.

we do it clean (see here) -- remove the prior one if it exists, download the matnwb tarball from GitHub, extract, do generateCore() -- everything as the README.md teaches us.

> I'd suggest only using savedir for advanced automation over a lot of NWB files with different schema versions (i.e. our test suite CI).

That is exactly us -- automation over a lot of NWB files from different datasets, hence different schema versions! And how then should we take advantage of savedir -- should we just not do generateCore and point savedir to the same location (how does it then work for different versions?)?

yarikoptic commented 1 year ago

anyways, back to the current issue: if I do not generateCore (or, to the same effect, just git clean -dfx in an existing matnwb checkout) and do not use savedir,

I still get that error we both observed:

```shell
(dandisets-2) dandi@drogon:/tmp/matnwb-TJq1ZBO/matnwb$ MATLABPATH=$PWD time matlab -nodesktop -batch "f='../000022/sub-744912845/sub-744912845_ses-766640955.nwb'; disp(util.getSchemaVersion(f)); nwb = nwbRead(f)"
2.2.2
Error using assert
Unexpected properties {unit}.

Your schema version may be incompatible with the file. Consider checking the
schema version of the file with `util.getSchemaVersion(filename)` and comparing
with the YAML namespace version present in nwb-schema/core/nwb.namespace.yaml

Error in types.util.checkUnset (line 13)
assert(isempty(dropped),...

Error in types.hdmf_common.VectorData (line 26)
types.util.checkUnset(obj, unique(varargin(1:2:end)));

Error in io.parseDataset (line 81)
parsed = eval([Type.typename '(kwargs{:})']);

Error in io.parseGroup (line 22)
dataset = io.parseDataset(filename, datasetInfo, fullPath, Blacklist);

Error in io.parseGroup (line 38)
subg = io.parseGroup(filename, group, Blacklist);

Error in nwbRead (line 59)
nwb = io.parseGroup(filename, h5info(filename), Blacklist);

Command exited with non-zero status 1
19.63user 1.09system 0:19.44elapsed 106%CPU (0avgtext+0avgdata 915580maxresident)k
10800inputs+1480outputs (12major+198358minor)pagefaults 0swaps
```

so the file has 2.2.2 and there seems to be 2.2.2 available within matnwb:

```shell
(dandisets-2) dandi@drogon:/tmp/matnwb-TJq1ZBO/matnwb$ ls nwb-schema/
2.0.2  2.1.0  2.2.0  2.2.1  2.2.2  2.2.3  2.2.4  2.2.5  2.3.0  2.4.0  2.5.0  2.6.0
```

also IIRC there should be a copy of the schema within the nwb file ... how do we open/load such a file?

lawrence-mbf commented 1 year ago

> so the file has 2.2.2 and there seems to be 2.2.2 available within matnwb:

This is not relevant in the case of nwbRead because:

> also IIRC there should be a copy of the schema within the nwb file ... how do we open/load such a file?

nwbRead actually internally calls the equivalent of generateCore pointing to the embedded schema and all extensions. This is why these pathing issues can easily occur if you're using savedir.

Looking at the file again, I have to apologize: I thought I had read this file successfully before, but I had not. In fact, this error is tied to this old issue: #43, where an anonymous VectorData defined by the Units table adds fields. Still thinking about fixes there, but a cheap workaround may be possible that allows ignoring this class of error.

yarikoptic commented 1 year ago

if you think it is largely a duplicate of #43 -- feel free to close. I've subscribed to #43 as well for now.

> so the file has 2.2.2 and there seems to be 2.2.2 available within matnwb:
>
> This is not relevant in the case of nwbRead because:
>
> also IIRC there should be a copy of the schema within the nwb file ... how do we open/load such a file?
>
> nwbRead actually internally calls the equivalent of generateCore pointing to the embedded schema and all extensions.

then the error message suggesting some incompatibility ("Your schema version may be incompatible with the file.") is even more confusing; maybe this message should be made more specific and adequate to the different kinds of problems.

lawrence-mbf commented 1 year ago

Issue now duplicate of #238