Open yarikoptic opened 3 years ago
You can also just tell the validation routine to not use the cached spec and the validation error can be reproduced:
$ python -m pynwb.validate --no-cached-namespace 496035.05.02.nwb
Validating 496035.05.02.nwb against pynwb namespace information using namespace core.
- found the following errors:
VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing
VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32'
VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing
VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32'
(base)
thomas@Win10-Thomas-PC MSYS /e
$ python -m pynwb.validate --cached-namespace 496035.05.02.nwb
Validating 496035.05.02.nwb against cached namespace information using namespace ndx-mies.
- no errors found.
(base)
The series_index validation issue was introduced in https://github.com/hdmf-dev/hdmf-common-schema/commit/2851b46ae0025ceac9a0b35e933649cce90ab40f. Changing from a signed to unsigned integer is strictly speaking a breaking change.
I would say that the files MIES creates are still valid as they use a cached schema. But of course I can be persuaded otherwise.
Another odd thing is that the old type (int32) can hold all values of uint8 so even when validating against the stock pynwb schema this should not give a warning. AFAIR we allow bigger types instead of smaller ones.
@yarikoptic
The validation error
Failed to read metadata: Could not construct IntracellularElectrode object due to: IntracellularElectrode.__init__: incorrect type for 'description' (got 'bytes', expected 'str')
is from dandi or?
Can you explain what is wrong here?
The HDF5 file type of general/intracellular_ephys/electrode_0/description
is text
according to the schema 1.
In the HDF5 file I'm storing that as
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLPAD;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
which is supported in all the other places where text
is set as type.
Sample file: HardwareTests-V2-MIES-187d9143.zip
@yarikoptic I'm doing validation using pynwb with subprocess, see https://github.com/AllenInstitute/MIES/blob/bcb8f4ff185394729f99f5b31b3dfb9f18acbd59/tools/nwb-read-tests/nwbv2-read-test.py#L22. This at least avoids that pynwb loads the stock schema. The code might be brittle as I'm only using that in a docker container without real world users.
@yarikoptic @rly How do we proceed here? Should the fix be in pynwb or in dandi?
@t-b @yarikoptic To summarize, if I understand correctly, the file is valid against the schema that is cached, but the schema was updated and the file is not valid against the latest schema. dandi validate
uses the pynwb.validate(...)
function which loads the schema from the active PyNWB installation, which may or may not be the latest schema or the one cached by the file. python -m pynwb.validate
validates differently and provides more options than the pynwb.validate(...)
function, including allowing validation against the cached schema (default).
In my opinion, DANDI should set a requirement for the minimum NWB version for validation, and it should validate against the cached schema if the version is >= the minimum requirement. It should not validate against the latest schema because older files are not guaranteed to be valid against the latest schema.
After upload of a given dandiset, DANDI should verify that all files in the dandiset use the same schema version of NWB and then display that version on the dandiset page so that users know whether the files can be read by tools that have a min version requirement.
Separately, PyNWB should include an option in the pynwb.validate(...)
function where validation is done against the cached schema as opposed to the schema from the PyNWB installation. For the time being, DANDI should use the command-line interface python -m pynwb.validate
to validate a file against the cached schema. If that is unworkable or inconvenient, I can provide sample code to run before pynwb.validate(...)
to get it to validate against the cached schema.
Failed to read metadata: Could not construct IntracellularElectrode object due to: IntracellularElectrode.__init__: incorrect type for 'description' (got 'bytes', expected 'str')
This looks like an issue with compatibility with h5py 3.0 where strings are read as bytes instead of str. Which version of h5py was installed?
This looks like an issue with compatibility with h5py 3.0 where strings are read as bytes instead of str. Which version of h5py was installed?
Yes this was a h5py issue indeed, going back to 2.10.0 solved it.
@t-b asked and provided a sample file on dandi slack channel, which seems to validate "fine" if
but errors are returned if
and that is the ones we consider to be errors and report by `dandi validate`
```shell $> dandi validate 496035.05.02.nwb 496035.05.02.nwb: 5 error(s) VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32' VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32' Required field 'subject_id' has no value Summary: Validation errors in 1 out of 1 files ``` the last one is "dandi specific" since we do require subject_id value.so why is such a difference and either we should somehow become more lenient in dandi?
edit 1: FWIW the same with load_namespaces=True
```shell $> python3 -c 'import pynwb; print(pynwb.validate(pynwb.NWBHDF5IO("496035.05.02.nwb", "r", load_namespaces=True)))' /home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.8/site-packages/hdmf/spec/namespace.py:470: UserWarning: ignoring namespace 'hdmf-common' because it already exists warn("ignoring namespace '%s' because it already exists" % ns['name']) /home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.8/site-packages/hdmf/spec/namespace.py:470: UserWarning: ignoring namespace 'core' because it already exists warn("ignoring namespace '%s' because it already exists" % ns['name']) [VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing, VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32', VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing, VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32'] ```