NeurodataWithoutBorders / matnwb

A Matlab interface for reading and writing NWB files
BSD 2-Clause "Simplified" License
50 stars 32 forks source link

failing validation of matnwb-generated files with extension in cached namespace #351

Open cechava opened 2 years ago

cechava commented 2 years ago

pynwb validation of matnwb-generated files fails when an extension is in the cached namespace. This is and independent issue from issue #349 and is not fixed by PR #350

To replicate:

generateExtension('/Users/cesar/Repositories/ndx-ecog/spec/ndx-ecog.namespace.yaml');

nwb = NwbFile( ...
    'session_description', 'mouse in open exploration', ...
    'identifier', 'Mouse5_Day3', ...
    'session_start_time', datetime(2018, 4, 25, 2, 30, 3) ...
);

cortical_surfaces = types.ndx_ecog.CorticalSurfaces;

parcellations = types.ndx_ecog.Parcellations();
parcellation = types.ndx_ecog.Parcellation( ... 
    'data', randi(5, 1, 20), ... 
    'labels', {'a','b', 'c', 'd', 'e'} ...
);
parcellations.parcellation.set('my_map', parcellation);

nwbExport(nwb, 'ndx-ecog_file_test_matnwb.nwb');
$ python -m pynwb.validate ndx-ecog_file_test_matnwb.nwb 
Validating ndx-ecog_file_test_matnwb.nwb against cached namespace information using namespace 'ndx-ecog'.
Traceback (most recent call last):
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/validate/validator.py", line 229, in get_validator
    return self.__validators[dt]
KeyError: 'NWBFile'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/validate.py", line 136, in <module>
    main()
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/validate.py", line 130, in main
    ret = ret or _validate_helper(io=io, namespace=ns)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/validate.py", line 23, in _validate_helper
    errors = validate(**kwargs)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/utils.py", line 587, in func_call
    return func(**pargs)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/__init__.py", line 198, in validate
    return validator.validate(builder)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/utils.py", line 583, in func_call
    return func(args[0], **pargs)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/validate/validator.py", line 247, in validate
    validator = self.get_validator(dt)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/utils.py", line 583, in func_call
    return func(args[0], **pargs)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/validate/validator.py", line 232, in get_validator
    raise ValueError(msg)
ValueError: data type 'NWBFile' not found in namespace ndx-ecog

No issue when cached namespace is not used:

$ python -m pynwb.validate ndx-ecog_file_test_matnwb.nwb --no-cached-namespace
Validating ndx-ecog_file_test_matnwb.nwb against pynwb namespace information using namespace 'core'.
 - no errors found.
cechava commented 2 years ago

Doing some printing. It looks like pynwb is trying to use key 'NWBFile' to access an entry in a dictionary that has the relevant subset of classes as keys (e.g., Data, NWBContainer, Subject). This is an incorrect behavior. When successfully validating a file without an extension in the cached namespace, the 'NWBFile' keys is used to a access an entry in a dictionary with all core and hdmf-common classes as keys (e.g., NWBFile, Subject, DynamicTable)

cechava commented 2 years ago

Issue seems to be in how matnwb caches the namespace of the extension. Compare how the namespace cached by matnwb:

import h5py
import json
from pprint import pprint
f = h5py.File('ndx-ecog_file_test_matnwb.nwb', "r")
pprint(json.loads(f["/specifications/ndx-ecog/0.3.1/namespace"][()]))

{'namespaces': [{'author': ['Ben Dichter'],
                 'contact': ['ben.dichter@catalystneuro.com'],
                 'doc': 'An NWB extension for storing the cortical surface of '
                        'subjects in ECoG experiments.',
                 'name': 'ndx-ecog',
                 'schema': [{'namespace': 'core',
                             'neurodata_types': ['NWBDataInterface',
                                                 'Subject',
                                                 'Images',
                                                 'NWBData']},
                            {'source': 'ndx-ecog.extensions'}],
                 'version': '0.3.1'}]}

with how the namespace is cached by pynwb

f = h5py.File('ndx-ecog_file_test_pynwb.nwb', "r")
pprint(json.loads(f["/specifications/ndx-ecog/0.3.1/namespace"][()]))

{'namespaces': [{'author': ['Ben Dichter'],
                 'contact': ['ben.dichter@catalystneuro.com'],
                 'doc': 'An NWB extension for storing the cortical surface of '
                        'subjects in ECoG experiments.',
                 'name': 'ndx-ecog',
                 'schema': [{'namespace': 'core'},
                            {'source': 'ndx-ecog.extensions'}],
                 'version': '0.3.1'}]}

Note that pynwb does not cache the neurodata_types key in schema. Modifying matnwb caching to exclude this key resolves the issue. Opening a PR with this change.

oruebel commented 2 years ago

@rly can you clarify why caching the neurodata_types key is a problem and why PyNWB does not cache it?

rly commented 2 years ago

This may be a bug in the PyNWB/HDMF validator. I thought this was resolved in https://github.com/hdmf-dev/hdmf/pull/613 . @cechava can you share that test file that you created?

rly commented 2 years ago

Separately, PyNWB/HDMF should cache the neurodata_types key. That is a bug.

cechava commented 2 years ago

Sure. The code to generate both files is below. They both make sure of the ndx-ecog extension

python

from pynwb import load_namespaces, get_class, NWBHDF5IO, NWBFile
from datetime import datetime
import numpy as np
from ndx_ecog import CorticalSurfaces, ECoGSubject, Surface, Parcellations
nwbfile = NWBFile('my first synthetic recording', 'EXAMPLE_ID', datetime.now())
cortex_module = nwbfile.create_processing_module(name='cortex',
                                                 description='description')
with NWBHDF5IO('ndx-ecog_file_test_pynwb.nwb', 'w') as io:
    io.write(nwbfile)

MATLAB

generateExtension('/Users/cesar/Repositories/ndx-ecog/spec/ndx-ecog.namespace.yaml');

nwb = NwbFile( ...
    'session_description', 'mouse in open exploration', ...
    'identifier', 'Mouse5_Day3', ...
    'session_start_time', datetime(2018, 4, 25, 2, 30, 3) ...
);

cortical_surfaces = types.ndx_ecog.CorticalSurfaces;

parcellations = types.ndx_ecog.Parcellations();
parcellation = types.ndx_ecog.Parcellation( ... 
    'data', randi(5, 1, 20), ... 
    'labels', {'a','b', 'c', 'd', 'e'} ...
);
parcellations.parcellation.set('my_map', parcellation);

nwbExport(nwb, 'ndx-ecog_file_test_matnwb.nwb');
cechava commented 2 years ago

@rly Just realized you were asking for the actual files. To make life easy here are links to download: matnwb-generated file pynwb-generated file