NeurodataWithoutBorders / matnwb

A Matlab interface for reading and writing NWB files
BSD 2-Clause "Simplified" License
50 stars 32 forks source link

inconsistencies with cached spec #268

Closed bendichter closed 3 years ago

bendichter commented 3 years ago

A cached spec with booleans (e.g. an attribute where required=False) is written as required: "false" (false as a string instead of a bool). This writes without error but causes a build error when it is read in pynwb with load_namespaces=True. @owoolnough came across this error and I have not yet been able to reproduce the write error.

bendichter commented 3 years ago

A few more issues upon closer inspection:

Here is the spec generated by PyNWB: '{"datasets":[{"shape":[null],"dims":["vertex_index"],"doc":"a parcellation of the surface","neurodata_type_inc":"NWBData","neurodata_type_def":"Parcellation","attributes":[{"name":"labels","doc":"if the dtype of the parcellation is uint, they are categories and these are the labels","required":false,"dtype":"text","shape":[null]}]}],"groups":[{"groups":[{"doc":"parcellations for this surface","quantity":"?","neurodata_type_inc":"Parcellations"}],"datasets":[{"shape":[null,3],"dims":["face_number","vertex_index"],"dtype":"uint32","name":"faces","doc":"Faces for the surface, indexes vertices."},{"shape":[null,3],"dims":["vertex_number","xyz"],"dtype":"float32","name":"vertices","doc":"Vertices for surface, points in 3D space."}],"doc":"Group representing the faces and vertices that compose a brain cortical surface.","quantity":"?","neurodata_type_inc":"NWBDataInterface","neurodata_type_def":"Surface"},{"groups":[{"doc":"Group representing the faces and vertices that compose a brain cortical surface.","quantity":"*","neurodata_type_inc":"Surface"}],"name":"cortical_surfaces","doc":"Group that holds Surface types.","quantity":"?","neurodata_type_inc":"NWBDataInterface","neurodata_type_def":"CorticalSurfaces"},{"groups":[{"doc":"Group representing the faces and vertices that compose a brain cortical surface.","quantity":"?","neurodata_type_inc":"CorticalSurfaces"},{"doc":"Images of the brain","quantity":"?","neurodata_type_inc":"Images"}],"name":"subject","doc":"Extension of subject that holds cortical surface data.","neurodata_type_inc":"Subject","neurodata_type_def":"ECoGSubject"},{"datasets":[{"doc":"a parcellation of this surface","quantity":"+","neurodata_type_inc":"Parcellation"}],"doc":"parcellations of this surface","default_name":"parcellations","neurodata_type_inc":"NWBDataInterface","neurodata_type_def":"Parcellations"}]}'

and here is the spec generated by MatNWB: '{"datasets":[{"attributes":[{"doc":"if the dtype of the parcellation is uint, they are categories and these are the labels","dtype":"text","name":"labels","required":false,"shape":["null"]}],"dims":["vertex_index"],"doc":"a parcellation of the surface","neurodata_type_def":"Parcellation","neurodata_type_inc":"NWBData","shape":["null"]}],"groups":[{"datasets":[{"dims":["face_number","vertex_index"],"doc":"Faces for the surface, indexes vertices.","dtype":"uint32","name":"faces","shape":["null","3"]},{"dims":["vertex_number","xyz"],"doc":"Vertices for surface, points in 3D space.","dtype":"float32","name":"vertices","shape":["null","3"]}],"doc":"Group representing the faces and vertices that compose a brain cortical surface.","groups":[{"doc":"parcellations for this surface","neurodata_type_inc":"Parcellations","quantity":"?"}],"neurodata_type_def":"Surface","neurodata_type_inc":"NWBDataInterface","quantity":"?"},{"doc":"Group that holds Surface types.","groups":[{"doc":"Group representing the faces and vertices that compose a brain cortical surface.","neurodata_type_inc":"Surface","quantity":"*"}],"name":"cortical_surfaces","neurodata_type_def":"CorticalSurfaces","neurodata_type_inc":"NWBDataInterface","quantity":"?"},{"doc":"Extension of subject that holds cortical surface data.","groups":[{"doc":"Group representing the faces and vertices that compose a brain cortical surface.","neurodata_type_inc":"CorticalSurfaces","quantity":"?"},{"doc":"Images of the brain","neurodata_type_inc":"Images","quantity":"?"}],"name":"subject","neurodata_type_def":"ECoGSubject","neurodata_type_inc":"Subject"},{"datasets":[{"doc":"a parcellation of this surface","neurodata_type_inc":"Parcellation","quantity":"+"}],"default_name":"parcellations","doc":"parcellations of this surface","neurodata_type_def":"Parcellations","neurodata_type_inc":"NWBDataInterface"}]}'

bendichter commented 3 years ago

For reference, this python script fixes the read:

import h5py
import re
for filepath in filepaths:
    file = h5py.File(filepath,'r+')
    json = file['specifications/ndx_ecog/0.3.1/ndx-ecog.extensions'][()]
    if isinstance(json, bytes):
        json = json.decode()
    json = json.replace('"false"', 'false')
    json = json.replace('"null"', 'null')
    json = re.sub('"(\d+)"', r'\1', json)
    file['specifications/ndx_ecog/0.3.1/ndx-ecog.extensions'][()] = json
lawrence-mbf commented 3 years ago

Does core actually use the true/false flags anywhere in its schema?

bendichter commented 3 years ago

Yes, for specifying if an attribute is required. The Dataset spec has quantity= ['*', '+', or a number], but the Attribute spec uses required=[0 or 1]

bendichter commented 3 years ago

https://schema-language.readthedocs.io/en/latest/specification_language_description.html#id8

lawrence-mbf commented 3 years ago

Okay, just checked and it looked like it worked. #269 Can you verify?