NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io
Other
177 stars 84 forks source link

[Feature]: if a file passes pynwb.validate, it should be readable #1468

Open CodyCBakerPhD opened 2 years ago

CodyCBakerPhD commented 2 years ago

What would you like to see added to PyNWB?

Hey all,

This came up originally in https://github.com/dandi/helpdesk/discussions/71 that the NWBInspector failed to run on a certain NWBFile. Turns out that NWBFile had some problems (leading to https://github.com/NeurodataWithoutBorders/helpdesk/discussions/28), but on our side of things the source failure point came down to the call to io.read after completing validate(io) successfully.

While we are extending the error catching on the NWBInspector side (https://github.com/NeurodataWithoutBorders/nwbinspector/pull/204), we think it would also be great to fix this on the actual validation level as well.

Is your feature request related to a problem?

I have an NWBFile made with MatNWB that passes pynwb.validate but fails on io.read().

What solution would you like?

I would like the following traceback to be safely returned as a 'validation_error' or some analogous type during the initial validate(io) call.

Sorry I can't really provide a minimal example for this, since the file has to be made via MatNWB in a particular way in order for such a file to exist.

Traceback (most recent call last):

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1244, in construct
    obj = self.__new_container__(cls, builder.source, parent, builder.attributes.get(self.__spec.id_key()),

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1254, in __new_container__
    obj.__init__(**kwargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 582, in func_call
    pargs = _check_args(args, kwargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 575, in _check_args
    raise ExceptionType(msg)

TypeError: DynamicTableRegion.__init__: missing argument 'description'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "C:\Users\Raven\AppData\Local\Temp\ipykernel_9176\3758307767.py", line 2, in <cell line: 1>
    io.read()

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\backends\hdf5\h5tools.py", line 498, in read
    return call_docval_func(super().read, kwargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 424, in call_docval_func
    return func(*fargs, **fkwargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\backends\io.py", line 41, in read
    container = self.__manager.construct(f_builder)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 280, in construct
    result = self.__type_map.construct(builder, self, None)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 794, in construct
    return obj_mapper.construct(builder, build_manager, parent)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1214, in construct
    subspecs = self.__get_subspec_values(builder, self.spec, manager)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1143, in __get_subspec_values
    self.__get_sub_builders(groups, spec.groups, manager, ret)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1194, in __get_sub_builders
    ret.update(self.__get_subspec_values(sub_builder, subspec, manager))

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1143, in __get_subspec_values
    self.__get_sub_builders(groups, spec.groups, manager, ret)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1186, in __get_sub_builders
    sub_builder = self.__flatten(sub_builder, subspec, manager)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1199, in __flatten
    tmp = [manager.construct(b) for b in sub_builder]

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1199, in <listcomp>
    tmp = [manager.construct(b) for b in sub_builder]

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 276, in construct
    result = self.__type_map.construct(builder, self, parent)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 794, in construct
    return obj_mapper.construct(builder, build_manager, parent)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1214, in construct
    subspecs = self.__get_subspec_values(builder, self.spec, manager)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1143, in __get_subspec_values
    self.__get_sub_builders(groups, spec.groups, manager, ret)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1186, in __get_sub_builders
    sub_builder = self.__flatten(sub_builder, subspec, manager)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1199, in __flatten
    tmp = [manager.construct(b) for b in sub_builder]

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1199, in <listcomp>
    tmp = [manager.construct(b) for b in sub_builder]

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 276, in construct
    result = self.__type_map.construct(builder, self, parent)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 794, in construct
    return obj_mapper.construct(builder, build_manager, parent)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1214, in construct
    subspecs = self.__get_subspec_values(builder, self.spec, manager)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1143, in __get_subspec_values
    self.__get_sub_builders(groups, spec.groups, manager, ret)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1186, in __get_sub_builders
    sub_builder = self.__flatten(sub_builder, subspec, manager)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1199, in __flatten
    tmp = [manager.construct(b) for b in sub_builder]

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1199, in <listcomp>
    tmp = [manager.construct(b) for b in sub_builder]

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 276, in construct
    result = self.__type_map.construct(builder, self, parent)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 794, in construct
    return obj_mapper.construct(builder, build_manager, parent)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1214, in construct
    subspecs = self.__get_subspec_values(builder, self.spec, manager)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1144, in __get_subspec_values
    self.__get_sub_builders(datasets, spec.datasets, manager, ret)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1196, in __get_sub_builders
    ret[subspec] = manager.construct(sub_builder)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 276, in construct
    result = self.__type_map.construct(builder, self, parent)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\manager.py", line 794, in construct
    return obj_mapper.construct(builder, build_manager, parent)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\utils.py", line 583, in func_call
    return func(args[0], **pargs)

  File "D:\anaconda3\envs\nwbct_3_9_1\lib\site-packages\hdmf\build\objectmapper.py", line 1248, in construct
    raise ConstructError(builder, msg) from ex

ConstructError: (root/processing/ophys/Fluorescence/RoiResponseSeries/rois DatasetBuilder {'attributes': {'namespace': 'hdmf-common', 'neurodata_type': 'DynamicTableRegion', 'object_id': '0614843a-cc4a-4bf5-ba7f-828f58cf3ea2'}, 'data': <Closed HDF5 dataset>}, "Could not construct DynamicTableRegion object due to: DynamicTableRegion.__init__: missing argument 'description'")

Do you have any interest in helping implement the feature?

No.

Code of Conduct

oruebel commented 2 years ago

@CodyCBakerPhD the traceback doesn't seem to include the call to validate itself so just to clarify. You are calling:

https://github.com/NeurodataWithoutBorders/pynwb/blob/2973aeb5bd72c2cda85ac87eddbabed7d2da54e8/src/pynwb/__init__.py#L189-L198

  1. If understand your request correctly, then you would like to see the following call wrapped in a try/except block and have it return a validation_error instead of raising an exception that the file is unreadable, correct?

https://github.com/NeurodataWithoutBorders/pynwb/blob/2973aeb5bd72c2cda85ac87eddbabed7d2da54e8/src/pynwb/__init__.py#L196

  1. Also, would the following line also need to be wrapped in a try/except:

https://github.com/NeurodataWithoutBorders/pynwb/blob/2973aeb5bd72c2cda85ac87eddbabed7d2da54e8/src/pynwb/__init__.py#L198

  1. In what cases should pynwb.validate still raise an exception? E.g., if an external link is unreadable, should that be a validation error or an I/O exception.
CodyCBakerPhD commented 2 years ago

@oruebel Oh, sorry - I've modified the post for clarity surrounding our understanding of the issue.

Upon double checking, pynwb.validate(io) works fine and returns empty. But io.read() fails with the traceback. I would like validate to inform me if the file is readable or not.

If understand your request correctly, then you would like to see the following call wrapped in a try/except block and have it return a validation_error instead of raising an exception that the file is unreadable, correct?

Essentially, but as this relates to the further specified issue, it ought to be something related to builders/constructors I suppose? I'm not really sure.

In what cases should pynwb.validate still raise an exception? E.g., if an external link is unreadable, should that be a validation error or an I/O exception.

Not sure. My view would be to capture everything and turn it into the 'validation' type of error message that gets returned (and only an error-free file would return an empty output). But there are issues that could be Python errors instead of NWB errors, too.

oruebel commented 2 years ago

Upon double checking, pynwb.validate(io) works fine and returns empty. But io.read() fails

Thanks for the clarification. I see, pynwb.validate preforms validation based on the builders and does not go through the process of constructing the containers. It's strange though that the missing description does not result in a validation error as well. Is the description for that DynamicTable actually missing in the file or is there something else going on?

To catch this in validate, I think what this means is that we would need to add reading the file to the end of the validate function after validator.validate has completed, something along the lines of:

try:
    io.read()
except as e:
   # add validation error that indicated that the file is unreadable and include traceback
return validation_errors

Should this be a separate return value, to indicate whether the file is readable or just an additional validation error?

CodyCBakerPhD commented 2 years ago

Should this be a separate return value, to indicate whether the file is readable or just an additional validation error?

I think 'just an additional validation error' would be sufficient

oruebel commented 2 years ago

@CodyCBakerPhD can you provide an example of how to test the behavior you are seeing, i.e., how does the file look that passes validation but fails the build process? The main reason I am asking is: a) I would like to be able to have a unittest for this, and b) I'm curious because this seems to indicate that there may be something important that we don't catch during validation.

CodyCBakerPhD commented 2 years ago

@oruebel OK after a lot of investigating and playing around, here is the 'most minimal' reproducible code I could make that reproduces the exact issue without needing an external file

Three-step process. Step 1 is Ophys tutorial down to RoiResponseSeries level and everything it needs along the way. Step 2 is h5py modification to break schema in the way the original problem file did. Step 3 is verification of no output of pynwb.validate but unable to io.read()

from datetime import datetime
from dateutil.tz import tzlocal

import numpy as np
import pynwb
from h5py import File
from pynwb import NWBFile, NWBHDF5IO
from pynwb.ophys import OpticalChannel, ImageSegmentation, RoiResponseSeries, Fluorescence

nwbfile = NWBFile(
    "my first synthetic recording",
    "EXAMPLE_ID",
    datetime.now(tzlocal()),
    experimenter="Dr. Bilbo Baggins",
    lab="Bag End Laboratory",
    institution="University of Middle Earth at the Shire",
    experiment_description="I went on an adventure with thirteen dwarves " "to reclaim vast treasures.",
    session_id="LONELYMTN",
)

device = nwbfile.create_device(
    name="Microscope", description="My two-photon microscope", manufacturer="The best microscope manufacturer"
)
optical_channel = OpticalChannel(name="OpticalChannel", description="an optical channel", emission_lambda=500.0)
imaging_plane = nwbfile.create_imaging_plane(
    name="ImagingPlane",
    optical_channel=optical_channel,
    imaging_rate=30.0,
    description="a very interesting part of the brain",
    device=device,
    excitation_lambda=600.0,
    indicator="GFP",
    location="V1",
    grid_spacing=[0.01, 0.01],
    grid_spacing_unit="meters",
    origin_coords=[1.0, 2.0, 3.0],
    origin_coords_unit="meters",
)

ophys_module = nwbfile.create_processing_module(name="ophys", description="optical physiology processed data")

img_seg = ImageSegmentation()

ps = img_seg.create_plane_segmentation(
    name="PlaneSegmentation",
    description="output from segmenting my favorite imaging plane",
    imaging_plane=imaging_plane,
)

ophys_module.add(img_seg)

for _ in range(30):
    image_mask = np.zeros((100, 100))

    # randomly generate example image masks
    x = np.random.randint(0, 95)
    y = np.random.randint(0, 95)
    image_mask[x : x + 5, y : y + 5] = 1

    # add image mask to plane segmentation
    ps.add_roi(image_mask=image_mask)

rt_region = ps.create_roi_table_region(region=[0, 1], description="the first of two ROIs")

roi_resp_series = RoiResponseSeries(
    name="RoiResponseSeries", data=np.ones((50, 2)), rois=rt_region, unit="lumens", rate=30.0  # 50 samples, 2 ROIs
)

fl = Fluorescence(roi_response_series=roi_resp_series)
ophys_module.add(fl)

path = "E:/test_inspector/dandi_helpdesk/71/test_reproduce.nwb"
with NWBHDF5IO(path=path, mode="w") as io:
    io.write(nwbfile)

with File(name=path, mode="r+") as file:
    del file["processing"]["ophys"]["Fluorescence"]["RoiResponseSeries"]["rois"].attrs["description"]
    del file["processing"]["ophys"]["Fluorescence"]["RoiResponseSeries"]["rois"].attrs["table"]

with NWBHDF5IO(path=path, mode="r") as io:
    validate_messages = list(pynwb.validate(io=io))
    if not validate_messages:
        io.read()  # this triggers the exact error
oruebel commented 2 years ago

@CodyCBakerPhD thanks for the helpful example. Looking at the schema and the modifications you make in the script, it looks like the required attributes for description and table are missing from the DynamicTableRegion. Based on this, I think we would see the same issue any time a named attribute is missing.

Rather than an error with the read, this seems to indicate that there is some sort of bug or error with the validator, in that it either fails to check for (some) attributes or assumes that attributes are optional when they should be treated as required. I will create a separate issue for this. I think it is valuable to do both, i.e., 1) check that a file can be read as part of validation, if only as a fallback in case the validator misses something or there are bugs that cause inconsitencies between the API and validation, and 2) make sure we address this particular issue in the validator to make sure it correctly raises a validation error for this case.