Open CodyCBakerPhD opened 2 years ago
@CodyCBakerPhD the traceback doesn't seem to include the call to validate itself so just to clarify. You are calling:
pynwb.validate
still raise an exception? E.g., if an external link is unreadable, should that be a validation error or an I/O exception.@oruebel Oh, sorry - I've modified the post for clarity surrounding our understanding of the issue.
Upon double checking, pynwb.validate(io)
works fine and returns empty. But io.read()
fails with the traceback. I would like validate
to inform me if the file is readable or not.
If understand your request correctly, then you would like to see the following call wrapped in a try/except block and have it return a validation_error instead of raising an exception that the file is unreadable, correct?
Essentially, but as this relates to the further specified issue, it ought to be something related to builders/constructors I suppose? I'm not really sure.
In what cases should pynwb.validate still raise an exception? E.g., if an external link is unreadable, should that be a validation error or an I/O exception.
Not sure. My view would be to capture everything and turn it into the 'validation' type of error message that gets returned (and only an error-free file would return an empty output). But there are issues that could be Python errors instead of NWB errors, too.
Upon double checking,
pynwb.validate(io)
works fine and returns empty. Butio.read()
fails
Thanks for the clarification. I see, pynwb.validate
preforms validation based on the builders and does not go through the process of constructing the containers. It's strange though that the missing description does not result in a validation error as well. Is the description for that DynamicTable actually missing in the file or is there something else going on?
To catch this in validate, I think what this means is that we would need to add reading the file to the end of the validate function after validator.validate has completed, something along the lines of:
try:
io.read()
except as e:
# add validation error that indicated that the file is unreadable and include traceback
return validation_errors
Should this be a separate return value, to indicate whether the file is readable or just an additional validation error?
Should this be a separate return value, to indicate whether the file is readable or just an additional validation error?
I think 'just an additional validation error' would be sufficient
@CodyCBakerPhD can you provide an example of how to test the behavior you are seeing, i.e., how does the file look that passes validation but fails the build process? The main reason I am asking is: a) I would like to be able to have a unittest for this, and b) I'm curious because this seems to indicate that there may be something important that we don't catch during validation.
@oruebel OK after a lot of investigating and playing around, here is the 'most minimal' reproducible code I could make that reproduces the exact issue without needing an external file
Three-step process. Step 1 is Ophys tutorial down to RoiResponseSeries
level and everything it needs along the way. Step 2 is h5py
modification to break schema in the way the original problem file did. Step 3 is verification of no output of pynwb.validate
but unable to io.read()
from datetime import datetime
from dateutil.tz import tzlocal
import numpy as np
import pynwb
from h5py import File
from pynwb import NWBFile, NWBHDF5IO
from pynwb.ophys import OpticalChannel, ImageSegmentation, RoiResponseSeries, Fluorescence
nwbfile = NWBFile(
"my first synthetic recording",
"EXAMPLE_ID",
datetime.now(tzlocal()),
experimenter="Dr. Bilbo Baggins",
lab="Bag End Laboratory",
institution="University of Middle Earth at the Shire",
experiment_description="I went on an adventure with thirteen dwarves " "to reclaim vast treasures.",
session_id="LONELYMTN",
)
device = nwbfile.create_device(
name="Microscope", description="My two-photon microscope", manufacturer="The best microscope manufacturer"
)
optical_channel = OpticalChannel(name="OpticalChannel", description="an optical channel", emission_lambda=500.0)
imaging_plane = nwbfile.create_imaging_plane(
name="ImagingPlane",
optical_channel=optical_channel,
imaging_rate=30.0,
description="a very interesting part of the brain",
device=device,
excitation_lambda=600.0,
indicator="GFP",
location="V1",
grid_spacing=[0.01, 0.01],
grid_spacing_unit="meters",
origin_coords=[1.0, 2.0, 3.0],
origin_coords_unit="meters",
)
ophys_module = nwbfile.create_processing_module(name="ophys", description="optical physiology processed data")
img_seg = ImageSegmentation()
ps = img_seg.create_plane_segmentation(
name="PlaneSegmentation",
description="output from segmenting my favorite imaging plane",
imaging_plane=imaging_plane,
)
ophys_module.add(img_seg)
for _ in range(30):
image_mask = np.zeros((100, 100))
# randomly generate example image masks
x = np.random.randint(0, 95)
y = np.random.randint(0, 95)
image_mask[x : x + 5, y : y + 5] = 1
# add image mask to plane segmentation
ps.add_roi(image_mask=image_mask)
rt_region = ps.create_roi_table_region(region=[0, 1], description="the first of two ROIs")
roi_resp_series = RoiResponseSeries(
name="RoiResponseSeries", data=np.ones((50, 2)), rois=rt_region, unit="lumens", rate=30.0 # 50 samples, 2 ROIs
)
fl = Fluorescence(roi_response_series=roi_resp_series)
ophys_module.add(fl)
path = "E:/test_inspector/dandi_helpdesk/71/test_reproduce.nwb"
with NWBHDF5IO(path=path, mode="w") as io:
io.write(nwbfile)
with File(name=path, mode="r+") as file:
del file["processing"]["ophys"]["Fluorescence"]["RoiResponseSeries"]["rois"].attrs["description"]
del file["processing"]["ophys"]["Fluorescence"]["RoiResponseSeries"]["rois"].attrs["table"]
with NWBHDF5IO(path=path, mode="r") as io:
validate_messages = list(pynwb.validate(io=io))
if not validate_messages:
io.read() # this triggers the exact error
@CodyCBakerPhD thanks for the helpful example. Looking at the schema and the modifications you make in the script, it looks like the required attributes for description
and table
are missing from the DynamicTableRegion
. Based on this, I think we would see the same issue any time a named attribute is missing.
Rather than an error with the read, this seems to indicate that there is some sort of bug or error with the validator, in that it either fails to check for (some) attributes or assumes that attributes are optional when they should be treated as required. I will create a separate issue for this. I think it is valuable to do both, i.e., 1) check that a file can be read as part of validation, if only as a fallback in case the validator misses something or there are bugs that cause inconsitencies between the API and validation, and 2) make sure we address this particular issue in the validator to make sure it correctly raises a validation error for this case.
What would you like to see added to PyNWB?
Hey all,
This came up originally in https://github.com/dandi/helpdesk/discussions/71 that the NWBInspector failed to run on a certain NWBFile. Turns out that NWBFile had some problems (leading to https://github.com/NeurodataWithoutBorders/helpdesk/discussions/28), but on our side of things the source failure point came down to the call to
io.read
after completingvalidate(io)
successfully.While we are extending the error catching on the NWBInspector side (https://github.com/NeurodataWithoutBorders/nwbinspector/pull/204), we think it would also be great to fix this on the actual validation level as well.
Is your feature request related to a problem?
I have an NWBFile made with MatNWB that passes
pynwb.validate
but fails onio.read()
.What solution would you like?
I would like the following traceback to be safely returned as a 'validation_error' or some analogous type during the initial
validate(io)
call.Sorry I can't really provide a minimal example for this, since the file has to be made via MatNWB in a particular way in order for such a file to exist.
Do you have any interest in helping implement the feature?
No.
Code of Conduct