NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io
Other
174 stars 85 forks source link

[Bug]: PYNWB_VALIDATION when using ndx-miniscope extension #1777

Open weiglszonja opened 10 months ago

weiglszonja commented 10 months ago

What happened?

When running nwbinspector on a file that uses ndx-miniscope to add Miniscope device, it flags a PYNWB_VALIDATION for the ImagingPlane as:

1  PYNWB_VALIDATION
===================

1.1  /Users/weian/catalystneuro/demo_notebooks/C6-J588-Disc5.nwb: ImagingPlane - 'None' object at location 'general/optophysiology/ImagingPlane'
       Message: missing data type Device (device)

However the device looks correctly linked to ImagingPlane:

Screenshot 2023-09-27 at 13 21 34

Operating System

M1 macOS

Python Version

3.9

Were you streaming with ROS3?

No

Package Versions

nwbinspector 0.4.30 ndx-miniscope 0.5.1

Code of Conduct

weiglszonja commented 10 months ago

We just discussed with @bendichter that this is more likely a pynwb validation error, I'll open an issue there.

CodyCBakerPhD commented 10 months ago

@weiglszonja I just went ahead and transferred it

CodyCBakerPhD commented 10 months ago

The ndx-miniscope extension defines a custom Device here: https://github.com/catalystneuro/ndx-miniscope/blob/main/spec/ndx-miniscope.extensions.yaml#L2-L4

io.write works fine, even io.read; but it seems pynwb.validate does not like the subtype of Device linked to the ImagingPlane?

@weiglszonja Can you share the full traceback of pynwb.validate and figure a way to share the file easily with @rly or @oruebel?

oruebel commented 10 months ago

When you do io.read what does nwbfile.get_imaging_plane('ImagingPlane').device return? I'm just wondering, because the validator seems to indicate None.

weiglszonja commented 10 months ago

@oruebel @CodyCBakerPhD Thank you for helping figuring out this issue,

nwbfile.get_imaging_plane("ImagingPlane").device

returns

Miniscope abc.Miniscope at 0x5828172864
Fields:
  compression: FFV1
  deviceType: Miniscope_V3
  frameRate: 15FPS
  framesPerFile: 1000
  gain: High
  led0: 47

I just tried this on the file:

from pynwb import validate

nwbfile_path = "C6-J588-Disc5.nwb"
print(validate(paths=[nwbfile_path], verbose=True))

returns

Validating /Users/weian/catalystneuro/demo_notebooks/C6-J588-Disc5.nwb against cached namespace information using namespace 'ndx-miniscope'.
([], 0)

So the file passed the validation then? But running nwbinspector again:

import nwbinspector

nwbfile_path = "C6-J588-Disc5.nwb"
print(list(nwbinspector.inspect_nwbfile(nwbfile_path))[0])

returns

InspectorMessage(
    message='missing data type Device (device)',
    importance=<Importance.PYNWB_VALIDATION: 3>,
    severity=<Severity.LOW: 1>,
    check_function_name='ImagingPlane',
    object_type=None,
    object_name=None,
    location='general/optophysiology/ImagingPlane',
    file_path='/Users/weian/catalystneuro/demo_notebooks/C6-J588-Disc5.nwb'
)
CodyCBakerPhD commented 10 months ago

from pynwb import validate

nwbfile_path = "C6-J588-Disc5.nwb" print(validate(paths=[nwbfile_path], verbose=True))

The call pattern in the Inspector is actually slightly different; https://github.com/NeurodataWithoutBorders/nwbinspector/blob/dev/src/nwbinspector/nwbinspector.py#L555

Can you try again using the classical way of doing the validation?

with NWBHDF5IO(...) as io:
    pynwb.validate(io=io)
weiglszonja commented 10 months ago

you're right @CodyCBakerPhD !

from pynwb import validate
from pynwb import NWBHDF5IO
nwbfile_path = "C6-J588-Disc5.nwb"

with NWBHDF5IO(nwbfile_path, load_namespaces=True) as io:
    print(validate(io=io, verbose=True))

returns

[ImagingPlane (general/optophysiology/ImagingPlane): missing data type Device (device)]
rly commented 10 months ago

@weiglszonja could you please upload the file to https://drive.google.com/drive/u/0/folders/197eVWGgqSItz5DethQdEjlyw_-iCyRSY if you can, so that I can troubleshoot? Thanks.

Strange that

from pynwb import validate

nwbfile_path = "C6-J588-Disc5.nwb"
print(validate(paths=[nwbfile_path], verbose=True))

and

from pynwb import validate
from pynwb import NWBHDF5IO
nwbfile_path = "C6-J588-Disc5.nwb"

with NWBHDF5IO(nwbfile_path, load_namespaces=True) as io:
    print(validate(io=io, verbose=True))

have different results. I suspect that one is not loading the namespaces correctly.

weiglszonja commented 10 months ago

Thank you @rly , I uploaded a small file.

rly commented 10 months ago

Code snippet 1) above results in no validation errors - the correct behavior - because validate by default will validate against all cached namespaces and ignore dependent namespaces. In this case, the only namespace that will be validated against is ndx-miniscope.

Code snippet 2) above results in a validation error because it only validates against a namespace arg if provided or the core namespace otherwise. In this case, it will validate against the core namespace, and that validation appears not to understand that data type Miniscope extends data type Device. Adding a namespace arg -- print(validate(io=io, verbose=True, namespace="ndx-miniscope")) -- results in no validation errors like in case 1.

This raises a few questions.

  1. What should the default behavior be for validate when the io arg is passed? I think it should be changed to the same default behavior as when the paths arg is passed where the IO object is validated against all cached namespaces except for dependent ones. It looks like when we last updated pynwb.validate, we decided to keep the existing behavior of validate with io passed to maintain backward compatibility with previous versions of pynwb. https://github.com/NeurodataWithoutBorders/pynwb/pull/1511#issuecomment-1253877922 . The new split behavior is confusing and I think it would be OK to change this behavior.

  2. Why does validation on the core namespace not understand the type of the Miniscope device? "ndx-miniscope" was loaded. It looks like in ValidatorMap, only the types in the given namespace are processed. However, this will result in problems when there are two extensions A and B that independently extend the core namespace. Validation of the file on namespace A will not know about data types in B and vice versa.

  3. Is there any use case for not simply validating on all the loaded namespaces together? Perhaps both ValidatorMap and pynwb.validate should be updated such that all the loaded namespaces are validated together and you cannot specify a single namespace out of all the ones loaded to validate against. @oruebel @CodyCBakerPhD What do you think? This would be a major breaking change, but I think that is OK.

weiglszonja commented 7 months ago

@rly do you have a solution how to fix this? This validation error is triggered for any file that is using ndx-miniscope, so this very much needs a fix. Is there something I can do in ndx-miniscope?

CodyCBakerPhD commented 7 months ago

@rly @oruebel Any recommendations for this issue?

rly commented 7 months ago

@weiglszonja @CodyCBakerPhD I created https://github.com/NeurodataWithoutBorders/nwbinspector/pull/425 which should address the issue quickly.

I think to address the issue in PyNWB without making the change above, we would have to make a breaking change to the behavior of pynwb.validate(io=...). (One could argue that the current behavior is a bug, but this behavior has been around for so long and was kept the last time we made changes to the validator, that I am concerned that some downstream code relies on it working this way).

I am in full support of making this change and created https://github.com/NeurodataWithoutBorders/pynwb/issues/1807 for it. To do this right, I would also want to make a change in the core validation code in HDMF to validate against all extension namespaces. That change may take a little time to implement, so, in the meantime, I recommend changing the code in nwbinspector to use validate(paths=...) instead of validate(io=...) as is done in my PR.

Would that resolve the issue at hand?

CodyCBakerPhD commented 7 months ago

@rly Sounds great! Thanks for summarizing so clearly