Open pauladkisson opened 3 weeks ago
Update:
I figured out that I can use the builder's dtype (from `io.get_builder(dataset)`) to appropriately check for compound dtypes, BUT `io.get_builder` only works when the nwbfile is being read from disk -- it returns `None` when the nwbfile is in memory.
@rly, when you have a chance could you provide some guidance on this issue?
How can I get a builder from an in-memory nwbfile? Or, if that is too difficult, how can I get access to the schema for a given neurodata object?
@pauladkisson I do not totally follow what happened here. If I understand correctly, `get_data_shape` changed in HDMF and this broke something in neuroconv. What was the behavior of `get_data_shape` for compound data types before the change? Is this a bug in HDMF?
An in-memory (not yet written) NWB file has no builders until you call `write`. You can use HDMF internal functions to get the spec if you really need to:

```python
import pynwb

# Look up the core-namespace spec for PlaneSegmentation's pixel_mask dataset.
manager = pynwb.get_manager()
manager.type_map.namespace_catalog.get_spec("core", "PlaneSegmentation").get_dataset("pixel_mask")
```
> If I understand correctly, `get_data_shape` changed in HDMF and this broke something in neuroconv.
Well, it isn't `get_data_shape` per se, but rather this block of code:
```python
# define the data shape
if 'shape' in io_settings:
    data_shape = io_settings.pop('shape')
elif hasattr(data, 'shape'):
    data_shape = data.shape
elif isinstance(dtype, np.dtype) and len(dtype) > 1:  # check if compound dtype
    data_shape = (len(data),)
else:
    data_shape = get_data_shape(data)
```
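The decisive branch here is the compound-dtype check: `pixel_mask` data arrives as a plain list of `(x, y, weight)` tuples, which has no `.shape` attribute, and `len()` of a structured NumPy dtype counts its fields. A numpy-only sketch (with a made-up compound dtype standing in for the real one) of why the `(len(data),)` branch fires:

```python
import numpy as np

# pixel_mask-style data: a plain Python list of (x, y, weight) tuples.
data = [(0, 0, 1.0), (1, 2, 0.5)]

# An illustrative compound dtype; len() of a structured dtype is its field count.
dtype = np.dtype([("x", "uint32"), ("y", "uint32"), ("weight", "float32")])

# The list has no .shape attribute...
assert not hasattr(data, "shape")
# ...and the compound-dtype branch matches, so HDMF uses a 1-D shape.
assert isinstance(dtype, np.dtype) and len(dtype) > 1
data_shape = (len(data),)
print(data_shape)  # (2,)
```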
And I'm not sure how exactly, but some recent changes uncovered this block of code so that it runs (`__list_fill__`) instead of `__scalar_fill__`. I had tracked that PR down previously, but unfortunately didn't write it down.
> Is this a bug in HDMF?
I don't think so.
Another approach that may be more robust and does not require knowledge of the spec, though it does require knowledge of the pynwb object's attributes:
```python
import pynwb
from pynwb.testing.mock.ophys import mock_PlaneSegmentation

# Map a mock PlaneSegmentation to its ObjectMapper, then look up the
# spec for the pixel_mask attribute.
manager = pynwb.get_manager()
ps = mock_PlaneSegmentation()
manager.type_map.get_map(ps).get_attr_spec("pixel_mask")
```
What happened?
Dev tests are failing: https://github.com/catalystneuro/neuroconv/actions/runs/11567195585/job/32197117750
I tracked this down to an hdmf update that uncovered these lines: https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/backends/hdf5/h5tools.py#L1427-L1435
And to our chunking recommendation based on the data shape here: https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py#L263
Notice how in hdmf, if the data has a compound dtype, the shape is `(len(data),)`, but in neuroconv the shape is always `get_data_shape(data)`. This throws an error when the two mismatch, as in the case of a Caiman `pixel_mask`, which is a compound dtype in NWB.
The initial solution that I came up with was to load the NWB schema to figure out if a dataset is compound or not, but I was having some trouble finding the right code to load in the schema...
Steps to Reproduce
Traceback
No response
Operating System
Linux
Python Executable
Conda
Python Version
3.9
Package Versions
No response
Code of Conduct