catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Bug]: Recommended Chunk Shape doesn't take into account compound dtypes #1122

Open pauladkisson opened 1 week ago

pauladkisson commented 1 week ago

What happened?

Dev tests are failing: https://github.com/catalystneuro/neuroconv/actions/runs/11567195585/job/32197117750

Tracked this down to an hdmf update that uncovered these lines: https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/backends/hdf5/h5tools.py#L1476-L1483

And to our chunking recommendation, which is based on the data shape, here: https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py#L263

Notice that in hdmf, if the data has a compound dtype, the shape is taken to be (len(data),), while in neuroconv the shape is always get_data_shape(data).

The two disagree for a Caiman pixel_mask, which is a compound dtype in NWB, and the mismatch raises an error.
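To make the mismatch concrete, here is a minimal NumPy-only sketch (the field names and values are illustrative, not the actual Caiman data): naive shape inspection of pixel_mask-style rows sees a 2-D array, whereas the same rows stored under a compound dtype, as HDF5/hdmf does, form a 1-D dataset, so a 2-D chunk shape cannot apply to it.

```python
import numpy as np

# pixel_mask rows are (x, y, weight) tuples -- stored as a compound dtype in NWB
pixel_mask = [(0, 0, 1.0), (1, 0, 0.5), (1, 1, 0.25)]

# a get_data_shape-style inspection of the raw rows sees a 2-D shape
naive_shape = np.asarray(pixel_mask).shape  # (3, 3)

# under a compound dtype the same data is a 1-D array of records,
# which is how hdmf sizes the HDF5 dataset (shape == (len(data),))
compound = np.array(
    pixel_mask,
    dtype=[("x", "u4"), ("y", "u4"), ("weight", "f4")],
)
compound_shape = compound.shape  # (3,)
```

A chunk shape derived from the 2-D view has the wrong rank for the 1-D compound dataset, which is where the dev-test failure comes from.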

The initial solution I came up with was to load the NWB schema to figure out whether a dataset is compound or not, but I had trouble finding the right code to load the schema...

Steps to Reproduce

n/a

Traceback

No response

Operating System

Linux

Python Executable

Conda

Python Version

3.9

Package Versions

No response

Code of Conduct

pauladkisson commented 1 day ago

Update: I figured out that I can use the builder's dtype (from io.get_builder(dataset)) to check for compound dtypes, but io.get_builder only works when the nwbfile has been read from disk -- it returns None when the nwbfile is in memory.
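A sketch of the dtype check described above, using stdlib stand-ins instead of real hdmf builders (the helper name and the stub builder shapes are hypothetical; it assumes hdmf represents a compound dtype on a DatasetBuilder as a list of per-field specs, while scalar dtypes are plain strings):

```python
from types import SimpleNamespace


def is_compound_dtype(builder) -> bool:
    # Assumption: an hdmf DatasetBuilder for a compound dataset carries its
    # dtype as a list of per-field specs; scalar dtypes are plain strings.
    return isinstance(builder.dtype, list)


# hypothetical stand-ins for what io.get_builder(dataset) might return
scalar_builder = SimpleNamespace(dtype="float64")
pixel_mask_builder = SimpleNamespace(
    dtype=[
        {"name": "x", "dtype": "uint32"},
        {"name": "y", "dtype": "uint32"},
        {"name": "weight", "dtype": "float32"},
    ]
)

print(is_compound_dtype(scalar_builder))      # False
print(is_compound_dtype(pixel_mask_builder))  # True
```

This only helps once a builder exists, which is exactly the limitation above: for an in-memory nwbfile there is no builder to inspect yet.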

pauladkisson commented 1 day ago

@rly, when you have a chance could you provide some guidance on this issue?

How can I get a builder from an in-memory nwbfile? Or, if that is too difficult, how can I get access to the schema for a given neurodata object?