NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io

What should be default name of `pynwb.misc.Units`? #1882

Open h-mayorquin opened 6 months ago

h-mayorquin commented 6 months ago

Right now the following throws an error:

from pynwb.misc import Units
from pynwb.testing.mock.file import mock_NWBFile

nwbfile = mock_NWBFile()
nwbfile.units = Units()

ValueError: Field 'units' on NWBFile must be named 'units'.

The default name of Units is "Units", as recommended by the best practices for naming conventions:

assert Units().name == "Units"

And is defined here: https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/misc.py#L157-L158

However, NWBFile requires the name "units" for this attribute in its field definitions here:

https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/file.py#L272-L273

So I think one of the two has to give. Which one makes more sense? Should nwbfile.units accept "Units" as the name, or should the default name of Units change to "units"?
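To make the mismatch concrete, here is a minimal pure-Python sketch of the behavior described above. The `Units` and `NWBFile` classes below are toy stand-ins written for illustration only, not pynwb's actual implementation; the only things modeled are the default name and the fixed-name check. The last line assumes, as in pynwb, that the name can be passed to the constructor.

```python
class Units:
    """Toy stand-in for pynwb.misc.Units; only the name handling is modeled."""

    def __init__(self, name="Units"):  # default follows the CamelCase best practice
        self.name = name


class NWBFile:
    """Toy stand-in for pynwb.NWBFile with a fixed-name 'units' field."""

    REQUIRED_NAME = "units"  # name fixed by the schema

    def __init__(self):
        self._units = None

    @property
    def units(self):
        return self._units

    @units.setter
    def units(self, value):
        # Reject any object whose name differs from the schema-mandated name.
        if value.name != self.REQUIRED_NAME:
            raise ValueError(
                f"Field 'units' on NWBFile must be named '{self.REQUIRED_NAME}'."
            )
        self._units = value


nwbfile = NWBFile()

# The default name "Units" does not match the required name, so this fails.
try:
    nwbfile.units = Units()
except ValueError as err:
    default_name_error = str(err)

# Passing the name explicitly sidesteps the mismatch.
nwbfile.units = Units(name="units")
```

With real pynwb objects, the corresponding workaround would presumably be `nwbfile.units = Units(name="units")`.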

rly commented 6 months ago

The name of the Units object stored at the NWBFile level of the file must be "units" according to the schema: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.file.yaml#L446-L449

This predates the best practices. Changing that to "Units" at the schema level would make existing NWB 2.0-2.7 files invalid against the 2.8+ schema, which is not ideal. We could modify the APIs to set NWBFile.units from the HDF5 group at /Units instead of /units. However, that might break existing software that does not use the APIs to read the units table (Neurosift may be one). It would also mean that this group would be the only group in the HDF5 file whose name starts with a capital letter -- visually unappealing, but not a big deal.

Alternatively, we could remove the fixed name "units" and modify the APIs to set NWBFile.units to the only Units object in the root group, if present, whatever it is named. That would allow for heterogeneity in what the root file looks like when >95% of use cases will need only a single units table, and would also break existing software that does not use the APIs.
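A rough sketch of what such a type-based lookup could look like, in plain Python. Everything here is hypothetical: `find_root_units`, the toy `Units` and `TimeIntervals` classes, and the dict of root-level children are illustrations, not pynwb or HDMF APIs.

```python
class Units:
    """Toy stand-in for pynwb.misc.Units; only the name is modeled."""

    def __init__(self, name="Units"):
        self.name = name


class TimeIntervals:
    """Toy stand-in for any other root-level object."""

    def __init__(self, name):
        self.name = name


def find_root_units(root_children):
    """Return the single Units object among root_children, or None if absent.

    root_children is a hypothetical mapping of group name -> object.
    """
    matches = [obj for obj in root_children.values() if isinstance(obj, Units)]
    if len(matches) > 1:
        # More than one candidate is ambiguous, so refuse to guess.
        names = sorted(obj.name for obj in matches)
        raise ValueError(
            f"Expected at most one root-level Units object, found {names}"
        )
    return matches[0] if matches else None


root = {"trials": TimeIntervals("trials"), "SortedUnits": Units("SortedUnits")}
found = find_root_units(root)  # the Units object, regardless of its name
```

The lookup keys on the object's type rather than its name, which is why software that reads the /units path directly, without going through the APIs, would no longer find the table.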

Anyway, most people don't care what the object is named under the hood, but I hesitate to change the current naming scheme because of other software relying on the existing schema.

Unfortunately, the default name of the Units type is inconsistent, as you have discovered. The easiest fix is to change the default name of Units to "units". Alternatively, we could leave the behavior as is; that way, custom Units objects follow the new best practices, and the only inconvenience falls on people replacing NWBFile.units with a custom Units object that is not named "units".

h-mayorquin commented 6 months ago

Thanks for the full explanation @rly .

> Unfortunately, the default name of the Units type is inconsistent as you have discovered. The easiest fix is to change the default name of Units to units.

This. I think we should change the default name for pynwb.misc.Units to "units" so that the code above works and we save users some possible confusion. My feeling is that if someone is going to store the units table somewhere other than nwbfile.units (e.g. a processing module), they will also change the name to follow best practices. That said, I acknowledge the trade-off: there is a tension between best practices and backwards compatibility.

Independently of that, I think we should change the newly added mock_units:

https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/testing/mock/ecephys.py#L125-L152

So these lines work as they should: https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/testing/mock/ecephys.py#L147-L150

I will do a PR for that.