NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io
Other
176 stars 85 forks source link

IntervalSeries.data.conversion is stored as 'NaN' #592

Closed bendichter closed 6 years ago

bendichter commented 6 years ago

IntervalSeries.conversion returns conversion = 1.0 before and after read:

from pynwb import NWBFile, NWBHDF5IO
from pynwb.misc import IntervalSeries

interval_series = IntervalSeries(data=[-1, 1], timestamps=[1, 2], name='test_interval_series', source='source')

print(interval_series.conversion)

nwbfile = NWBFile("source", "a file with header data", "NB123A", '2018-06-01T00:00:00')
nwbfile.add_acquisition(interval_series)

with NWBHDF5IO('test_interval_series.nwb', 'w') as io:
    io.write(nwbfile)

with NWBHDF5IO('test_interval_series.nwb', 'r') as io:
    nwbfile_in = io.read(nwbfile)
    print(nwbfile_in.acquisition['test_interval_series'].conversion)
1.0
1.0

However the data is actually stored as 'NaN' (the string 'NaN', not a NaN float value):

import h5py
with h5py.File('test_interval_series.nwb', 'r') as file:
    print(file['acquisition']['test_interval_series']['data'].attrs['conversion'])
NaN

This is causing issues for matnwb, which expects this value to follow the schema, which defines it to be a float. Is there a reason why it's stored this way?

bendichter commented 6 years ago

There's also a discrepancy between the schema and pynwb.

The pynwb API accepts float or str for conversion: https://pynwb.readthedocs.io/en/latest/pynwb.base.html#pynwb.base.TimeSeries

However the NWB schema only accepts float: https://github.com/NeurodataWithoutBorders/pynwb/blob/631f6db9cf7fa1eda951094108c435bae7b2027b/src/pynwb/data/nwb.base.yaml#L113-L114

JesseLivezey commented 6 years ago

@bendichter @ajtritt I tried to trace the IntervalSeries code to find where the NaN might get converted to a string, but couldn't figure out where attributes like conversion get written. I'm somewhat motivated to fix this since Ben suggested I use this for the dataset I'm converting to NWB. If either of you can give me some suggested places to look, I can try and find the problem and make a PR with a fix/test.

I did notice here that conversion might have NaN as a default value in this IntervalSeries yaml entry. https://github.com/NeurodataWithoutBorders/pynwb/blob/f01f5e672c56c62e1d77212163962716f17be362/src/pynwb/data/nwb.misc.yaml#L104 but I'm not familiar enough with the library to know whether this is relevant.

bendichter commented 6 years ago

@JesseLivezey Yes I think that's helpful! Probably that NaN is being interpreted as a string.

JesseLivezey commented 6 years ago

Although it looks like this value is being used correctly up until some point https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/base.py#L11 any idea where these type of attributes get written to the h5 file?

JesseLivezey commented 6 years ago

It's 'NaN' already by this line. https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/form/backends/hdf5/h5tools.py#L366

JesseLivezey commented 6 years ago

Something happens here https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/form/backends/io.py#L39

Both conversion and resolution become 'NaN' when they are both 1.0 in the preceding line.

bendichter commented 6 years ago

tracked to here: https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/form/build/map.py#L152

JesseLivezey commented 6 years ago

@bendichter @ajtritt I think there are two problems

1) One problem is that there is no dtype check/conversion for non-string/text dtypes in this function https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/form/build/map.py#L552 This is why 'NaN' is getting saved as a string and not a float.

2) A second problem is how default values are supposed to work. For instance, you can't pass conversion to the IntervalSeries constructor, but it picks up this default value (1.0) for a while https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/base.py#L11

Then, when the file is written, the value is determined by a combination of values from the yaml spec and values in the container object with the following logic:

value = spec.value
if value is None:
    value = container.value
    if value is None:
        value = spec.default_value

which in this case means a float value of 'NaN' is saved (checking with direct hdf5 access).

Then, somehow, when the file is read, even if a real float value for 'NaN' is stored, pynwb replaces the NaN with a 1.0. This seems confusing and generally bad.

bendichter commented 6 years ago

@JesseLivezey resolution and conversion are meant for data that are measurements e.g. what is the resolution of a voltage recording and what's the conversion to get to volts. For IntervalSeries neither of these are really necessary or meaningful, since the data field is not a measurement, so any reasonable default float would work fine. Storing as a non-float breaks matnwb though. I'm not a big fan of values changing between the API and the file either, but at least in this case it's not really that big of a deal since these values aren't meaningful.