NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
is stored as 'NaN' #592

Closed bendichter closed 6 years ago

bendichter commented 6 years ago

IntervalSeries.conversion returns conversion = 1.0 before and after read:

from pynwb import NWBFile, NWBHDF5IO
from pynwb.misc import IntervalSeries

interval_series = IntervalSeries(data=[-1, 1], timestamps=[1, 2], name='test_interval_series', source='source')


nwbfile = NWBFile("source", "a file with header data", "NB123A", '2018-06-01T00:00:00')

with NWBHDF5IO('test_interval_series.nwb', 'w') as io:

with NWBHDF5IO('test_interval_series.nwb', 'r') as io:
    nwbfile_in =

However the data is actually stored as 'NaN' (the string 'NaN', not a NaN float value):

import h5py
with h5py.File('test_interval_series.nwb', 'r') as file:

This is causing issues for matnwb, which expects this value to follow the schema, which defines it to be a float. Is there a reason why it's stored this way?
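To make the distinction concrete, here is a minimal, library-independent sketch of telling the text 'NaN' apart from a float NaN. The helper name `describe_nan` is hypothetical (it is not part of pynwb or h5py), and it assumes an attribute value may arrive as `str`, `bytes`, or `float`.

```python
import math


def describe_nan(value):
    """Classify a value as a float NaN, the text 'NaN', or something else."""
    # Attribute values read from a file may come back as bytes; decode them first.
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    if isinstance(value, str):
        return "string NaN" if value.strip().lower() == "nan" else "other string"
    if isinstance(value, float) and math.isnan(value):
        return "float NaN"
    return "other"


print(describe_nan("NaN"))         # string NaN
print(describe_nan(float("nan")))  # float NaN
print(describe_nan(1.0))           # other
```

A reader that expects a float, as the schema specifies, would only handle the second case.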

bendichter commented 6 years ago

There's also a discrepancy between the schema and pynwb.

The pynwb API accepts float or str for conversion:

However the NWB schema only accepts float:

JesseLivezey commented 6 years ago

@bendichter @ajtritt I tried to trace the IntervalSeries code to find where the NaN might get converted to a string, but couldn't figure out where attributes like conversion get written. I'm somewhat motivated to fix this since Ben suggested I use this for the dataset I'm converting to NWB. If either of you can give me some suggested places to look, I can try and find the problem and make a PR with a fix/test.

I did notice here that conversion might have NaN as a default value in this IntervalSeries yaml entry, but I'm not familiar enough with the library to know whether this is relevant.

bendichter commented 6 years ago

@JesseLivezey Yes I think that's helpful! Probably that NaN is being interpreted as a string.

JesseLivezey commented 6 years ago

Although it looks like this value is being used correctly up until some point. Any idea where these types of attributes get written to the h5 file?

JesseLivezey commented 6 years ago

It's 'NaN' already by this line.

JesseLivezey commented 6 years ago

Something happens here

Both conversion and resolution become 'NaN' when they are both 1.0 in the preceding line.

bendichter commented 6 years ago

tracked to here:

JesseLivezey commented 6 years ago

@bendichter @ajtritt I think there are two problems

1) One problem is that there is no dtype check/conversion for non-string/text dtypes in this function. This is why 'NaN' is getting saved as a string and not a float.

2) A second problem is how default values are supposed to work. For instance, you can't pass conversion to the IntervalSeries constructor, but it picks up this default value (1.0) for a while

Then, when the file is written, the value is determined by a combination of values from the yaml spec and values in the container object with the following logic:

value = spec.value
if value is None:
    value = container.value
    if value is None:
        value = spec.default_value

which in this case means the string 'NaN' is saved (checked with direct HDF5 access).
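The fallback order quoted above can be exercised in isolation. This is only a sketch with hypothetical stand-ins: `resolve_value` takes plain arguments rather than pynwb spec and container objects, and the inputs assume (the thread does not establish this) that nothing is set on the container at write time and that the spec default was loaded as the text 'NaN'.

```python
def resolve_value(spec_value, container_value, spec_default):
    """Apply the fallback order: fixed spec value, then container, then default."""
    value = spec_value
    if value is None:
        value = container_value
        if value is None:
            value = spec_default
    return value


# Hypothetical inputs: no fixed value in the spec, nothing set on the container,
# and a default that was loaded as the text 'NaN' rather than a float.
written = resolve_value(None, None, "NaN")
print(repr(written), type(written).__name__)  # 'NaN' str

# If the container does hold a value, it takes precedence over the default.
print(resolve_value(None, 1.0, "NaN"))  # 1.0
```

Nothing in this chain checks the type of the result, so a text default passes through unchanged.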

Then, somehow, when the file is read, even if a real float value for 'NaN' is stored, pynwb replaces the NaN with a 1.0. This seems confusing and generally bad.

bendichter commented 6 years ago

@JesseLivezey resolution and conversion are meant for data that are measurements, e.g. what is the resolution of a voltage recording and what's the conversion to get to volts. For IntervalSeries neither of these is really necessary or meaningful, since the data field is not a measurement, so any reasonable default float would work fine. Storing as a non-float breaks matnwb though. I'm not a big fan of values changing between the API and the file either, but at least in this case it's not really that big of a deal since these values aren't meaningful.
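One way to guarantee that a float ends up in the file is to coerce the value just before it is written. The sketch below is an assumption about how such a check could look, not pynwb's actual fix; `coerce_to_float` is a hypothetical helper.

```python
import math


def coerce_to_float(value):
    """Return value as a float, converting text such as 'NaN' or '1.0'."""
    # Decode bytes so that b'NaN' is treated the same way as 'NaN'.
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    try:
        return float(value)
    except (TypeError, ValueError) as err:
        raise ValueError(f"cannot store {value!r} as a float attribute") from err


print(coerce_to_float("NaN"))  # nan
print(coerce_to_float(1))      # 1.0
```

With a check like this, a text default of 'NaN' would be written as a real float NaN, and anything that cannot be interpreted as a float would fail at write time instead of producing a file that violates the schema.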