Closed bendichter closed 6 years ago
There's also a discrepancy between the schema and pynwb.
The pynwb API accepts float
or str
for conversion
:
https://pynwb.readthedocs.io/en/latest/pynwb.base.html#pynwb.base.TimeSeries
However the NWB schema only accepts float
:
https://github.com/NeurodataWithoutBorders/pynwb/blob/631f6db9cf7fa1eda951094108c435bae7b2027b/src/pynwb/data/nwb.base.yaml#L113-L114
@bendichter @ajtritt I tried to trace the IntervalSeries
code to find where the NaN
might get converted to a string, but couldn't figure out where attributes like conversion
get written. I'm somewhat motivated to fix this since Ben suggested I use this for the dataset I'm converting to NWB. If either of you can give me some suggested places to look, I can try and find the problem and make a PR with a fix/test.
I did notice here that conversion
might have NaN
as a default value in this IntervalSeries
yaml entry.
https://github.com/NeurodataWithoutBorders/pynwb/blob/f01f5e672c56c62e1d77212163962716f17be362/src/pynwb/data/nwb.misc.yaml#L104
but I'm not familiar enough with the library to know whether this is relevant.
@JesseLivezey Yes I think that's helpful! Probably that NaN
is being interpreted as a string.
Although it looks like this value is being used correctly up until some point https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/base.py#L11 any idea where these type of attributes get written to the h5 file?
Something happens here https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/form/backends/io.py#L39
Both conversion
and resolution
become 'NaN' when they are both 1.0 in the preceding line.
@bendichter @ajtritt I think there are two problems
1) One problem is that there is no dtype check/conversion for non-string/text dtypes in this function https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/form/build/map.py#L552 This is why 'NaN' is getting saved as a string and not a float.
2) A second problem is how default values are supposed to work. For instance, you can't pass conversion
to the IntervalSeries
constructor, but it picks up this default value (1.0) for a while
https://github.com/NeurodataWithoutBorders/pynwb/blob/c3a1bed686ff291bb0e523573b9b04ea073dcfc7/src/pynwb/base.py#L11
Then, when the file is written, the value is determined by a combination of values from the yaml spec and values in the container
object with the following logic:
value = spec.value
if value is None:
value = container.value
if value is None:
value = spec.default_value
which in this case means a float value of 'NaN' is saved (checking with direct hdf5 access).
Then, somehow, when the file is read, even if a real float value for 'NaN' is stored, pynwb replaces the NaN with a 1.0. This seems confusing and generally bad.
@JesseLivezey resolution
and conversion
are meant for data that are measurements e.g. what is the resolution of a voltage recording and what's the conversion to get to volts. For IntervalSeries neither of these are really necessary or meaningful, since the data field is not a measurement, so any reasonable default float would work fine. Storing as a non-float breaks matnwb though. I'm not a big fan of values changing between the API and the file either, but at least in this case it's not really that big of a deal since these values aren't meaningful.
IntervalSeries.conversion returns
conversion = 1.0
before and after read:However the data is actually stored as
'NaN'
(the string'NaN'
, not a NaN float value):This is causing issues for matnwb, which expects this value to follow the schema, which defines it to be a float. Is there a reason why it's stored this way?