IntervalSeries allows storage of multiple intervals but has no description of keys

rly commented 3 years ago

From https://nwb-schema.readthedocs.io/en/latest/format.html?highlight=intervalseries#intervalseries

Stores intervals of data. The timestamps field stores the beginning and end of intervals. The
data field stores whether the interval just started (>0 value) or ended (<0 value). Different interval
types can be represented in the same series by using multiple key values (eg, 1 for feature A, 2
for feature B, 3 for feature C, etc). The field data stores an 8-bit integer. This is largely an alias
of a standard TimeSeries but that is identifiable as representing time intervals in a machine-readable
way.

So different data "keys" represent different interval types. But there is no place in IntervalSeries that stores what each of those "keys" mean.

Proposed solutions:

1) Add an attribute labels on data with string labels or descriptions of each interval type. Absolute values of values in data map to an index of labels. For example, if the data value is 100 for start interval (and -100 for stop interval), then index 100 of the labels attribute contains the string label. This is analogous to how the ndx_events.LabeledEvents type is defined: https://github.com/rly/ndx-events/blob/master/spec/ndx-events.extensions.yaml#L28-L60

Question: Do we make labels required?
- If required, then previously written IntervalSeries would not be valid against the new schema.
- If optional, then it really should be required if the IntervalSeries contains multiple interval types. This case cannot easily be validated and would depend on the user reading the documentation for IntervalSeries and following these guidelines.

2) Change the documentation to say that only one interval type should be represented in an IntervalSeries and to use multiple IntervalSeries to store different interval types.

cc @bendichter

bendichter commented 3 years ago

Thanks for creating the issue, @rly. I like the idea of adding a labels attribute. I think we should make it optional in the schema and throw a warning in PyNWB if ints >1 are used. Also note that we would need to make this 1-indexed instead of 0-indexed because we can't use 0 in the data.

rly commented 3 years ago

Also note that we would need to make this 1-indexed instead of 0-indexed because we can't use 0 in the data.

Given that ndx_events.LabeledEvents is 0-indexed and VectorIndex is 0-indexed, I think it might be more confusing for this to be 1-indexed than if it was 0-indexed and the 0th entry is intentionally left empty. It also depends on the API -- in PyNWB, we can hide this from the user, but in MatNWB, you would have to manage it yourself. What do you think? Either way, it should be clearly documented.

oruebel commented 3 years ago

You could require that entry 0 in labels is always set to no event

rly commented 3 years ago

You could require that entry 0 in labels is always set to no event

Is there a way to require that in the schema? We can write that in the documentation, and we can add a special case in the validator, but I don't think we can formally encode that requirement with the current language.

oruebel commented 3 years ago

I don't think we can formally encode that requirement with the current language.

Correct.

NeurodataWithoutBorders / nwb-schema

IntervalSeries allows storage of multiple intervals but has no description of keys #480