Open bendichter opened 4 years ago
@rly when you return I'd be interested in your thoughts
Jeep, that could be useful. For what would you use the labels and keys?
The labels would give you a human readable description of what the events were. The keys would map those labels to values
@bendichter But you could use epochs
for that as well or?
Sounds like you’re describing a CV, which is usually represented with a table. I’d recommend just using a DynamicTable for the mapping. And the data could be stored as a DynamicTableRegion. I’m not sure how that would play with the TimeSeries.data tho
what do you mean by CV?
controlled vocabulary.
Oh I see. Yeah very similar but I think the one difference is that the actual value of the int is important to the labs as well as the label.
That might not be a CV then. Are the values of the pulses enumerable?
Enumerable? The ints are chosen by the researcher to signal specific events. There is a finite number of them and they generally are not sequential.
Although TTL pulses are a sequence of events in time, I do not think inheriting this type from TimeSeries is most appropriate here. TTL pulses are not generic time series measurements (the data don't have units) and the timestamps are almost never uniformly sampled in time. It would be more appropriate to use or alias the LabeledEvents type https://github.com/NeurodataWithoutBorders/nwb-schema/pull/302/files which is similar to what you have here ("keys" = "labels", "labels" = "label_map")
In my experience, users sometimes work with all of the TTL pulses together in order of time, but more often they separate the pulses by value and work with only a subset of them, often in isolation (peri-stimulus time histogram), and sometimes together in order. Those use cases would be best served by two different storage options:
For Option 1, if the user wants to get the data in the form of Option 2 (i.e. split the array by value into K 1-D arrays), a naive implementation takes O(NK) time - I'm not sure this can be done faster.
For Option 2, if the user wants to get the data in the form of Option 1 (i.e. create a single array of all of the pulse values in order of time), then they would have to do a K-way merge sort of the K 1-D arrays, which is O(N log K) time.
Given that difference in efficiency (at the limit) and that I think working with only one or a subset of pulse values is more common than working with all pulse values in order, I think Option 2 would be better.
- store K 1-D arrays of timestamps, where K is the number of unique pulse values.
Could we store this in the same way we store spike units?
Yes, a DynamicTable would work, but for some reason, it feels less intuitive to me (same for the units table). It's a little overkill, but it would organize the TTL pulses together. Granted, the user does not need to know (and to some degree should not need to know) the low-level details of how the data is structured.
Ultimately, I think that TTL pulses, spike unit times, and events in general, could, and probably should, be stored in the same way as each other. So, yes, option 3:
DynamicTable
with indexed column pulse_times
, and normal column label
.I worry about the proliferating use of DynamicTables
, because it sort of side-steps the whole point of NWB:N, which is to provide good standards for meta-data. DynamicTable
is basically an open door that allows you to store any type of data you want with minimal constraints. Any open door will lead to the temptation to use it for every datatype that is not currently supported, and I worry about the long term effects of that. This is useful for purely-meta-data structures like trials
, where this flexibility is important and adherence to a strict standard is less important, but I worry when they are used for annotating acquired data. When they are used to annotate data, DynamicTable
s on their own are invariably insufficient for the chosen type of data and we end up having to compensate with hacks. Even with Units
there is an issue with storing the sampling rate of the waveforms and with linking individual spikes to snippets. The only way to resolve this flexibly would be to allow DTs to have global attributes, and at that point they would not enforce any structure at all!
I agree that we need a way to express time series objects that are not measurements and are unitless. I think the proposed Events
and/or LabelledEvents
addresses this problem nicely, and I like that it generalizes to cases outside of TTL pulses because this has come up a few times and will continue to be needed. The one issue I have with the proposed extension is the names of the datasets. To me, label
should be a string, i.e. the value of the dictionary, not the key.
Then there is the question of whether we should store a one object that captures all TTL pulses or several objects that each store a TTL pulse event type. This comes down to whether we want to make the data closer to what is acquired or closer to the semantic meaning/how it will be used an analysis. In this case I vote to make it closer to acquired with LabelledEvents
because I imagine TTL pulses going in /acquisition
. If users want to pull out specific events, we could encourage them to do that with Events
objects in a processing module.
@bendichter Coming back to this, when you wrote:
When they are used to annotate data, DynamicTables on their own are invariably insufficient for the chosen type of data and we end up having to compensate with hacks. Even with Units there is an issue with storing the sampling rate of the waveforms and with linking individual spikes to snippets. The only way to resolve this flexibly would be to allow DTs to have global attributes, and at that point they would not enforce any structure at all!
What do you mean by global attributes? I think it is fine for subtypes of DynamicTable
s to define attributes that apply to all rows of the table.
What is the issue with linking individual spikes to snippets? I saw a bunch of issues/PRs on UnitSeries and it looked like a reasonable solution was agreed upon https://github.com/NeurodataWithoutBorders/nwb-schema/issues/248
It would be nice to have an intuitive and machine-readable neurodata_type for TTL pulses. Something like