NeurodataWithoutBorders / nwb-schema

Data format specification schema for the NWB neurophysiology data format
http://nwb-schema.readthedocs.io

Request for support of spike raster type #554

Open oruebel opened 1 year ago

oruebel commented 1 year ago

Use Case: A common data structure in neural data processing is the spike raster matrix. This is a sparse, binary (0 or 1) 2D matrix, generally with shape (electrode/neuron #, time (ms)). This base matrix is then indexed into, generally based on behavioral epochs, and is binned or smoothed, projected down to low dimensions, fed into machine learning algorithms, etc.
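
As a rough illustration of the indexing and binning step described above, here is a minimal NumPy sketch. The synthetic raster, the epoch boundaries, and the helper name `bin_epoch` are all made up for the example and are not part of NWB or PyNWB.

```python
import numpy as np

# Synthetic raster for illustration only: 4 units x 1000 ms, about 2% of bins set.
rng = np.random.default_rng(0)
raster = (rng.random((4, 1000)) < 0.02).astype(np.uint8)


def bin_epoch(raster, start_ms, stop_ms, bin_ms):
    """Slice one epoch out of a (units, ms) raster and sum spikes into wider bins.

    Hypothetical helper; any trailing samples that do not fill a whole bin are dropped.
    """
    epoch = raster[:, start_ms:stop_ms]
    n_bins = epoch.shape[1] // bin_ms
    trimmed = epoch[:, : n_bins * bin_ms]
    # Reshape to (units, bins, samples per bin) and count spikes within each bin.
    return trimmed.reshape(raster.shape[0], n_bins, bin_ms).sum(axis=2)


# A 500 ms "behavioral epoch" binned at 50 ms gives a (4, 10) spike-count matrix.
counts = bin_epoch(raster, start_ms=200, stop_ms=700, bin_ms=50)
```

The resulting count matrix is the kind of array that would then be smoothed or passed to dimensionality reduction.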

Current: NWB uses the Units table as the primary representation for sorted spikes. NWB currently does not use the spike raster matrix format directly, as it is a derived, lossy representation of the data.

Proposal: For analysis purposes it would be useful to be able to a) easily create a spike raster matrix from the Units table and b) store the spike raster matrix in the file as part of the "/analysis" folder to simplify and accelerate downstream analysis. I.e., the Units table would remain the main data representation, and the spike raster matrix would be an additional representation derived from the Units table.
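
To make part a) concrete, a minimal sketch of the derivation might look like the following. It assumes the per-unit spike times (in seconds) have already been read out of the Units table into plain Python lists or arrays; the function name `spike_raster` and the 1 ms default bin size are choices made for this example, not an existing PyNWB API.

```python
import numpy as np


def spike_raster(spike_times_per_unit, t_start, t_stop, bin_size=0.001):
    """Build a binary (n_units, n_bins) raster from per-unit spike times in seconds.

    Hypothetical helper, not part of PyNWB. Spikes outside [t_start, t_stop) are ignored.
    """
    n_bins = int(round((t_stop - t_start) / bin_size))
    raster = np.zeros((len(spike_times_per_unit), n_bins), dtype=np.uint8)
    for row, times in enumerate(spike_times_per_unit):
        times = np.asarray(times, dtype=float)
        # Map each spike time to the index of the bin it falls into.
        idx = np.floor((times - t_start) / bin_size).astype(int)
        idx = idx[(idx >= 0) & (idx < n_bins)]
        raster[row, idx] = 1
    return raster


# Three units over a 5 ms window; the last unit has no spikes.
spike_times = [[0.0015, 0.0042, 0.0043], [0.0005], []]
raster = spike_raster(spike_times, t_start=0.0, t_stop=0.005)
```

Note that the two spikes at 4.2 ms and 4.3 ms land in the same 1 ms bin and collapse to a single 1, which is one way the raster is lossy relative to the Units table.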

Proposed process: As this is a major addition, this should follow the NWB proposal process described here https://www.nwb.org/proposal-review-process/ . As such, the first step should be to create an ndx extension. To store the spike raster, the extension would look similar to ElectricalSeries:

This issue is based on a request by @bil-paul

oruebel commented 1 year ago

CC @rly @bendichter

bil-paul commented 1 year ago

Thank you for posting this request.

This type of data representation is fundamental for making NWB most useful for systems and computational neuroscientists, particularly for accessing large NWB files living in cloud spaces. I look forward to it being incorporated.

I would like to propose that the 2D array be stored in the maximally useful format for the type of analysis that the spike raster would be used with: a chunked, shuffled, compressed, scaleoffset-enabled dataset with chunk shapes that are some time width (e.g., a default of 20ms) x all units.
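
For reference, a sketch of what such a dataset could look like when written directly with h5py is shown below. This is not how PyNWB or MatNWB would necessarily expose it; the dataset path, the synthetic data, and the gzip level are assumptions for the example, and the 20-sample chunk width assumes one column per millisecond.

```python
import h5py
import numpy as np

# Synthetic (units, ms) raster for illustration only.
n_units, n_ms = 32, 1000
rng = np.random.default_rng(0)
raster = (rng.random((n_units, n_ms)) < 0.02).astype(np.uint8)

# In-memory HDF5 file (core driver, nothing written to disk); the file name is a placeholder.
with h5py.File("raster_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "analysis/spike_raster",   # hypothetical location under /analysis
        data=raster,
        chunks=(n_units, 20),      # all units x 20 ms per chunk
        shuffle=True,              # byte-shuffle filter
        compression="gzip",
        compression_opts=4,        # gzip level chosen arbitrarily
        scaleoffset=0,             # integer data: let HDF5 choose the bit width (lossless)
    )
    filters = (dset.chunks, dset.shuffle, dset.compression)
    scaleoffset_enabled = dset.scaleoffset is not None
    # Reading a 60 ms window only touches the chunks that overlap it.
    window = dset[:, 200:260]
    roundtrip_ok = bool(np.array_equal(dset[...], raster))
```

With this layout a time-window read decompresses only the chunks covering that window, which is the access pattern the chunk shape is meant to favor.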

Thank you for the consideration.

oruebel commented 1 year ago

> a chunked, shuffled, compressed, scaleoffset-enabled dataset

Thanks for the feedback. The storage options are something we'll need to consider when implementing this in the APIs (PyNWB / MatNWB).