NeurodataWithoutBorders / nwb-schema

Data format specification schema for the NWB neurophysiology data format
http://nwb-schema.readthedocs.io

cell ID (for intracellular electrophysiology) #416

Closed lvsltz closed 2 years ago

lvsltz commented 4 years ago

Somewhat related to #413.

The NWBFile has the identifier, session_id and experiment_description parameters that help identify a given recording. For intracellular recordings where there are multiple cells in a single session (each saved in a separate .nwb file), shouldn't there be an additional identifier, like cell_id or unit_id?

I guess this id could also go in IntracellularElectrode, but it doesn't look like the right place.

bendichter commented 4 years ago

Good question. This is something that has come up with the BICCN work as well. We need unique identifiers on cells that can map to data outside this file, in other NWB files or in other databases, e.g. morphological information for the same cell stored somewhere else. We also need slice ids and tissue sample ids so that we can determine what cells were in the same slice/tissue sample and again to map to data outside the file.

lvsltz commented 4 years ago

We need unique identifiers on cells... e.g. morphological information... We also need slice ids and tissue sample ids

So do we, here

EnricoScanta commented 4 years ago

And for ion channels (Channelpedia), besides cell_id and experiment/session id, we have frozen cell vials (instead of slices and tissues), each one with its own cell_vial_id.

EnricoScanta commented 4 years ago

According to nwb.org/best-practices, the 'session_id' field should mark a unique experimental recording session, and if the data from a recording session are split across multiple files, the session_id should still be the same for each file. A field that identifies each file is therefore needed alongside 'session_id'. Like 'session_id', I would say it should be human-readable and should keep the same value if the file is regenerated. The 'identifier' field cannot be used for this purpose, since it is recommended to be a globally unique ID generated by uuid, and moreover it sits at a different path in the NWB tree than 'session_id'. In patch clamping we record from many cells in the same session and save each cell's data in a separate NWB file, which is why we suggested 'cell_id' in this thread (the electrode is not enough to distinguish cells, since in automated patch clamping one electrode records sequentially from many cells in the same session). To be more general and cover cases where session data are split on some other basis, a field called 'file_id' or similar would work as well, possibly with an attribute describing its content. All the other IDs mentioned in this thread can, I believe, be stored as extra columns in the conditions table.
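To illustrate the distinction drawn above, here is a minimal sketch in plain Python (standard library only, not PyNWB). The `FileMetadata` class, the `export_session` helper, and the example ID values are all hypothetical; the sketch only assumes that `identifier` is a fresh UUID per file while `session_id` and the proposed `cell_id` are stable, human-readable strings.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    """Hypothetical stand-in for the top-level metadata of one NWB file."""
    session_id: str  # shared by every file from the same recording session
    cell_id: str     # proposed field: stable and human-readable, one per file
    # Assumption: identifier is a freshly generated UUID for every file written.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))


def export_session(session_id, cell_ids):
    """Write one (simulated) file per cell recorded in the session."""
    return [FileMetadata(session_id=session_id, cell_id=c) for c in cell_ids]


# Export the same session twice, as if the files had been regenerated.
first = export_session("session_A", ["cell_01", "cell_02"])
regenerated = export_session("session_A", ["cell_01", "cell_02"])

# (session_id, cell_id) survives regeneration; the UUID identifier does not.
stable_keys_match = (
    [(m.session_id, m.cell_id) for m in first]
    == [(m.session_id, m.cell_id) for m in regenerated]
)
identifiers_match = (
    [m.identifier for m in first] == [m.identifier for m in regenerated]
)
print(stable_keys_match, identifiers_match)
```

In this sketch the pair (`session_id`, `cell_id`) is what lets a regenerated file be matched to its earlier version, which is the role the comment argues `identifier` cannot play.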

yarikoptic commented 3 years ago

I would like to ping this now-aging issue of cell IDs (and slice IDs, tissue IDs). Although we can store them in an NWB extension (and @lvsltz could do so in another), it feels like they should be supported by stock NWB. After all, those are just IDs, so they should be quite easy to "add".

tmchartrand commented 2 years ago

This issue recently came up in a discussion of BICCN patch-seq data. Has there been any progress? It seems like a pretty significant gap in the NWB:N standard.

bendichter commented 2 years ago

@tmchartrand (and anyone else who would like to weigh in) would it fix your issue if we made an optional "cell_id" attribute field on the IntracellularElectrode object? This would be relatively simple for us, but would not allow you to add any metadata other than the cell_id, and would not allow you to associate multiple cells with a given IntracellularElectrode object. Another approach would be to make a new neurodata_type called Cell and link an IntracellularElectrode to it. This would allow you to extend Cell to add your own metadata. Yet another approach would be to create a CellTable, which would allow you to add arbitrary (but unstandardized) metadata in the form of custom columns to that table.
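As a rough illustration of how the three options differ as data models, here is a sketch using plain Python dataclasses. None of these classes are PyNWB or nwb-schema types: `ElectrodeWithCellId`, `Cell`, `AnnotatedCell`, `ElectrodeWithCellLink`, and `CellTable` (with its `add_column`/`add_row` methods) are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Option 1: an optional cell_id attribute directly on the electrode.
# Only the ID can be stored, and each electrode refers to at most one cell.
@dataclass
class ElectrodeWithCellId:
    name: str
    cell_id: Optional[str] = None


# Option 2: a separate Cell type that the electrode links to.
@dataclass
class Cell:
    cell_id: str


@dataclass
class AnnotatedCell(Cell):
    # Hypothetical lab-specific extension of Cell with extra metadata.
    notes: str = ""


@dataclass
class ElectrodeWithCellLink:
    name: str
    cell: Optional[Cell] = None


# Option 3: a table of cells with free-form (unstandardized) custom columns.
@dataclass
class CellTable:
    columns: Dict[str, List[object]] = field(
        default_factory=lambda: {"cell_id": []}
    )

    def add_column(self, name: str) -> None:
        if name in self.columns:
            raise ValueError(f"column {name!r} already exists")
        # Back-fill existing rows so every column keeps the same length.
        self.columns[name] = [None] * len(self.columns["cell_id"])

    def add_row(self, cell_id: str, **custom: object) -> None:
        unknown = set(custom) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        for name, values in self.columns.items():
            values.append(cell_id if name == "cell_id" else custom.get(name))


option1 = [
    ElectrodeWithCellId("elec0", "cell_01"),
    ElectrodeWithCellId("elec1", "cell_01"),
]
option2 = ElectrodeWithCellLink("elec0", AnnotatedCell("cell_01", notes="healthy"))
option3 = CellTable()
option3.add_column("slice_id")
option3.add_row("cell_01", slice_id="slice_A")
option3.add_row("cell_02")
```

The trade-off in the sketch mirrors the one described above: option 1 carries nothing beyond the ID, option 2 puts extra metadata on a typed object that can be subclassed, and option 3 accepts any column name at the cost of standardization.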

Some questions that would help us determine the best approach here.

  1. Is it OK to have a 1-1 relationship between IntracellularElectrode and cell_id, or do you need something more flexible?
  2. Do you need additional metadata associated with a cell, or just the cell_id? If so, what fields are they?
  3. If you need additional metadata, how standardized would you expect this to be within and across research groups?

tmchartrand commented 2 years ago

I think probably the simple solution is the best, just adding the cell_id field (that would still allow multiple electrodes per cell, just not the reverse, which is good). Some of the other metadata I could imagine adding are really the results of analysis (transcriptomic or morphological identity), and should probably be kept outside of the NWB and linked by ID, which this would allow. The one exception might be anatomical location, and maybe higher levels of organization, like a slice ID (both are possible to specify in the DANDI asset schema fwiw). I think there I don't have strong feelings whether it's in the core schema or an extension.
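A small sketch of the "keep analysis results outside the file and link by ID" idea, again in plain Python rather than PyNWB. The electrode records, the `external_annotations` store, its field names, and the `link_by_cell_id` helper are all hypothetical; the only assumption carried over from the thread is that each electrode has a single `cell_id` while several electrodes may share one.

```python
from collections import defaultdict

# Electrode records as they might be read from a file (hypothetical values).
electrodes = [
    {"name": "elec0", "cell_id": "cell_01"},
    {"name": "elec1", "cell_id": "cell_01"},  # second electrode, same cell
    {"name": "elec2", "cell_id": "cell_02"},
]

# Analysis results kept outside the file, keyed by the same cell_id
# (hypothetical store; cell_02 has no annotation yet).
external_annotations = {
    "cell_01": {"morphology_record": "morph_0001"},
}


def link_by_cell_id(electrodes, annotations):
    """Group electrodes per cell and attach any external annotation."""
    grouped = defaultdict(list)
    for electrode in electrodes:
        grouped[electrode["cell_id"]].append(electrode["name"])
    return {
        cell_id: {
            "electrodes": names,
            "annotation": annotations.get(cell_id),  # None when not annotated
        }
        for cell_id, names in grouped.items()
    }


linked = link_by_cell_id(electrodes, external_annotations)
```

Because the join key is just the ID, the external store can change or grow without the recording files being rewritten.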

bendichter commented 2 years ago

Indeed, IntracellularElectrode already has a "slice" field and a "location" field, which could be used for additional metadata, and you can always extend IntracellularElectrode to add more specific information.

oruebel commented 2 years ago

This sounds reasonable to me; adding the cell_id to IntracellularElectrode makes sense if we don't plan to include more metadata about the cell.

If (in the future) we plan to include more metadata about cells, then I think at that point we should revisit the idea of creating a dedicated Cell or CellTable neurodata_type, as creating those will require more engagement with the community to determine the different types of metadata we need to capture here.