[Documentation]: FAQ or Best Practice for representing co-registered ROI IDs

CodyCBakerPhD commented 1 year ago

What would you like changed or added to the documentation and why?

Opening here for now, but feel free to transfer wherever is best

For co-registration of cell IDs identified from the same subject across multiple ophys sessions, the recommendation is to use a custom column to indicate the global ID across sessions and to keep the typical ROI table IDs specific to each individual session
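To make the recommendation concrete, here is a minimal, library-independent sketch. It uses plain dictionaries as stand-ins for per-session ROI tables; the column name `global_roi_id`, the helper `local_ids_for`, and the use of -1 for unmatched ROIs are assumptions made for illustration, not an established NWB convention. In PyNWB the extra column would presumably be added to the ROI table itself.

```python
# Hypothetical per-session ROI tables; all names here are illustrative only.
session_a = {
    "id": [0, 1, 2, 3],                 # session-specific ROI IDs, left untouched
    "global_roi_id": [10, 11, -1, 12],  # co-registered ID; -1 = not matched (assumed)
}
session_b = {
    "id": [0, 1, 2],
    "global_roi_id": [12, 10, 13],
}


def local_ids_for(table, global_id):
    """Return the session-specific IDs of the rows that carry `global_id`."""
    return [
        local_id
        for local_id, gid in zip(table["id"], table["global_roi_id"])
        if gid == global_id
    ]


# The same cell (global ID 12) has a different local ID in each session.
ids_in_a = local_ids_for(session_a, 12)
ids_in_b = local_ids_for(session_b, 12)
```

Because the session-specific IDs are never overwritten, each file stays valid on its own and the global column can be added or recomputed later.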

Do you have any interest in helping write or edit the documentation?

Yes.

oruebel commented 1 year ago

@CodyCBakerPhD for PyNWB, adding a note to the ophys tutorial may make sense: https://pynwb.readthedocs.io/en/stable/tutorials/domain/ophys.html#sphx-glr-tutorials-domain-ophys-py

Adding it to the best practices in the nwbinspector docs seems useful (even if there is no check function for this in the inspector), since this is currently the main place we point to for best practices https://nwbinspector.readthedocs.io/en/dev/best_practices/best_practices_index.html

recommendation is to use a custom column to indicate the global ID across sessions and keep the typical ROI table IDs as being the ones specific to each individual session

Could you clarify why this is the recommended strategy? I think the strategy you describe is fine; I'm just trying to understand the reasoning for it better. Is it mainly because the global IDs are determined as a separate processing step after the ROI extraction on the individual files and so we should not overwrite the original IDs but store the global IDs separately; or is there another reason?

CodyCBakerPhD commented 1 year ago

Is it mainly because the global IDs are determined as a separate processing step after the ROI extraction on the individual files and so we should not overwrite the original IDs but store the global IDs separately; or is there another reason?

That's exactly the reason. You'd acquire the data and segment at least one session first (and hopefully even save that to NWB at that stage); then there are two cases:

(a) if that session had multiple planes, perhaps you want to segment each plane separately, and if the planes are contiguous enough in space you may identify 'global' unit IDs across all planes

or

(b) acquire and segment more sessions from the same animal and region, then co-register the same 'global' ID over time

See https://github.com/RichieHakim/ROICaT#readme as a tool that is being built (or works already?)

It might even be capable of reading NWB files via the ROIExtractors integration, in which case we might want to add it to the overview. @RichieHakim would know

RichieHakim commented 1 year ago

See https://github.com/RichieHakim/ROICaT#readme as a tool that is being built (or works already?)

It works great.

It might even be capable of reading NWB files via the ROIExtractors integration, in which case we might want to add it to the overview. @RichieHakim would know

Yes, it is capable of reading NWB files as input.

Could you clarify why this is the recommended strategy?

This information is useful because it allows for analyzing the same neurons over sessions or planes, which is becoming increasingly common.

Could you clarify why this is the recommended strategy?

The approach described by @CodyCBakerPhD makes sense to me and is what I use. The output of my ROI tracking pipeline is generally a list of integer arrays where each array is a session, and each element of the array is an ROI (with the same index position as other representations of the ROI like fluorescence traces and masks), and the value of each element is the 'unique ROI ID number'. In this way, ROIs with the same ID number from different imaging sessions / planes can be defined as deriving from the same source neuron / ROI. Generally ROIs that are 'unassigned' to a unique source / cluster are given an ID number of -1. For this reason, the datatype is generally an int64.
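As a rough illustration of the representation described above, the sketch below uses plain Python lists in place of int64 arrays. The example values and the helper names (`ids_tracked_in_all_sessions`, `index_in_session`) are made up for this comment and are not taken from any tracking pipeline.

```python
from collections import Counter

UNASSIGNED = -1  # ROIs not assigned to any cluster

# One list per session; position i corresponds to ROI i of that session,
# i.e. the same index as its fluorescence trace and mask.
roi_ids_per_session = [
    [0, 1, -1, 2],
    [2, 0, 3],
    [-1, 0, 2, 4, 1],
]


def ids_tracked_in_all_sessions(sessions):
    """Return the sorted unique ROI IDs that occur in every session."""
    counts = Counter()
    for session in sessions:
        # Use a set so each ID is counted at most once per session.
        counts.update(set(session) - {UNASSIGNED})
    return sorted(gid for gid, n in counts.items() if n == len(sessions))


def index_in_session(session, global_id):
    """Return the position of `global_id` in one session, or None if absent."""
    return session.index(global_id) if global_id in session else None


tracked = ids_tracked_in_all_sessions(roi_ids_per_session)
positions = {
    gid: [index_in_session(s, gid) for s in roi_ids_per_session]
    for gid in tracked
}
```

The positions found this way can be used directly to index each session's traces or masks, which is the main convenience of keeping the ID arrays aligned with the other per-ROI data.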

This particular data representation strikes an appropriate balance between sparsity/efficiency and clarity/ease-of-use compared to other representations. There are many other representations that provide more clarity or more sparsity (look-up tables, sparse arrays, see here for a related function).
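For comparison, one of the alternative representations mentioned above, a look-up table, can be derived from the per-session ID lists. This is only a sketch under the same assumptions as before (plain lists, -1 for unassigned); `build_lookup` is a hypothetical helper, not the related function referred to above.

```python
def build_lookup(sessions, unassigned=-1):
    """Map each unique ROI ID to {session_index: roi_index_within_session}."""
    lookup = {}
    for session_index, session in enumerate(sessions):
        for roi_index, gid in enumerate(session):
            if gid == unassigned:
                continue  # unassigned ROIs do not belong to any cluster
            lookup.setdefault(gid, {})[session_index] = roi_index
    return lookup


lookup = build_lookup([[0, 1, -1, 2], [2, 0, 3]])
```

The look-up table answers "where is neuron X in each session?" in one step, at the cost of no longer being aligned index-for-index with the per-session traces and masks.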

NeurodataWithoutBorders / pynwb