MouseLand / suite2p

cell detection in calcium imaging recordings
http://www.suite2p.org
GNU General Public License v3.0

Cell segmentation not capturing cells - one-photon epifluorescence #941

Closed: RandallJEllis closed this issue 1 year ago

RandallJEllis commented 1 year ago

I am imaging GCaMP6f in cultured tumor cells (one-photon epifluorescence), and my raw images are 1536x2048 16-bit TIF files. I imported the images as a virtual stack into Fiji, auto-adjusted brightness/contrast, and saved the result as a TIF video (screenshot below), but Suite2p is not segmenting the cells correctly (segmentation further below). I get similar segmentations when I feed the raw TIF images directly to Suite2p.

I have successfully segmented the cells using CellProfiler, but CellProfiler cannot do time-series analysis or calculate dF/F, so I am trying to use Suite2p. I also have segmented images of the nuclei of these cells; if Suite2p can use these for segmentation, please let me know. Any help is appreciated.

Raw video:

[Screenshot: raw video, 2023-04-05]

Suite2p segmentation:

[Image: Suite2p segmentation result]

generalciao commented 1 year ago

Consider trying anatomical segmentation: this uses cellpose instead of the activity-based method. However, it looks like you have many overlapping cells (stacked over each other in z), which may make it difficult to extract signals from individual cells without mixing when using anatomical ROIs.

RandallJEllis commented 1 year ago

Thank you. Can Cellpose extract df/f?

generalciao commented 1 year ago

Rereading my comment, I should have been clearer: "anatomical segmentation" is an option within suite2p, as an alternative to the standard (original) activity-based segmentation. The option can be selected in the suite2p GUI. It then uses cellpose under the hood for segmentation (defining ROIs). The subsequent signal processing (extracting signals, spike rate inference) proceeds normally. You should be able to find more information in the official documentation.

I believe if you specifically want dF/F, even after running suite2p, you would need to do some additional processing (in particular, how to properly define the baseline F depends on your data/experiment).

RandallJEllis commented 1 year ago

Thank you for clarifying that this is an option within Suite2p. I'm not seeing anything in the documentation about an "anatomical segmentation" option though. Can you please point me to any links?

generalciao commented 1 year ago

https://suite2p.readthedocs.io/en/latest/settings.html#cellpose-detection
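If you run suite2p from a script rather than the GUI, the cellpose options on that page can be set as keys of the ops dictionary. This is a minimal sketch assuming the suite2p 0.x-style ops dict; the helper name is hypothetical, and the mode numbering in the comment follows the settings page linked above as I understand it.

```python
def with_cellpose_detection(ops, diameter_px, mode=2):
    """Return a copy of a suite2p-style ops dict set up for cellpose (anatomical) detection.

    with_cellpose_detection is a hypothetical helper, not part of suite2p.
    """
    new_ops = dict(ops)  # leave the caller's dict untouched
    # anatomical_only > 0 switches ROI detection to cellpose:
    # 1 = max projection / mean image, 2 = mean image,
    # 3 = enhanced mean image, 4 = max projection
    new_ops["anatomical_only"] = mode
    # expected cell diameter in pixels (0 asks cellpose to estimate it)
    new_ops["diameter"] = diameter_px
    return new_ops
```

For example, with_cellpose_detection(suite2p.default_ops(), diameter_px=30) could be passed as ops to suite2p.run_s2p(ops=ops, db=db) in the 0.x API; 30 px is only a placeholder for your measured cell size.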

RandallJEllis commented 1 year ago

Thank you, will test this out.

RandallJEllis commented 1 year ago

Thanks for the tip about anatomical segmentation. I set the anatomical_only parameter to 2 ("will find masks on mean image") and obtained what's in the screenshot below. However, the larger round ROIs labeled as non-cells are actually the cells (compare with the screenshot in the original post). I've been trying to figure out how to tell Suite2p via the GUI that these are actually cell ROIs. Do you have any advice on this? Thank you in advance!

[Screenshot: Suite2p GUI after anatomical segmentation]

generalciao commented 1 year ago
  1. Using the documentation link in my previous comment, refer to the diameter parameter. Set it to an appropriate size for your cells (in pixels), re-run, and check whether this improves the initial ROIs and cell/not cell classification.

  2. The fact that you are seeing your cells as ROIs is not a bad thing, even if suite2p labels them as non-cells initially. It means you have something to work with. Read about classifiers at the link below. Your goal is to move all the “correct” ROIs to the left side (cells) and all the “wrong” ROIs to the right side (non-cells). If I were you, I’d start by selecting everything on the left (the small ones) using a drag selection, and move those to the right as non-cells (press the up key). At this point the left side would be empty. Next, I would click as many of the big ROIs on the right side as possible, moving them to the left side. Ideally, continue until all the cells you see are on the left side (i.e. properly classified). At this point, iscell.npy would contain your classifications for this movie.

I believe you would next choose “Build” from the classifier menu and select the iscell.npy file from above. I’m not sure how well it will work if you build it from only one movie, but it’s perhaps a start. After you process a second movie, load this newly built classifier; hopefully it does much better than the default one. Manually improve the labels and then add this data to your custom classifier. Rinse and repeat, and it should get better with every movie.

https://suite2p.readthedocs.io/en/latest/gui.html#classifying-cells

Let us know how this goes. Good luck!

PS: Whenever you spend a lot of time manually labeling cells / non-cells, I recommend periodically saving a copy of iscell.npy, to guard against losing your work. Suite2p updates that file as you go, so if you make a mistake, it can be nice to have some of your incremental work saved.
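A small script can make these periodic backups less tedious. This is a sketch assuming the default output layout, where iscell.npy sits in the plane folder (e.g. suite2p/plane0); the function name and backup file naming are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_iscell(plane_dir):
    """Copy iscell.npy to a timestamped file in the same folder and return the new path.

    backup_iscell and the "iscell_backup_<timestamp>.npy" naming are hypothetical.
    """
    src = Path(plane_dir) / "iscell.npy"
    # microseconds keep two quick backups from overwriting each other
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dst = src.with_name(f"iscell_backup_{stamp}.npy")
    shutil.copy2(src, dst)  # copy2 also preserves file metadata such as timestamps
    return dst
```

Usage might look like backup_iscell("suite2p/plane0") after each labeling session; to restore, copy the chosen backup back over iscell.npy while the GUI is closed.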

RandallJEllis commented 1 year ago

Thank you for the help, I now have a classifier that separates cell and non-cell ROIs very nicely. Now the only thing left is to extract dF/F from the cell ROIs. To start, I could try setting the baseline to either the min or the mean of my videos. It may also be worth noting that since these are cultured tumor cells, there isn't any neuropil present. Any help with extracting dF/F would be greatly appreciated.

generalciao commented 1 year ago

Great to hear this helped you to get the pipeline working on your cells.

Sounds like you don't find the estimated spike rate output useful/appropriate and want a "proper" dF/F trace instead. Is there no stimulus in your experiment? As I'm sure you're aware, for such cases it's common to estimate a moving baseline, and you will find many different approaches described in recent publications. What you seem to suggest (a single constant baseline value, either min or mean) does not work well in most situations (yours may be an exception? perhaps if your recordings are quite short, the baseline is stable). Some papers calculate the baseline as the Nth percentile of the samples within T seconds of each frame (a moving window); you'd have to optimize/verify N and T for your data. Perhaps simpler is the median over a suitably large moving window. More complex: iteratively find events, estimate the baseline without those events, begin a new iteration using the new baseline, etc.
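A minimal sketch of the running-percentile baseline (N = percentile, T = window length in seconds), assuming F is the F.npy array from the suite2p plane folder restricted to ROIs labeled as cells, and that no neuropil subtraction is needed because these cultures have no neuropil. The function name and the defaults (8th percentile, 60 s window) are placeholders to tune, not recommendations; percentile=50 gives the running-median variant.

```python
import numpy as np
from scipy.ndimage import percentile_filter


def dff_percentile(F, fs, window_s=60.0, percentile=8.0):
    """dF/F with a running-percentile baseline, computed separately for each ROI (row).

    F: array of shape (n_rois, n_frames); fs: frame rate in Hz.
    dff_percentile is a hypothetical helper, not a suite2p function.
    """
    F = np.asarray(F, dtype=float)
    win = max(1, int(round(window_s * fs)))  # window length in frames
    # size=(1, win): filter along time only, independently for each ROI
    F0 = percentile_filter(F, percentile=percentile, size=(1, win), mode="nearest")
    return (F - F0) / F0  # assumes F0 > 0 (raw fluorescence, no background subtraction)


# Example usage (paths assume the default suite2p output folder; fs is your frame rate):
# F = np.load("suite2p/plane0/F.npy")
# iscell = np.load("suite2p/plane0/iscell.npy")  # column 0: 1 = cell, 0 = not cell
# dff = dff_percentile(F[iscell[:, 0].astype(bool)], fs=10.0)
```

A low percentile keeps brief transients from pulling the baseline up; if cells are active for a large fraction of each window, the window needs to be longer or the percentile lower, which is exactly the N/T tuning described above.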

I believe suite2p already calculates a baseline-like trace under the hood as part of its spike inference algorithm, but unfortunately I don't think it is included in the standard output. The following documentation page (which includes example code) describes it as "take the running max of the running min after smoothing with gaussian", with two parameters (sig_baseline and win_baseline). If anyone else here knows how to extract that trace after a standard pipeline run, please jump in (I'd find it useful myself).
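The described "maximin" baseline can be re-implemented with SciPy filters. This is a sketch of the approach as described on the deconvolution page, not suite2p's own code; the parameter names mirror the ops keys win_baseline (window in seconds) and sig_baseline (Gaussian width in frames) as I understand them, and the defaults shown are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, maximum_filter1d, minimum_filter1d


def maximin_baseline(F, fs, win_baseline=60.0, sig_baseline=10.0):
    """Running max of the running min of a Gaussian-smoothed trace, per ROI (row).

    F: (n_rois, n_frames); fs: frame rate in Hz.
    maximin_baseline is a hypothetical helper, not a suite2p function.
    """
    F = np.asarray(F, dtype=float)
    win = max(1, int(win_baseline * fs))  # window length in frames
    Flow = gaussian_filter1d(F, sigma=sig_baseline, axis=1)  # smooth along time
    Flow = minimum_filter1d(Flow, size=win, axis=1)  # running min ignores transients
    Flow = maximum_filter1d(Flow, size=win, axis=1)  # running max undoes the min's downward bias
    return Flow
```

As I read the docs page, suite2p subtracts this baseline before deconvolution; for dF/F you could instead compute (F - F0) / F0 with F0 = maximin_baseline(F, fs).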

https://suite2p.readthedocs.io/en/latest/deconvolution.html

No matter how you estimate your baseline, cell types, experimental conditions, etc. vary substantially, so it's important to carefully check the estimated baseline by eye against your data and iteratively adjust the algorithm's parameters to obtain an appropriate estimate. Then test this on different data and verify whether the parameters generalize before you (blindly) estimate the baseline across all your data. If your cells show only rare events (are mostly silent), a simpler baseline calculation may suffice; on the other hand, if your cells are highly active and/or have long-lasting events, it's a different story.

Consider whether the estimated spike rate might be sufficient for your questions.

RandallJEllis commented 1 year ago

Thanks very much for this information, I will consider whether to use df/f or a different representation of each cell's time series.

A separate question: until now I was using only the first 100-200 frames of one experiment to test out Suite2p. Now I am trying to analyze a full video (6000 frames), and I'm running into two issues. The full video is 17.9GB (frame dimensions are 1536x2048), and from the Inputs page I understood that Suite2p requires files to be below 4GB, so I split the video into separate 1000-frame files (each 2.9GB). When I supply these files as a tiff_list, my computer crashes after registration but before extraction finishes (I assume because of RAM limits).

To remedy this, I downsampled the video 4x (1536x2048 >> 384x512) and the analysis runs, but I'm getting far fewer cell ROIs. When I was analyzing 100 frames of the original 1536x2048 video I was getting ~900-1000 cell ROIs, and with all 6000 frames of the downsampled 384x512 video I'm getting 350-400 ROIs. I made sure to divide the diameter parameter by 4 when analyzing the downsampled video.
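The post doesn't say how the 4x downsampling was done; block averaging of 4x4 pixel tiles is one common choice and is assumed here. This sketch shows that binning together with the matching diameter scaling; both helper names are hypothetical.

```python
import numpy as np


def bin_frames(movie, factor=4):
    """Spatially downsample a (n_frames, Ly, Lx) movie by averaging factor x factor pixel blocks.

    bin_frames is a hypothetical helper; edges that don't divide evenly are cropped.
    """
    n, ly, lx = movie.shape
    ly_c, lx_c = ly - ly % factor, lx - lx % factor  # crop so dims divide evenly
    m = movie[:, :ly_c, :lx_c].astype(np.float32)
    # split each spatial axis into (blocks, factor) and average within each block
    return m.reshape(n, ly_c // factor, factor, lx_c // factor, factor).mean(axis=(2, 4))


def scale_diameter(diameter_px, factor=4):
    """Cell diameter in pixels after binning by `factor` (at least 1 pixel)."""
    return max(1, round(diameter_px / factor))
```

With factor=4, a 1536x2048 frame becomes 384x512, and a cell that is 40 px across becomes 10 px. Running the same 100-frame movie through both versions, as suggested in the reply below, isolates the effect of binning from the effect of movie length.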

I'm not sure if I should pursue 1) Analyzing the downsampled video, or 2) Somehow reduce the size of the full-resolution video. Any help with this would be greatly appreciated.

generalciao commented 1 year ago

Probably best to start a new thread for separate questions.

On the issue of file size limits, I routinely run suite2p on >100 GB single file movies, but they aren't TIFF. You don't mention what acquisition software you're using. There are multiple ways to write imaging data to TIFF files. The documentation you link does not say that "Suite2p requires files to be below 4GB". It says that TIFFs saved from ImageJ (in the standard way) have a 4 GB limit. My first instinct is to suggest you stick with your native acquisition files, if suite2p supports that. As for the crash, maybe the suite2p command output has useful info? This is the kind of thing I'd recommend posting as a new thread, with more details on your file types, acq software, computer/os specs, etc.

I have no experience downsampling as you describe. Changing the diameter parameter seems reasonable. I'm not sure what's going on, but I have a hunch. Do you get fewer ROIs if you compare downsampled to original with both being the same data (same movie, same number of frames)? I.e., downsample the 100-frame movie and adjust the parameters to try to get the same ROIs. That said, my advice (see below) is generally to use longer movies for your testing.

I would recommend using many more frames for testing/optimization. 100-200 frames would be just 3-13 seconds at usual frame rates (e.g. 100 frames at 30 Hz ≈ 3.3 s, 200 frames at 15 Hz ≈ 13 s), and only about half a minute at quite slow imaging rates; that seems too short to contain a representative sample of calcium activity. Perhaps it's fine for your data, but in many typical situations it would not be suitable.

Note the "frames_include" parameter, which might save you having to create smaller files during testing.

If suite2p is finding many ROIs that your classifier will end up marking as "not cells", then one way to perhaps reduce the computational runtime / resources is to limit the signal extraction step to ROIs that have a high probability of being true cells. Look into the parameter "preclassify" for this. Extracting 200 signals is less work than extracting 1000 and never using 800 for analysis. I don't use this, myself.

Since registration runs okay for you, probably the parameter "batch_size" won't help, but have a look - it's related to RAM use. Similarly for ROI detection, the parameter "nbinned" might allow you to run with fewer resources, but (a) I'm not sure it's relevant when using cellpose (anatomical), and (b) I have no experience with it.
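The parameters mentioned above can be collected into one set of overrides for smaller test runs. This is a sketch assuming the suite2p 0.x ops dict; the helper name and all values are placeholders to adjust for your machine, not recommended settings, and the comments reflect my reading of the settings documentation.

```python
def low_memory_test_overrides(n_frames=2000, preclassify=0.5,
                              batch_size=200, nbinned=2000):
    """Hypothetical ops overrides for a smaller, lower-memory test run.

    Keys are suite2p parameters; values are placeholders to tune.
    """
    return {
        "frames_include": n_frames,  # process only the first n frames (-1 = all frames)
        "preclassify": preclassify,  # skip extraction for ROIs below this classifier probability (0.0 = keep all)
        "batch_size": batch_size,    # frames processed per batch; lower values use less RAM
        "nbinned": nbinned,          # max number of binned frames used for ROI detection
    }
```

These could be merged as ops = {**suite2p.default_ops(), **low_memory_test_overrides()} and passed to suite2p.run_s2p(ops=ops, db=db) in the 0.x API; the same script can later run unchanged (with frames_include set to -1) as a batch job on a cluster.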

Overall, my advice would be to figure out the largest movie that your computer can handle without problems, and explore/optimize parameters on that size dataset. If your data is bigger than your computer can handle, it's usually a reasonable investment (considering the cost of your time, for one) to add more RAM or switch to a more suitable computing device. Many universities have an HPC cluster, often available at no or very reasonable cost. Once you figure out the ideal parameters/settings on your computer using a manageable movie size, you can run suite2p on the cluster with those same settings. If you're new to that sort of thing, I suspect there are resources at your institution to help you get started. Suite2p runs perfectly fine on a Linux cluster, in my experience, either with the GUI on a graphical head node, or as a script that can be run from a command-line, or submitted as a "job" for later processing. Bottom line, I recommend against spending a lot of time trying to modify your data/settings to compensate for a lack of compute power, when there are better approaches.

RandallJEllis commented 1 year ago

Thank you for all of this help. Closing this issue because I was able to get everything working on a cluster.