MouseLand / suite2p

cell detection in calcium imaging recordings
http://www.suite2p.org
GNU General Public License v3.0
348 stars 240 forks source link

Critical parameter/s for Cell detection (ops0.sig in MATLAB) #129

Closed PhantomSpike closed 4 years ago

PhantomSpike commented 5 years ago

Hi,

I have recently transitioned to the Python version of Suite2P from Matlab. Most things are intuitive and obvious but I have two questions:

  1. Parameter ops0.sig in Python?

In the MATLAB version of the code, under the "cell detection options" there is a parameter called: ops0.sig = 0.5; % spatial smoothing length in pixels; encourages localized clusters

This parameter is very important in my experience, and changes quite a lot the number of cells that you detect and their shape. I cannot find the same parameter (or its analog) in the Python version. None of the parameters listed below from the Python Suite2P seems to do the same:

    # cell detection settings
    'connected': True, # whether or not to keep ROIs fully connected (set to 0 for dendrites)
    'navg_frames_svd': 5000, # max number of binned frames for the SVD
    'nsvd_for_roi': 1000, # max number of SVD components to keep for ROI detection
    'max_iterations': 20, # maximum number of iterations to do cell detection
    'ratio_neuropil': 6., # ratio between neuropil basis size and cell radius
    'ratio_neuropil_to_cell': 3, # minimum ratio between neuropil radius and cell radius
    'tile_factor': 1., # use finer (>1) or coarser (<1) tiles for neuropil estimation during cell detection
    'threshold_scaling': 1., # adjust the automatically determined threshold by this scalar multiplier
    'max_overlap': 0.75, # cells with more overlap than this get removed during triage, before refinement
    'inner_neuropil_radius': 2, # number of pixels to keep between ROI and neuropil donut
    'outer_neuropil_radius': np.inf, # maximum neuropil radius
    'min_neuropil_pixels': 350, # minimum number of pixels in the neuropil

Am I missing something or it is not possible to set this parameter or a similar analog?

  1. Critical parameters

This is a slightly more general question.

What are the key parameters in the # cell detection settings section that are likely to change the number and/or quality of the cells that I find?

Let's say that I am not happy with the ROIs that I detect and it is not a simple misclassification by the classifier such that I can move the cells from 'not cells' -> 'cells'. It is a problem of the ROI detection itself. What parameters should I start tweaking and in what way in order to improve my detection?

@carsen-stringer

Thanks a lot for your help!

Alex

marius10p commented 5 years ago

Hi Alex,

If the ROIs are less smooth or complete than you're used to, it's probably due to smoothing (spatial and temporal) as you suspect.

That smoothing constant is now set by default to diameter/10. The temporal binning is now set automatically based on the sampling rate fs and timescale tau (it's fs * tau). High-passing then removes slow drift (default 100s). We think these defaults should be on average good, but feel free to try changing them. For example, you could set the diameter to be larger than you really think the diameter of the cells is, or the timescale longer than you think it really is. Let us know if that helps, and we can consider exposing more of those parameters.

Probably less important, but the threshold scaling controls when to stop extracting ROIs, and the number of svds automatically scales with the temporal binning. I would also check out the results of the registration (use registration metrics in the drop down module), to make sure that is not different. There can be small differences, from small improvements in the Python version. Also, sometimes poor ROIs are due to non-rigid motion, so you can try turning that on or off.

Best, Marius

PhantomSpike commented 5 years ago

Hi @marius10p ,

Thank you for the reply. What you are saying makes sense.

I have a few questions about the parameters and how they work:

"Before computing the principal components of the movie, we bin the data such that we have at least as many frames to take the SVD of as specified in the option ops['navg_frames_svd']. The bin size will be the maximum of nframes/ops['navg_frames_svd'] and ops['tau'] * ops['fs'] (the number of samples per transient)."

  1. ops[navg_frames_svd]

Does this mean that navg_frames_svd controls how many frames you have per bin when you bin the data? Why not just run the SVD on the whole movie rather than splitting it into bins? Is this for computational efficiency or it adds some value to the extraction process? What would be the main effect of changing navg_frames_svd?

  1. ops['nsvd_for_roi']

To use "Kilosort slang", is this in any way similar to ops.Nfilt i.e. how many components to use for the ROI detection? So you would want more than your actual no of ROIs and you would merge or throw away many of them afterwards?

  1. ops[threshold scaling] = "adjust the automatically determined threshold by this scalar multiplier"

"On each iteration, up to 200 peaks are extracted from the correlation map. These are the largest remaining peaks such that they are greater than the threshold, which is set to be proportional to the median of the peaks in the whole correlation map: ops['threshold_scaling'] * np.median(peaks[peaks>1e-4]"

What is the interpretation of this parameter? If I understand correctly, is it roughly ~ smaller threshold scaling = more cells detected & bigger threshold scaling = less cells detected?

I am looking at some of my data (see example) and it seems that Suite2P misses quite a few very obvious ROIs. Many of the missed ROIs are bright, round and standing out from the background so not sure why they are not picked up. And as you can see from the picture it is not a simple matter of just moving them from 'not cells' -> 'cells'. I think the ROIs that are not correctly segmented can be divided roughly into several types:

  1. Missing completely

  2. Exist but the shape is too weird and it includes too many non-ROI pixels (too big)

  3. Wrong shape that just doesn't match very well and could include other junk from near by e.g. boutons, spines, axons, blood vessels etc.

  4. I also see sometimes two ROIs close to each other merged together into a 'Frankenstein' ROI

What should I do to address these issues? My general feeling from using Suite2P is that it detects a lot less cells than manual adding of ROIs would. It seems that if I somehow cranked up the sensitivity that might solve some of my problems. I don't mind having to sort more ROIs as long as the correct ones are there too.

Should I increase ops['nsvd_for_roi'] and decrease ops[threshold scaling] to get more ROIs detected? Would increasing ops.['max_iterations'] potentially help too?

What about the shape problems? Is the diameter option my best bet?

I already checked the registration and it looks sensible. I don't think that is the problem.

Thanks a lot!

Best,

Alex

roi_detection

PhantomSpike commented 5 years ago

Hi @marius10p ,

Any thoughts on this? :)

marius10p commented 5 years ago

Some of the information below is new in the Python version, but we'll be pushing an update to the Matlab version as well soon to make them equivalent.

  1. Both computational efficiency and increased SNR. In the Python version this is now chosen adaptively based on sensor timescale, but you can override it manually. Too high and you lose SNR. Too low and you loose some high-frequency features of the transient, thus impairing clustering ability.
  2. No. It is how many svds are kept during the process in 1). Has similar tradeoffs of SNR and high-frequency information.
  3. Correct.

As for your general questions, if an ROI has activity it has to be detected as you lower the threshold. It cannot not be picked up, because suite2p continuously looks for spots of correlated pixel activity, and if it has not picked it up already, it will at the next iteration. You can see a correlation map of the raw data in the GUI that shows you which ROIs have activity, and it is this map suite2p is inspecting all the time during optimization. If the cell that has not been picked up at a location does not have a blob on the correlation map, than it really does not have activity, even though it might have baseline fluorescence leading you to see it's shape on the mean image. I understand that some users want even the cells that do not have activity. Please read our review paper to see why an activity-based approach might be a better idea, specifically figure 3.

Similarly, if a cell is continuously saturated it won't have correlated pixel activity, or if it's overexpressing.

PhantomSpike commented 5 years ago

Hi Marius,

Thank you for the thoughtful response.

About the correlation map, what you are saying is true but I think there are two confounds with the correlation based approach.

  1. Percentage of responsive ROIs

Often you want to get an estimate of what % of the ROIs are responding to a given stimulus. If they almost never fire (in our case auditory cortex, which has very low average firing rates and sparse activity) you will grossly overestimate the % of responsive cells . So picking up the silent ROIs is often desirable.

  1. Sparsely active cells

In some parts of the brain (e.g. auditory/somatosensory cortex) the activity in neurons can be very sparse, even if they are highly tuned. So let's say in a 20-30 min recording they might fire only a few times, especially if the stimulus is not very driving. In our case we present pure tones at different levels and typically a neuron would respond to a small number of frequency/level combinations and not on all trials. During the rest of the time the pixels should still in theory be correlated as they are from the same cell so things like subthreshold inputs, spontaneous activity in the network, random fluctuations will drive the correlations up. But with indicators which are not very bright e.g. 6f these fluctuations might be too small and comparable to the background noise. So for a highly tuned but very sparse cell, the activity for each pixel of the cell will be a vector filled with mostly random noise and very few entries of high fluorescence when a spike occurred. When you compute the correlation between the pixels of this cell you wouldn't get a high value because the correlations will be dominated by the random noise.

Since we have both 6m and 6f data, this does seem to be more of a problem for the 6f. The 6m data compares quite well to a previous manual curation pipeline that we used. For 6 f though, I do think we miss a non-negligible proportion of active cells.

Not sure what the solution is. Clicking cells manually would require the extraction of the activity again and will be quite tedious.

marius10p commented 5 years ago

Suite2p does pick out sparsely active cells that have good SNR. However, I would not recommend using gcamp6f ROIs that have weak signals. At that point, the out of focus contamination dominates the signal by an order of magnitude,. so it's very difficult to know if you are analyzing true somatic signal or the residual neuropil. Unfortunately, most cells with gcamp6f are like that. Switching to jGCaMP7f would help a lot.

As for 1), I think that's a specialized problem you're having. If that's really what you want to estimate, there is no other way than to use an anatomical somata detection method. You'll want something that looks specifically for donuts. Again, please read the review paper I sent you for more information about such methods.

marius10p commented 5 years ago

Here is a typical example including a mix of sparsely and densely active cells. This is a two hour recording somewhere in somatosensory cortex.

image

PhantomSpike commented 5 years ago

Yes, I see.

I think as you say it is probably only an issue for cells with low SNR. I also have sparse cells which are detected fine. :)

marius10p commented 5 years ago

Fyi, I have an update on the way that significantly improves detection of very sparse cells, as long as they trigger at least one decent transient. Will update in a few more days.

PhantomSpike commented 5 years ago

Hi Marius,

Thanks a lot for looking into this!

I had a look in the commits for new updates but didn't see anything yet.

Let me know once the update is live, I would be keen to try it out.

Thanks again!

marius10p commented 5 years ago

Almost there... just have to re-train the classifier. I look forward to getting feedback from you on this, so I'll let you know as soon as it's up.

marius10p commented 5 years ago

Try the branch "sparse_mode". A few differences to be aware of:

1) The diameter is no longer used as a parameter, although it's still in the GUI. The algorithm is instead explicitly multi-scale. 2) The stopping criterion for cell extraction is determined automatically based on spatial scales and duration of the recording. To a first approximation, a cell is extracted if the sum of all its transients above a preset threshold is at least N times the preset threshold, where N is the number of time epochs. N scales linearly with the number of approx. independent frame samples, and is always at least 1. So a gcamp6s dataset with a timescale of 1s, recorded for 1 hour would be 3600 "independent" samples, and 3 epochs (1 epoch = 1200 samples). 3) The footprint now encodes the spatial scale the ROI was picked up at, and is usually a good visualization. 4) The default classifier has been adapted from the previous version rather than retrained, which may be suboptimal.

PhantomSpike commented 5 years ago

Hi @marius10p ,

Thanks for looking into this. I will try it in the next couple of days and let you know how it goes!

hummuscience commented 5 years ago

After reading the above discussion, I have similar problems as @PhantomSpike mentioned. My cells do not have spontaneous activity but have strong responses when stimulated (which happens every 3-5 minutes). So in a way, I am also dealing with "sparse" activity.

Additionally my cells have a large range of diameters.

Until now, I have been using the normal pipeline and just ignoring the cells that were either "frankensteins", not picked up, or with large detected processes.

@carsen-stringer suggested that I try the sparse_mode branch and I got to it today.

From my first test, where I left all parameters fixed and only changed the frame rate and the tau (4Hz and 0.7 Tau), it is doing a much better job than the normal pipeline.

The ROIs appear "dotty". Any idea why that is happening?

carsen-stringer commented 5 years ago

Could you post a picture of what you mean by dotty please?

hummuscience commented 5 years ago

So, I ran the pipeline on the complete experiment which has a HighK stimulus at the end (which should help detect all cells responding). The first thing I notice is that two things improved:

1) there are less "large" or "frankenstein" ROIs, even though they still exist. 2) It is seperating cells that are close to each other much better and the total number of detected correct cells is higher than the normal pipeline!

There are however also a much larger number of detected ROIs that are not cells, as is seen in the correlation map.

The enhanced image view shows the separation after I went through all the ROIs. I need to build a new classifier, probably. How many datasets/cells do you think I will need for a robust classifier?

I am getting many ROIs that have a much igher neuropili signal compared to the fluorescence. Is there a way to block them from even being picked up?

Overall I am happy with the results and I will probably switch to sparse_mode (or, rerun many of my datasets with sparse_mode). I wonder if I can further improve the ROI detection as there are still some cells that are obvious in the video but are either not detected or (much more likely) have a huge ROI which includes a lot of the area around. Ideally the ROIs should be semi-round.

I was so free and added the raw tiff files of that experiment I mentioned in the other thread to that folder you shared (removed the other .tif file). Feel free to use that and thanks for the great pipeline!

screenshot 2019-02-26 09 04 09 screenshot 2019-02-26 09 07 32

carsen-stringer commented 5 years ago

The parameter ops['threshold_scaling'] now controls the stopping criterion in sparse_mode too. It seems like the algorithm should have stopped extracting sooner. The default for threshold_scaling in the sparse_mode branch is 5.0 (that's what you ran with). I would recommend setting it higher and seeing if that helps.

carsen-stringer commented 5 years ago

The throwing out overlaps setting wasn't working in the sparse_mode branch! sorry about that. this may get rid of some of your crappy ROIs. please pull the latest version

hummuscience commented 5 years ago

Which parameters affects the merging of pixels into ROIs? Thinking about a way to separate cells that are merged into one ROI.

what do navg_frames_svd and nsvd_for_roi actually do?

The way I undertand it right now, navg_frames_svd tells suite2p how many of the total frames to use for ROI extraction. So if I have 10000 frames, and I set it to 5000, it will take the first 5000 (?).

nsvd_for_roi I don't understand.

If you have any furthe reading for me that might help me understand what is happening, please feel free!

hummuscience commented 5 years ago

Another question. Is the radius that I find in stat.npy the radius of the whole ROI, including the pixels that overlap with other ROIs?

carsen-stringer commented 5 years ago

nsvd_for_roi is not used in sparse mode

navg_frames_svd helps decide the binning for finding cells. We bin by

binsize = max(ops['tau'] * ops['fs'], ops['nframes'] / ops['navg_frames_svd'])

So it you can set it larger if you want the bins to be smaller (which you might if you're getting merges) but it will only be as small as ops['tau'] * ops['fs'] as it's defined

the radius is the whole ROI. but in sparse_mode sometimes there is a small hole in the middle of the ROI and this screws with the simple way I'm computing the radius. We are thinking of ways to smooth the ROIs in sparse_mode but we haven't had time to implement any of them yet. I'd go off 'npix' for the size of the ROI for now.

carsen-stringer commented 5 years ago

I can add a parameter to control how much an ROI extends which may help with the merging, but first I'd recommend trying to change the binning

You'll see the binning when the pipeline says [5000, Ly, Lx] after registration, that 5000 number is the number of frames used in the cell extraction

hummuscience commented 5 years ago

I am using fs = 4 as my frame rate is 4Hz (single plane) and tau set at 0.7.

I am trying to understand if my tau is set correctly. If I understood correctly, tau is the decay time of the calcium sensor, right?

I am using Cal-520 as sensor. In the first publication where Cal-520 was combined with patching and compared to OBG-1 it got the following results (Tada 2014):

10–90 rise time: 0.0637 +/- 0.0045 decay time constants: 0.691 +/- 0.113 s for Cal-520

In another review where multiple calcium dyes were compared, the following data were presented:

screenshot 2019-03-06 17 13 11 screenshot 2019-03-06 17 13 30

As you are probably more used to GCaMPs, seeing this table, do you think a tau of 0.7 is suitable?

carsen-stringer commented 5 years ago

I would try 0.7-1, I would generally recommend rounding up in these situations

hummuscience commented 5 years ago

So, I ran the spare_mode version of the pipeline on 20 datasets using a tau of 0.7 and fs of 4. I just finished going through them manually. The general view is that it is doing a much better job at detecting different sized cells, which in my case also has physiological relevance.

There are two main issues I face. If you have a look at the image below, you will see an example of a datset where about 30-40% of cells are "lost".

Reason 1: merging ROIs. Usually two cells that are next to each other are merged into one. Other times it is multiple clusters of cells that form necklace like structures. I wonder how I could stop suite2p from merging those. I understand that the whole algorithm is activity based so if these two cells happen to respond to the same stimuli (like HighK, which activates everything) they will be merged...

Reason 2: ROIs with long dendrites. It happens quite often that the cell is correctly recognized, but then has a long tail of one or multiple dendrites attached to it. I have no way of telling if the dendrite is actually belonging to the cell. I don't care much about them, but it could be that it could mess with the signal extraction?

Maybe setting a maximum ROI size could help? What do you think I could try?

Are there any packages that use a shape-driven algorithm to detect cells (or a combination of both)?

screenshot 2019-03-07 15 33 14

marius10p commented 5 years ago

The tails don't hurt signal extraction, and the only reason it would be extracted with a cell is that they had very similar activity.

For an anatomical based algorithms, check out the section "Cell detection and signal extraction" from our review paper. This is the paper that comes to mind, but we haven't tried their code.

We will see what we can do about over-merged ROIs, but this is not a high priority problem for us, sorry, because in vivo spiking activity is far less synchronized than the type of slice preparation that you have.

hummuscience commented 5 years ago

First of all, congrats on your Science paper!!!! 💯

I have used suite2p in sparse-mode extensively lately and I have some comments/possible suggestions. I am just posting here because I don't want to open an issue for every suggestion.

carsen-stringer commented 5 years ago

Thanks! :)

1) we should probably to allow the user to set a diameter range in that case - we have discussed doing this and plan on doing this soon (this is an issue for other users as well but in the other direction). I will also think about color-coding overlapping ROIs, but not sure of the best way to do this. But hopefully modifications in cell extraction will mean this isn't required.

2) These are not "ghost" ROIs. When you click on the classifier view it automatically applies the classifier with the cell probability specified in the box (e.g. "0.5"). If this behavior is problematic I can change it so that it only applies when the user clicks ENTER in the cell probability box.

3) Yes I will definitely do this - this was because some people have red channels but many/most don't so there is no reason not to use the full color spectrum in most cases.

4) yes this is also a good idea. for now, a workaround is to open "Run suite2p" while the plane is loaded and then "Load ops file" will allow you to view the parameters (it will be in the same folder which will make it easier).

hummuscience commented 5 years ago
  1. we should probably to allow the user to set a diameter range in that case - we have discussed doing this and plan on doing this soon (this is an issue for other users as well but in the other direction). I will also think about color-coding overlapping ROIs, but not sure of the best way to do this. But hopefully modifications in cell extraction will mean this isn't required. My issue with a diameter Range is that my cells do have different diameters. The merged cells are sometimes as large as an actual different cell. The merged "blobs" are rarely larger than the largest possible cell in the culture. The only clear difference is that a single large cell is circular-ish while the merged blobs are usually grape-like.

Color coding overlap does not have to be by ROI. It could be done by pixel (so it is dynamic and not static). It is like doing a maximum projection of the pixels and then assigning a LUT on the range. 0 being no overlap at all and 5 being overlap of 5 pixels (coming from 5 overlapping ROIs). This way it would be clear if these is an area where there are many overlapping ROIs.

  1. These are not "ghost" ROIs. When you click on the classifier view it automatically applies the classifier with the cell probability specified in the box (e.g. "0.5"). If this behavior is problematic I can change it so that it only applies when the user clicks ENTER in the cell probability box.

They are not clickable at all. Is that intended behavior?

carsen-stringer commented 5 years ago
  1. I don't think I understand what you mean - right now I have variance dictating the value in the HSV map. If you could "NOT cell" all ROIs greater than / less than a certain size in the GUI, would this solve your problems?

  2. that is not good and not sure why that's happening - I can't reproduce it with a classifier I've made by adding to the built-in classifier and also a classifier built from a dataset iscell.npy.

carsen-stringer commented 5 years ago

I've implemented a way to view overlap. When an ROI is chosen, if there are other ROIs overlapping in that same view, those pixels are more grayish-black than pixels that don't have ROIs overlapping. Can you try increasing your ops['threshold_scaling'] parameter? This may get rid of some of your problematic ROIs.

I've also made the random view use the full HSV range.

hummuscience commented 5 years ago

I like the way you implemented overlap! One additional suggestion is to reintroduce the option to mark all the cells in the field of view. Right now, to see if cells overlap, I have to click the single ROIs.

In earlier versions of the GUI, one could draw a selection or select N cells (and input the number of all the cells). These buttons don't seem to work anymore.

Nice addition with the little numpad in the corner. It makes my normal way to curation much easier! Thank you :)

hummuscience commented 5 years ago

Another question. When the option allow_overlap is True, the extracted fluorescence signal is then calculated using all the pixels of the ROI (even the ones overlapping ones), right? When it is set to False, it is extracted from the non-overlapping pixels.

Is it possible to re-run the extraction after choosing cell/non-cell? I am asking, because in many cases, I have merges of cells, but if I combine the ROIs in a certain way, the non-overlapping pixels would be outlining the actual cells. This would allow me to get the fluorescence of a single cell, even though the ROI is that of a merged one.

carsen-stringer commented 5 years ago

I've actually added a new merge function to the GUI. It averages the traces of the 2 cells, so having allow_overlap=False would work well in that case. You can merge by selecting multiple cells (CTRL+click) and then pressing ALT+enter. To save merges go to the "Merge ROIs" tab at the top and click append merges to NPY. You can also have the GUI autosuggest merges.

You can also apply the classifier before signal extraction (to get rid of things) by setting ops['preclassify']=0.5 for instance, where 0.5 is the probability limit for the classifier.

hummuscience commented 5 years ago

Pressing Alt+enter does not do anything. There is also no "Merge ROIs" tab. Am I missing something? I am on the sparse-mode branch

carsen-stringer commented 5 years ago

This is on the master branch, we've merged in all the sparse mode changes but note that sparse_mode is not on by default (it's under cell detection parameters)

hummuscience commented 5 years ago

Just noticed that. Playing around with it now.

When I click on a cell, the view is centered and zoomed on it. is it intended behavior?

carsen-stringer commented 5 years ago

Yeah if you pull the latest version you should see a check box at the bottom to turn that on and off (zoom to cell). It automatically turns on when doing automated merge suggestions

hummuscience commented 5 years ago

Pressing Alt+enter after marking two overlapping cells produces this message in the terminal computing automated merge suggestions... but nothing seems to happens.

Pressing Alt+enter again gives this error:

Traceback (most recent call last):
  File "/home/muad/Downloads/suite2p/suite2p/gui/gui2p.py", line 291, in keyPressEvent
    merge.merge_cells(self)
AttributeError: module 'suite2p.gui.merge' has no attribute 'merge_cells'
carsen-stringer commented 5 years ago

oops I spoke too soon, try now. also note to save the cells, click append merges in the "merge" menu. They will be tacked on the end of the files. The original ROIs are not removed but will not be plotted so long as those merged ROIs exist. if you delete and resave then the original ROIs will be plotted again. each of the new merged ROIs has a field called "imerge" which tells you which ROIs are in the merged ROI. I'd also recommend trying out the automated suggestions.

carsen-stringer commented 4 years ago

I'm going to close this thread due to inactivity, I will add some documentation on the merging soon too