MouseLand / Kilosort

Fast spike sorting with drift correction for up to a thousand channels
https://kilosort.readthedocs.io/en/latest/
GNU General Public License v3.0

Double-counted spikes #29

Closed: jsiegle closed this issue 7 months ago

jsiegle commented 5 years ago

It seems like there's a tradeoff between Kilosort's spike threshold and the frequency of putative double-counted spikes. Presumably, this results from spike residuals getting fit by the same template or a different template on the same set of channels. So, the lower the threshold, the higher the likelihood that a residual will get counted as a different spike. Here's an example of a pair of units that has a lot of overlapping spikes at zero time lag, which is unlikely to be physiological:

double-counting_example_sort1_574_326

The good news is that the latest version of Kilosort is performing much better than previously when it comes to double-counting. Even with a lower threshold (ops.Th = [10 4] vs. [12 12] previously), the percentage of units with large zero-lag ACG peaks drops from 10% to 3%. But because false-positive zero-lag synchrony is so dangerous, it would be advantageous to reduce double-counting even further.

A few questions:

marius10p commented 5 years ago

Thanks Josh. Is the issue of zero-lag ACG peaks related to the issue of zero-lag CCG peaks?

jsiegle commented 5 years ago

I think so...they both occur more often when the threshold is lowered. But I am most worried about zero-lag CCG peaks, because we do expect there to be some perfectly overlapping spikes, just not as many as we're seeing in certain pairs.
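
For reference, one rough way to quantify such zero-lag CCG peaks from the standard Kilosort/phy output (spike_times.npy, spike_clusters.npy) is sketched below. This is not part of Kilosort; the sampling rate, bin size, baseline definition, and cluster ids are placeholders to adjust for your own data.

```python
import numpy as np

FS = 30000.0  # sampling rate in Hz (see params.py for your dataset)

spike_times = np.load('spike_times.npy').flatten() / FS   # spike times in seconds
spike_clusters = np.load('spike_clusters.npy').flatten()

def zero_lag_ccg_ratio(c1, c2, bin_s=0.0005, window_s=0.05):
    """Ratio of the CCG bin centred on zero lag to the mean of the other bins."""
    # spike_times.npy is sorted by time, so the per-cluster subsets are too
    t1 = spike_times[spike_clusters == c1]
    t2 = spike_times[spike_clusters == c2]
    # collect all pairwise lags within +/- window_s
    lo = np.searchsorted(t2, t1 - window_s)
    hi = np.searchsorted(t2, t1 + window_s)
    lags = []
    for t, a, b in zip(t1, lo, hi):
        lags.extend(t2[a:b] - t)
    n_bins = int(round(2 * window_s / bin_s)) + 1          # odd, so one bin is centred on 0
    edges = np.linspace(-window_s - bin_s / 2, window_s + bin_s / 2, n_bins + 1)
    counts, _ = np.histogram(lags, edges)
    centre = counts[n_bins // 2]
    baseline = np.delete(counts, n_bins // 2).mean()
    return centre / max(baseline, 1e-9)

# placeholder cluster ids; a ratio much larger than 1 suggests double counting
print(zero_lag_ccg_ratio(574, 326))
```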

AurelieP23 commented 5 years ago

I may be seeing this as well, though not very often. It seems to correlate with a smaller ops.lam (in this example ops.lam = 0, ops.Th = [10 2]).

Ex_doublecountedAPs

The templates for the two clusters are very different (the blue one was manually merged, hence the blue and grey traces).

Ex_doublecountedAPs_templates

Any thoughts about the role of ops.lam and ops.Th in this issue?

AurelieP23 commented 5 years ago

It seems like increasing ops.Th(1) may help... from #17 (example with blue, red and yellow clusters)

marius10p commented 5 years ago

@AurelieP23 I would not be too worried about your example... the red cluster should be discarded anyway, since it only has 159 spikes. You are right that increasing Th(1) would reduce this problem, but you might lose other perfectly good units in the process. This situation should happen only for the very large spikes, when there is a residual after template subtraction.

Are you sure the blue one should have been merged? It looks like there is a pretty large refractory-violation bin at 0.
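
A minimal sketch of the "discard tiny clusters" suggestion above, assuming the standard Kilosort/phy output files; this is a user-side helper, not a Kilosort function, and the 200-spike cutoff is arbitrary.

```python
import numpy as np

spike_clusters = np.load('spike_clusters.npy').flatten()
clusters, counts = np.unique(spike_clusters, return_counts=True)

MIN_SPIKES = 200  # arbitrary cutoff for this illustration
for c, n in zip(clusters, counts):
    if n < MIN_SPIKES:
        print(f'cluster {c}: only {n} spikes -- consider labelling it noise')
```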

chris-angeloni commented 5 years ago

I have also seen this with my data, and it makes sense that it is just refitting the residual, since I usually see it with very high-amplitude units, as in the screenshot below. I'll note that this clustering was run with the default parameters, where the threshold is [10 4] and lam is 10.

image

kalfton commented 5 years ago

I encountered this problem as well, and I will show it from another perspective. We recorded neurons with an electrode array in which each channel is far from the others, so no single neuron should appear on multiple channels.

I used Kilosort to detect spikes from the channels, and then imported the result into Plexon's Offline Sorter to check (ops.lam = 10; ops.Th = [10 4]).

The picture below shows one of the channels in Offline Sorter. There are 3 units here (yellow, green, and blue), and we found that a large number of the spikes in the blue unit are just slightly shifted copies of spikes in the yellow unit, as I explain below:

The 3 subplots in the top row are: raw spike shapes, the ACGs and CCGs of the units, and the PCA result. The CCG of the yellow and green units has a large peak at zero lag (and so does the CCG of the yellow and blue units).

The middle row shows the units' average shapes and histograms of inter-spike intervals.

The bottom row plots spikes versus time. As you can see, a spike from the green unit overlaps with a spike from the yellow unit, and another spike from the yellow unit overlaps with an unsorted (MUA) spike.

plexon

I found that this double-counting problem occurred on about 10% of the channels. These double-counted spikes are very dangerous for us, and they are not easy to discover and remove in the phy GUI, so it would be very beneficial to find a way to get rid of them.

susuchen66 commented 5 years ago

Hi, we are also seeing such cases, which show up as a high peak near zero in the cross-correlogram between two clusters, presumably the result of a second template being fit onto the residual after the first template is subtracted. This is obvious in about 5% of clusters, but could be more prevalent in cases that are harder to detect.

We noticed this during manual curation when using both the JRClust GUI and Phy.

ks2output15 templates1 templates2

susuchen66 commented 5 years ago

In addition, we encounter a large fraction of double-counted spikes within the same cluster. These are slightly shifted in time, so the autocorrelogram shows a high peak near zero.

High amplitude unit example visualised in JRClust GUI:

Screenshot 2019-07-07 at 17 23 45

The same unit in Phy. Note that Phy does not show the peak near zero!! (This issue has been reported to Cyrille on the Phy GitHub page):

Screenshot 2019-07-07 at 17 24 09

susuchen66 commented 5 years ago

Here is a summary of the ISI violation % when setting a threshold of 0.5 ms, 1 ms, or 2 ms for 180 manually curated units. A quick way to pick out clusters with likely double counting is to compare the number of ISIs < 0.5 ms with the number < 2.0 ms.

Picture1
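
A sketch of this quick check, assuming the standard Kilosort/phy output files and a 30 kHz sampling rate; the 0.5 flagging ratio below is an arbitrary example, not a validated criterion.

```python
import numpy as np

FS = 30000.0  # sampling rate in Hz (see params.py)
spike_times = np.load('spike_times.npy').flatten() / FS
spike_clusters = np.load('spike_clusters.npy').flatten()

for c in np.unique(spike_clusters):
    isi = np.diff(spike_times[spike_clusters == c])
    n_05 = np.sum(isi < 0.0005)   # violations within 0.5 ms
    n_20 = np.sum(isi < 0.002)    # violations within 2 ms
    # if most sub-2 ms ISIs are actually sub-0.5 ms, the violations are
    # suspiciously tight in time, consistent with refitting of residuals
    if n_20 > 0 and n_05 / n_20 > 0.5:
        print(f'cluster {c}: {n_05} ISIs < 0.5 ms out of {n_20} ISIs < 2 ms')
```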

DJ-Hayden commented 5 years ago

Could this simply be fixed by searching through every channel and flagging spikes that occur (in separate clusters) within a millisecond of each other?
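
One way such a scan might be sketched on top of the standard Kilosort/phy output (spike_times.npy, spike_clusters.npy, spike_templates.npy, templates.npy). This is only an illustration: the 1 ms window, the same-peak-channel restriction, and the 10% flagging rule are arbitrary choices.

```python
import numpy as np

FS = 30000.0  # sampling rate in Hz (see params.py)
spike_times = np.load('spike_times.npy').flatten() / FS
spike_clusters = np.load('spike_clusters.npy').flatten()
spike_templates = np.load('spike_templates.npy').flatten().astype(int)
templates = np.load('templates.npy')        # (n_templates, n_samples, n_channels)

# peak channel of each template = channel with the largest peak-to-peak amplitude
peak_chan = (templates.max(axis=1) - templates.min(axis=1)).argmax(axis=1)

clusters = np.unique(spike_clusters)
# assign each cluster the peak channel of its most common template
cluster_chan = {c: peak_chan[np.bincount(spike_templates[spike_clusters == c]).argmax()]
                for c in clusters}

def coincident_count(c1, c2, window_s=0.001):
    """Number of spikes in c1 that have a spike from c2 within +/- window_s."""
    t1 = spike_times[spike_clusters == c1]
    t2 = spike_times[spike_clusters == c2]
    lo = np.searchsorted(t2, t1 - window_s)
    hi = np.searchsorted(t2, t1 + window_s)
    return int(np.sum(hi > lo))

for i, c1 in enumerate(clusters):
    for c2 in clusters[i + 1:]:
        if cluster_chan[c1] != cluster_chan[c2]:
            continue
        n = coincident_count(c1, c2)
        n_min = min(np.sum(spike_clusters == c1), np.sum(spike_clusters == c2))
        if n > 0.1 * n_min:                 # arbitrary flagging rule
            print(f'clusters {c1} and {c2} (channel {cluster_chan[c1]}): '
                  f'{n} coincident spikes within 1 ms')
```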

thomaszhihaoluo commented 5 years ago

Hi! Has anyone arrived at a way to address this issue and would be willing to share it? I would really appreciate it. I encounter a similar problem with my datasets. Thanks!

NoahDolev commented 4 years ago

We have similar issues sorting retinal spikes on MEAs. In addition to double-counted spikes, we also have spikes where only a portion of the waveform is detected.

Can you point us to parameters that would be likely to affect this? Is anyone willing to post their ops settings for retina?

image

bapungiri commented 4 years ago

Another example of this issue. Does anyone have a solution to this?

image

nsteinme commented 4 years ago

In this last example from @bapungiri, the red unit is one with a much smaller template (you can see this in the amplitude column value), where the red template is erroneously found on the residuals of the blue. Since the red one has very low amplitude (judging by the few red spikes that don't look like the blue), you ought to call the red one noise, and a simple amplitude threshold on the template would achieve that. Are there examples where neither of the templates is very small? I doubt it, but please show them if there are. (P.S. Thanks for the full phy screenshot, very helpful.)
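
A sketch of that amplitude-threshold idea, assuming the standard Kilosort/phy output files. The amplitudes here are in Kilosort's arbitrary (whitened) units, so the percentile cutoff is only an illustration of the approach.

```python
import numpy as np

spike_clusters = np.load('spike_clusters.npy').flatten()
spike_templates = np.load('spike_templates.npy').flatten().astype(int)
amplitudes = np.load('amplitudes.npy').flatten()   # per-spike template scaling factors
templates = np.load('templates.npy')               # (n_templates, n_samples, n_channels)

# peak-to-peak size of each template on its largest channel (arbitrary units)
ptp = (templates.max(axis=1) - templates.min(axis=1)).max(axis=1)

clusters = np.unique(spike_clusters)
cluster_amp = np.array([
    np.mean(amplitudes[spike_clusters == c] * ptp[spike_templates[spike_clusters == c]])
    for c in clusters
])

cutoff = np.percentile(cluster_amp, 10)            # arbitrary example cutoff
for c, a in zip(clusters, cluster_amp):
    if a < cutoff:
        print(f'cluster {c}: mean amplitude {a:.2f} (a.u.) -- candidate noise')
```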

bapungiri commented 4 years ago

You are correct, these are mostly restricted to high-amplitude spikes. The red cluster shown above was previously part of another cluster (53 in the above example), which had a smaller amplitude. I guess that after subtracting the high-amplitude template, the residuals were similar to the template of the smaller waveform (cluster 53). Just as an example, I re-merged the red cluster into cluster 53 to show the two waveforms, after which I decided to separate them again.

image

DJ-Hayden commented 4 years ago

Hello,

I don't think this is just a template-subtraction issue.

I've attached two images. The first is of unit 14 alone. Check out TraceView, where 3 spikes in a row are clearly visible (and likely from the same unit), but only the first is included. Then check out units 14 and 314: the first spike is part of BOTH unit 14 and unit 314.

This is...troubling to say the least. Are there any plans to make sure the algorithm isn't double-counting spikes?

Unit 14: unit14_only

Unit 14 and 314: unit14_and_314

marius10p commented 4 years ago

It's a very small proportion of all spikes that get double counted. Remove them yourself in post-processing if you're still worried about them, but right now this is the price to pay for the template matching approach.
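
For anyone looking for a starting point, here is one possible post-processing sketch (not an official Kilosort utility) that relabels the coincident spikes of a putative duplicate cluster. File names follow the standard Kilosort/phy output; the cluster ids, coincidence window, and junk label are placeholders.

```python
import numpy as np

FS = 30000.0  # sampling rate in Hz (see params.py)
spike_times = np.load('spike_times.npy').flatten()
spike_clusters = np.load('spike_clusters.npy').flatten()

def coincident_duplicates(c_keep, c_drop, window_s=0.0005):
    """Indices of c_drop spikes that fall within window_s of a c_keep spike."""
    keep_idx = np.flatnonzero(spike_clusters == c_keep)
    drop_idx = np.flatnonzero(spike_clusters == c_drop)
    t_keep = spike_times[keep_idx] / FS
    t_drop = spike_times[drop_idx] / FS
    lo = np.searchsorted(t_keep, t_drop - window_s)
    hi = np.searchsorted(t_keep, t_drop + window_s)
    return drop_idx[hi > lo]            # indices into the full per-spike arrays

# placeholder cluster ids: treat coincident spikes of cluster 314 as duplicates
# of cluster 14 and move them to a junk label rather than deleting rows, so
# spike_times.npy stays aligned with the other per-spike files
dups = coincident_duplicates(14, 314)
cleaned = spike_clusters.astype(np.int64)
cleaned[dups] = -1
np.save('spike_clusters_cleaned.npy', cleaned)
```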