MouseLand / Kilosort

Fast spike sorting with drift correction for up to a thousand channels
https://kilosort.readthedocs.io/en/latest/
GNU General Public License v3.0
469 stars 245 forks source link

Complex spikes not well detected #286

Closed DradeAW closed 8 months ago

DradeAW commented 3 years ago

Hi,

I am doing extracellular recordings of mice cerebellar cortex, and soon transitioning to Neuropixels, so I thought I'd give Kilosort a try (previously using MountainSort).

I am particularly interested in Purkinje cells, which display 2 types of spikes: simple spikes (which are pretty much normal spikes) at a rate of 50-100 Hz and complex spikes (see image below with an asterisk) at a rate of ~1 Hz.

simple_vs_complex_spikes

I have to say, I am very pleased with Kilosort's detection of simple spikes, which seems better than MountainSort. However, Kilosort seems to have troubles finding these complex spikes (I don't know if it's because of their low frequency during PCA, or because of their shape in time (spikes are only extracted on 3ms in phy, but some complex spike last 4ms)).

I tried using the default parameters, tried to lower the high-pass filter to 150 Hz, tried to lower the Threshold from -6 to -4, but I'm still unable to correctly sort them (Kilosort successfully sorts 10-20% of complex spikes that MountainSort finds).

I was wondering if you could think of what parameters to adjust in order to extract those complex spikes. In MountainSort, I usually do an analysis of bandwith 600-6000 (for simple spikes) and one with bandwith 150-1500 (for complex spikes) and then merge the 2 analysis, but it isn't enough for Kilosort.

Thank you!

PS: I am using the python version of Kilosort (pykilosort). I know it isn't fully finished yet, but I'm guessing this isn't the cause of my problem.

m-beau commented 3 years ago

Hi,

I am interested in your comparison of kilosort and mountain sort to find complex spikes.

I know that kilosort 2 is successful in finding complex spikes, but I must confess that I have not tried to compare it to another algorithm or any kind of ground truth - what I can tell you is that the units I get have the right mean firing rate, so I presume that I do not miss too many spikes. We can chat over zoom if you want, and I could show you a few examples.

Maxime Beau

DradeAW commented 3 years ago

Yeah we can definitely talk about it!

Contact me via email: aurelien.wyngaard@ens.fr

m-beau commented 3 years ago

Haha un Médecine-Science!

DradeAW commented 3 years ago

I think I found why complex spikes are not well detected:

Screenshot from 2020-12-01 11-30-38

In the phy GUI, in blue we have the detected simple spike, and in red a fat spike (which is a complex spike detected in the dendrites). I know those units come from the same cell as there is a pause in simple spike after a fat spike.

In the right window, I extracted the fat spike unit and plotted the mean waveform on channels 43 to 63 (channel 53 to 54 is a different shank, as seen in phy). As you can see the complex spike is recorded by the electrode, but I believe kilosort fused them with the simple spike unit as they have almost the same negative phase, but differ in the positive phase afterwards. In the cross correlogram, we can see there are a lot of simple spikes hapening at the same time as the fat spike, which supports my claim.

Is there a way to force Kilosort to look for the whole waveform (sometimes lasting for mere than 4ms)? If I create a complex spike template, can I force it so that Kilosort tries to find them?

m-beau commented 3 years ago

Hi Aurélien,

The phenomenon that you observe - 0-lag crosscorrelation between the 2 clusters - is not exactly due to complex spikes being sorted as simple spikes (the spikes would be either in one cluster or the other, not in both, so there wouldn't be this huge 0-lag correlation) - it is due to complex (or simple) spikes being sorted twice, once in your complex spikes cluster and once in your simple spikes cluster. This happens because kilosort subtracts the template from the raw data when it finds a spike, which can be unperfect in some cases. This subtraction leads to a residual which can be sorted a second time, by mistake, and you end up with spike duplicates. You can read more about this issue here.

The case that you describe above could then have two origins: either some complex spikes residuals are sorted as simple spikes, or simple spikes residuals are sorted as complex spikes. I suspect the former: that dendritic fat spikes are found (and become your complex spike cluster), subtracted, and its residuals (which maybe still contain the somatic spike if it hasn't been subtracted, you could tell from the template) are sorted with simple spikes.

A simple solution is to remove spike duplicates manually post-hoc after having matched your simple and complex spikes! But to do that you need to know if you need to remove them from you simple spike or your complex spike cluster (I suspect that they should be removed from your simple spike cluster as discussed above).

To help us to figure out which cluster is polluted with false positives, it would be great if you could put together a side by side comparison of mountain sort - which doesn't use template subtraction and would not find residuals - and kilosort outputs ran on the same dataset. I expect that the complex spike clusters will have roughly the same amount of spikes, and that the simple spike cluster will have more spikes in kilosort than in mountain sort (and they would happen at the time of complex spikes).

DradeAW commented 3 years ago

The problem is that in this recording, there are other complex spikes that MountainSort finds, but Kilosort doesn't.

I strongly believe that if I remove the shank where there are fat spikes, that Kilosort wouldn't find the complex spike (it seems to find fat spikes relatively well however).

Furthermore, it isn't unusual to see spikelets in the auto-correlogram of some simple spikes, but still no complex spike sorted by Kilosort, where MountainSort finds it.

DradeAW commented 3 years ago

I have multiple recordings with simple spikes but no complex spikes. In the "FeatureView" (PCA), I can sometimes see a cluster, and if I extract it, I have the complex spikes (although noisy).

CS_extract

So I'm pretty sure Kilosort merges them together, but I don't know if it's a bug of the python version, or a problem with kilosort algorithm. Nevertheless, is there a way to modify the template window? I'm guessing the default is 3ms, but could it be an option to change it?

m-beau commented 3 years ago

The problem is that in this recording, there are other complex spikes that MountainSort finds, but Kilosort doesn't.

You just showed a phy screenshot of a complex spike being found by kilosort (the one with a strong 0-lag correlation). In this case you must be able to compare mountain sort and kilosort output presumably (i.e. figure out which cluster is polluted with false positives), you need to do it so that we can progress in this discussion.

In the "FeatureView" (PCA), I can sometimes see a cluster, and if I extract it, I have the complex spikes (although noisy).

The blue cluster that you show here is a subgroup of simple spikes, not a complex spikes cluster (maybe polluted with a few, but if these were complex spikes the CCG wouldn't be as symmetrical, in particular it wouldn't start going down before 0ms).

Nevertheless, is there a way to modify the template window? I'm guessing the default is 3ms, but could it be an option to change it?

It is set here : https://github.com/MouseLand/Kilosort/blob/c31df11de9a4235c22a20909884f467c3813a2e4/preProcess/preprocessDataSub.m#L12

@marius10p it seem like it is forbidden to use a wider template than 81 'due to GPU shared memory', any comment on this?

DradeAW commented 3 years ago

In this case you must be able to compare mountain sort and kilosort output presumably (i.e. figure out which cluster is polluted with false positives), you need to do it so that we can progress in this discussion.

I haven't made an in-depthh analysis, because most complex spikes found by both have the same frequency (+-0.01 Hz). My problem is with complex spikes that have a small positive phase, thus that look like simple spikes. Kilosort doesn't seem, at least in the python version, to extract the complex spike very well.

The blue cluster that you show here is a subgroup of simple spikes, not a complex spikes cluster (maybe polluted with a few, but if these were complex spikes the CCG wouldn't be as symmetrical, in particular it wouldn't start going down before 0ms).

I agree, some of them are probably spikelets. But if you look at the waveform (particularly channel 12), you can see a slight positive phase, which makes me think that some of them are indeed complex spikes, polluted with spikelets and probably something else.

Thank you for the template window! For some reason I didn't catch it ^^'