SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Peak misalignment #2614

Open JuanPimientoCaicedo opened 5 months ago

JuanPimientoCaicedo commented 5 months ago

Hello,

I am finding that, in a few instances, units sorted with Kilosort 4 show a peak misalignment that causes the software to split them. Here is an example:

image

The templates are very similar, and to me it is clear that these spikes are probably part of the same cluster:

image

My question is: does the SortingAnalyzer object account for this type of error when computing the waveform averages? Reading the documentation, I noted that you do perform a peak alignment, but the sample shift is applied to a unit's whole spike train and not to individual traces.

Thank you for your time.

samuelgarcia commented 5 months ago

Hi, no, SortingAnalyzer is agnostic to this alignment. The waveforms/templates are computed at the spike times. We have get_template_extremum_channel_peak_shift() and align_sorting() to find this shift and apply the alignment unit by unit.
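To make the "unit by unit" idea concrete, here is a minimal pure-Python sketch of the concept, not SpikeInterface's implementation. The helper names (`peak_shift`, `align_spike_train`), the toy single-channel templates, and the sign convention (a positive shift moves spikes later) are all assumptions for illustration; check the documentation of get_template_extremum_channel_peak_shift() and align_sorting() for the library's actual conventions.

```python
def peak_shift(template, nbefore):
    """Offset (in samples) between the template's negative peak and the
    sample where the sorter placed the spike (index `nbefore`)."""
    peak_index = min(range(len(template)), key=lambda i: template[i])
    return peak_index - nbefore


def align_spike_train(spike_samples, shift):
    """Apply ONE shift to every spike of a unit (per-unit alignment)."""
    return [s + shift for s in spike_samples]


# Toy single-channel templates (hypothetical data).
nbefore = 3
templates = {
    "unit_a": [0, -1, -3, -9, -4, -1, 0],  # trough exactly at nbefore
    "unit_b": [0, 0, -1, -2, -8, -3, -1],  # trough one sample late
}
spike_trains = {"unit_a": [100, 250, 900], "unit_b": [400, 1200]}

shifts = {unit: peak_shift(t, nbefore) for unit, t in templates.items()}
aligned = {unit: align_spike_train(spike_trains[unit], shifts[unit])
           for unit in spike_trains}

print(shifts)   # per-unit shift in samples
print(aligned)  # spike trains after per-unit alignment
```

Because a single shift is applied to the whole spike train, this kind of alignment cannot fix a unit in which only some of the spikes are offset.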

alejoe91 commented 5 months ago

Sam, read the full message! :) In the same cluster there are both shifted versions of the waveforms!

Unfortunately, in this case, there is nothing we can do on the SpikeInterface side. You would need to shift the individual spikes, but we currently don't have that functionality.
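For anyone who wants to experiment outside SpikeInterface, shifting individual spikes could look roughly like the sketch below. It is only an illustration under stated assumptions: a single-channel trace held in a plain list, negative-going spikes, and hypothetical helper names (`realign_spike`, `realign_spikes`) that are not part of any library.

```python
def realign_spike(trace, spike_sample, max_shift):
    """Move one spike time to the most negative sample within
    +/- max_shift samples of where the sorter placed it."""
    lo = max(0, spike_sample - max_shift)
    hi = min(len(trace), spike_sample + max_shift + 1)
    return min(range(lo, hi), key=lambda i: trace[i])


def realign_spikes(trace, spike_samples, max_shift=2):
    """Per-spike alignment: each spike gets its own shift."""
    return [realign_spike(trace, s, max_shift) for s in spike_samples]


# Toy trace with two spikes whose troughs sit at samples 10 and 30.
trace = [0] * 40
for center in (10, 30):
    trace[center - 1] = -3
    trace[center] = -10
    trace[center + 1] = -4

detected = [10, 28]  # the second spike was detected two samples early
realigned = realign_spikes(trace, detected, max_shift=2)
print(realigned)
```

On real multi-channel data the search would have to run on the unit's extremum channel and keep `max_shift` small, otherwise a spike could be pulled onto a neighbouring spike or onto noise.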

zm711 commented 5 months ago

Based on the amplitude difference and the fact that KS (1-3) had a tendency to split the same unit into two separate clusters if the residual is big enough, couldn't it also be that one of those clusters is just the residual of the "real" cluster? Not sure for KS4, but I remember a thread on KS2 or 2.5 discussing that this can happen with KS and that their recommendation was to delete the smaller cluster. The curation.remove_duplicate_unit (if the units are separate) or curation.remove_duplicate_spikes (if they've already been merged into one cluster) functions should help deal with that, no? Or is that not a possibility, @alejoe91 @samuelgarcia?

JuanPimientoCaicedo commented 5 months ago

@zm711, that is something that could cause some classification errors, and to me it looks like Kilosort 4 still makes those mistakes to some degree. I have seen some duplicated clusters, and when I find them I delete the one with the larger shift and fewer total spikes.

However, I believe this case is different, and the main reason I think so is the autocorrelogram:

  1. When a unit is duplicated, you don't have a dip at 0 lag; instead you have a huge autocorrelation at 0 lag, which is evidence that the same spike was counted twice.
  2. The case I am showing here is one where the cross-correlogram between the two units showed a dip. That led me to think that, even though the principal components and the amplitude distribution showed a clear division between these two units, the division was mainly caused by there being two shifted versions of the waveform.
  3. Of course I can be wrong, but I am not convinced that eliminating one of the two clusters is the solution to the problem.
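The distinction drawn in the list above can be sketched with a toy cross-correlogram in pure Python. Everything here is an assumption for illustration: the 30 kHz sampling rate (so 30 samples is about 1 ms), the made-up spike trains, and the hypothetical `cross_correlogram` helper, which is not a SpikeInterface function.

```python
def cross_correlogram(train_a, train_b, bin_samples, n_bins_each_side):
    """Histogram of lags (b - a) between two spike trains, with bins
    centered on multiples of bin_samples; the middle bin is zero lag."""
    n_bins = 2 * n_bins_each_side + 1
    half_window = bin_samples * n_bins_each_side + bin_samples // 2
    counts = [0] * n_bins
    for a in train_a:
        for b in train_b:
            lag = b - a
            if abs(lag) > half_window:
                continue
            index = (lag + half_window) // bin_samples
            if 0 <= index < n_bins:
                counts[index] += 1
    return counts


unit_a = [1000, 4000, 7000, 10000]

# Duplicate cluster: the same spikes detected again 4 samples later.
duplicate = [s + 4 for s in unit_a]
# Split unit: different spikes of the same neuron, never closer than
# the refractory period to any spike in unit_a.
split_half = [2500, 5500, 8500]

center = 50  # index of the zero-lag bin when n_bins_each_side=50
ccg_duplicate = cross_correlogram(unit_a, duplicate, 30, 50)
ccg_split = cross_correlogram(unit_a, split_half, 30, 50)

print("duplicate, zero-lag bin:", ccg_duplicate[center])
print("split unit, zero-lag bin:", ccg_split[center], "total:", sum(ccg_split))
```

A duplicated cluster piles its counts into the central bin, while two halves of one neuron leave the central bin empty and only populate the flanks.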

Regarding the curation.remove_duplicate_unit and curation.remove_duplicate_spikes functions: I already use them as a pre-curation step in my pipeline.

With this I am not saying that Kilosort 4 is a bad algorithm; I am actually getting good units with it. This is an event that can happen 2 or 3 times per session, but I do not see a reason to delete these clusters when to me they are high-quality units.

zm711 commented 5 months ago

Again these are my takes, but I'm curious what the others think too!

> When a unit is duplicated, you don't have a dip at 0 lag; instead you have a huge autocorrelation at 0 lag, which is evidence that the same spike was counted twice.

This is a strong point. The residual of a spike should be within a few samples, so not necessarily at 0 lag, but relatively close, because the two will still have different spike times (i.e. real spike at sample 10 and residual at sample 14, not both at sample 10). So I would expect a peak relatively close to 0 lag. I guess whether you would see it depends on the bin size you're using (or, in this case, the one Phy uses).
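As a rough check of the bin-size point, the arithmetic for the sample 10 vs. sample 14 example can be written out. The 30 kHz sampling rate and the two bin widths are assumptions chosen for illustration, and `lag_to_bin` is a hypothetical helper.

```python
import math


def lag_to_bin(lag_samples, sampling_rate_hz, bin_ms):
    """Return (lag in ms, index of the correlogram bin it lands in),
    with bins centered on multiples of bin_ms and bin 0 at zero lag."""
    lag_ms = 1000.0 * lag_samples / sampling_rate_hz
    bin_index = math.floor((lag_ms + bin_ms / 2) / bin_ms)
    return lag_ms, bin_index


lag_samples = 14 - 10  # residual at sample 14, real spike at sample 10
lag_ms, coarse_bin = lag_to_bin(lag_samples, 30000, bin_ms=1.0)
_, fine_bin = lag_to_bin(lag_samples, 30000, bin_ms=0.1)

print(f"lag = {lag_ms:.3f} ms")
print("bin index with 1.0 ms bins:", coarse_bin)
print("bin index with 0.1 ms bins:", fine_bin)
```

Under these assumptions the 4-sample offset is about 0.13 ms, so it lands in the zero-lag bin with 1 ms bins but in the neighbouring bin with 0.1 ms bins.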

> even though the principal components and the amplitude distribution showed a clear division between these two units, the division was mainly caused by there being two shifted versions of the waveform.

This is the point I'm not sure about. The spike time (based on my understanding of KS) is determined from the peak of the spike, and not the other way around. I would say that your waveforms are shifted because they have different amplitudes, not that they have different amplitudes because they are shifted. So in this case maybe you could make an argument for some sort of step drift, but that doesn't fit with the fact that you have spikes at both amplitudes all across your recording.

> Of course I can be wrong, but I am not convinced that eliminating one of the two clusters is the solution to the problem.

Agreed, I'm more than happy to be wrong. I think these cases are super tricky to decide on because, like you said, you don't want to toss data, but then you need to decide whether a cluster is real or an artifact.