cortex-lab / KiloSort

GPU code for spike sorting
GNU General Public License v2.0

Double spikes: consecutive spikes from separate units on a fast timescale are templated as a single waveform, and are correctly clustered if nt0 is drastically reduced #171

Open mnpompili opened 5 years ago

mnpompili commented 5 years ago

Hi,

We are testing Kilosort on 256-channel, 9-hour recordings from the mPFC and hippocampus of rats, using twisted electrode bundles of 4-8 electrodes sampled at 20 kHz. We are very happy with how fast Kilosort is compared to what we used before; however, we are still tuning some parameters to arrive at our final clustering recipe.

A striking result is the presence of clusters composed of double spikes, like the one below.

[Figure: 350 waveforms of the violet double-spike cluster alongside the blue single-spike cluster]

In violet on the left you see 350 waveforms from a cluster formed only by two spikes following each other on a fast timescale. In blue you see a cluster formed by a single spike, which happens to coincide with the first of the two spikes of the violet cluster (the waveforms don't look identical, but in many cases the same spike is detected as belonging to both the blue and the violet clusters). Spikes of the blue cluster occur much more often (2.14 Hz) than spikes of the violet cluster, which represents the combination of the two (0.01 Hz). Let's suppose the violet cluster represents a true sequence of neurons A and B. In that case, this sequence (A, then B) occurred more than 350 times in 9 hours. Neuron B alone was not clustered by Kilosort with these parameters, as none of the other clusters' spikes coincided with it.

An example of the waveform in question (A followed by B, assigned to the violet cluster) is shown below. In violet we highlight the 16 samples preceding the waveform peak and the 32 following it. This very same spike was also assigned to the cluster of neuron A alone.

[Figure: example A-then-B waveform, with the 16 samples before and 32 samples after the peak highlighted]
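The observation that the same spike lands in both the blue and the violet cluster can be quantified by counting near-coincident spike times between the two clusters. A minimal plain-Python sketch, assuming each cluster's spike times are available as a sorted list of sample indices; the function name, the tolerance and the spike times below are hypothetical and only illustrate the idea:

```python
from bisect import bisect_left


def count_coincidences(times_a, times_b, tol):
    """Count spikes in times_a that have at least one spike in times_b within +/- tol.

    Both inputs are sorted lists of spike times in the same unit as tol
    (e.g. samples at 20 kHz).
    """
    hits = 0
    for t in times_a:
        i = bisect_left(times_b, t - tol)  # first b-spike not earlier than t - tol
        if i < len(times_b) and times_b[i] <= t + tol:
            hits += 1
    return hits


# Hypothetical spike times in samples (illustration only, not real data).
blue = [1000, 5000, 9000, 12000, 20000]
violet = [1001, 12002, 30000]
shared = count_coincidences(violet, blue, tol=5)
print(f"{shared}/{len(violet)} violet spikes coincide with a blue spike")
```

A high fraction here would indicate that the double-spike cluster is re-using spikes already explained by the single-spike cluster.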

In our dataset this happens often: we have around one such cluster for every electrode bundle, on different days and in recordings from different animals. The issue seems to be caused by how many samples Kilosort uses for template matching. To our understanding, this is set by the parameter nt0, whose default value is 61 (if we understand the code correctly, this means that each template is composed of 61 samples before and 61 samples after the peak, 6.1 ms overall). This seems strikingly high, given that in our previous practice with KlustaKwik we used 8-10 samples in total (0.4-0.5 ms) for the PCA step. Please note that the examples above were obtained with nt0=40.
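Converting nt0 to a duration depends on how the parameter is read. A small sketch, assuming a 20 kHz sampling rate, that compares the reading used here (nt0 samples on each side of the peak) with the reading the maintainer gives later in this thread (nt0 is the total template length in samples); `window_ms` is a hypothetical helper, not part of Kilosort:

```python
def window_ms(nt0, fs_hz, nt0_is_total=True):
    """Duration in ms of the template window implied by nt0.

    nt0_is_total=True:  nt0 is the total number of samples in the template.
    nt0_is_total=False: nt0 samples on each side of the peak (2 * nt0 in total).
    """
    n_samples = nt0 if nt0_is_total else 2 * nt0
    return 1000.0 * n_samples / fs_hz


fs = 20_000  # sampling rate used in these recordings, in Hz
for nt0 in (61, 40, 20, 5):
    per_side = window_ms(nt0, fs, nt0_is_total=False)
    total = window_ms(nt0, fs)
    print(f"nt0={nt0}: {per_side:.2f} ms (per-side reading), {total:.2f} ms (total reading)")
```

Under the total-length reading, the default nt0=61 spans about 3 ms rather than 6.1 ms, and nt0=5 spans only 0.25 ms.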

We therefore tried halving this value to nt0=20. Looking at the same example as above, neurons A and B were now clustered separately, with no double-spike cluster. Below, you can see how the same two spikes shown above now belong to normal single-spike clusters.

[Figure, two panels: the same two spikes, now assigned to separate single-spike clusters]

Note that these neurons seem to be monosynaptically connected, which could explain why they fire in sequence often enough for Kilosort to detect the pair as a single event when using a very wide window.
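A common way to check whether B tends to fire shortly after A, as expected for a monosynaptic connection, is a cross-correlogram of B's spike times relative to A's. A minimal plain-Python sketch, assuming sorted spike times in samples for the two (now separated) clusters; the function name, bin size and data are hypothetical:

```python
from bisect import bisect_left, bisect_right


def cross_correlogram(ref, target, max_lag, bin_size):
    """Histogram of (target - ref) time differences within +/- max_lag.

    ref, target: sorted spike times, in the same unit as max_lag and bin_size.
    Bin k covers [-max_lag + k * bin_size, -max_lag + (k + 1) * bin_size).
    """
    n_bins = int(round(2 * max_lag / bin_size))
    counts = [0] * n_bins
    for t in ref:
        lo = bisect_left(target, t - max_lag)
        hi = bisect_right(target, t + max_lag)
        for s in target[lo:hi]:
            k = int((s - t + max_lag) // bin_size)
            if 0 <= k < n_bins:  # drops only the edge value lag == +max_lag
                counts[k] += 1
    return counts


# Hypothetical spike times in samples at 20 kHz: B follows A by 40 samples (2 ms).
a_times = [1000, 5000, 9000]
b_times = [1040, 5040, 7000, 9040]
print(cross_correlogram(a_times, b_times, max_lag=100, bin_size=20))
# -> [0, 0, 0, 0, 0, 0, 0, 3, 0, 0]
```

A narrow peak at small positive lags (here the 2-3 ms bin) is consistent with an A-then-B sequence; this check is only possible once A and B are sorted as separate units.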

Sadly, nt0=20 does not get rid of all double spikes, as you can see from the example below of two neurons spiking one after the other on a shorter timescale than in the example above.

[Figure: second double-spike example, with the two spikes at a shorter interval]

Using nt0=5 (0.5 ms overall, if our initial understanding of the nt0 parameter was correct), we no longer detect this double-spike cluster.

We are at a bit of a loss as to why nt0 is set so high by default, and perhaps we are missing something. It seems to us that unless the recording is from a brain area with very sparse spiking activity, or the sampling rate is considerably higher than our 20 kHz, double-spike templates should be abundant whenever two neurons spike one after the other (like the two green cells above). Indeed, in our study we are interested in cells that fire together or in sequence. Of course, this problem might be less striking in shorter recordings, where double spiking might not happen often enough to create a double cluster.
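The argument that such pairings accumulate in long recordings can be made concrete with a back-of-the-envelope estimate. Assuming, only for illustration, that A and B fire as independent Poisson processes with rates r_A and r_B, the expected number of times B fires within a window w after an A spike, over a recording of duration T, is roughly r_A · r_B · w · T. The rate for B below is hypothetical (it is not reported in this thread), and a truly connected pair would exceed this chance level:

```python
def expected_chance_pairs(rate_a_hz, rate_b_hz, window_s, duration_s):
    """Expected count of B spikes falling within window_s after an A spike,
    assuming A and B are independent Poisson processes and rate_b_hz * window_s << 1."""
    return rate_a_hz * duration_s * rate_b_hz * window_s


duration = 9 * 3600       # 9-hour recording, in seconds
rate_a = 2.14             # rate of the single-spike (blue) cluster reported above, in Hz
rate_b = 1.0              # HYPOTHETICAL rate for neuron B, in Hz
window = 61 / 20_000      # a 61-sample window at 20 kHz, in seconds
print(round(expected_chance_pairs(rate_a, rate_b, window, duration)))
```

Even without any synaptic coupling, this gives on the order of a couple of hundred pairings in 9 hours under these assumed rates.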

We are planning to keep nt0 at 5 or even 4 samples for 20 kHz recordings. This should capture the spike trough well, and it seems reasonable that spikes should be templated onto the trough rather than onto the signal before or after the peak, where other spikes could be occurring.

However, our supposedly reasonable assumption deviates so much from the default value that we are afraid we may be misunderstanding something basic about how the algorithm works. Why was nt0 set so high by default, and does decreasing it so dramatically pose any risks?

Thanks in advance for the help, Marco

marius10p commented 5 years ago

Hi Marco,

Seems like I missed this. I apologize. nt0 is the total duration of the spike in samples, and nt0min is the sample at which the minimum is aligned. For your 20 kHz recordings, nt0=20 might already be small enough to create other problems. If the template is too short for a neuron, then after subtraction you might still have a big chunk of the neuron's waveform left, and another template might get created specifically to explain that leftover chunk.
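The risk described above can be illustrated with a toy model. The sketch below builds a hypothetical biphasic waveform (sharp trough, slower positive repolarization), assumes an idealized template that matches the waveform exactly inside its nt0-sample window with the trough at index nt0min, and measures how much waveform energy is left after subtraction. This is not Kilosort's actual template-fitting step, and the nt0min values are illustrative choices, not defaults:

```python
import math

FS = 20_000  # sampling rate in Hz, as in the original post


def toy_spike(n=80, trough=20):
    """Hypothetical biphasic extracellular waveform (arbitrary units):
    a sharp negative trough followed by a slower positive repolarization."""
    wave = []
    for i in range(n):
        t = (i - trough) / FS * 1000.0  # time relative to the trough, in ms
        trough_part = -1.0 * math.exp(-(t / 0.15) ** 2)
        repol_part = 0.35 * math.exp(-((t - 0.6) / 0.4) ** 2)
        wave.append(trough_part + repol_part)
    return wave


def residual_fraction(wave, trough, nt0, nt0min):
    """Fraction of waveform energy left after subtracting an idealized template
    covering only nt0 samples, with the trough at template index nt0min."""
    start = trough - nt0min
    residual = [0.0 if start <= i < start + nt0 else x for i, x in enumerate(wave)]
    total = sum(x * x for x in wave)
    return sum(r * r for r in residual) / total


wave = toy_spike()
for nt0, nt0min in ((61, 20), (20, 7), (5, 2)):
    print(nt0, round(residual_fraction(wave, 20, nt0, nt0min), 3))
```

With the long window almost nothing is left over, while short windows leave part of the trough and most of the repolarization behind; in a real sorter that leftover is the kind of signal another template could end up being created to explain.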

We are hoping to release Kilosort2 as soon as possible. I don't know if you'll still have that problem there, but several things work differently, so we'll see.

Best, Marius

mnpompili commented 5 years ago

Hi Marius,

thanks for the feedback. We look forward to Kilosort2, then.

Best, Marco.