cortex-lab / neuropixels

Information about Neuropixels electrode arrays

Spike sorting algorithm of choice #19

Open ajuavinett opened 6 years ago

ajuavinett commented 6 years ago

Hey all -- I'm assuming most of you have been using Kilosort/Phy to sort your Neuropixels data so far, but has anyone compared their output with JRCLUST yet? We're going to try it with a data set this week, so I was curious if others have also given it a shot with their data.

https://www.biorxiv.org/content/early/2017/01/30/101030 https://github.com/JaneliaSciComp/JRCLUST/wiki

Best, Ashley

weallen commented 6 years ago

I've been using Kilosort + Phy but have recently tried JRCLUST as well. In general, I think the manual cleanup GUI for JRCLUST is much better than Phy -- it actually shows physical units, and I prefer the layout of how you go through units. JRCLUST also seems to be more resistant to drift -- the majority of high firing rate clusters with Kilosort tend to be split (sometimes into multiple clusters), whereas I rarely found that with JRCLUST. Kilosort, on the other hand, seems to be better at picking out units with relatively few spikes. That said, I've found a disturbing lack of correspondence between clusters from the two algorithms, particularly for low amplitude units.

Also, weirdly, the amplitudes that JRCLUST gives are almost exactly 2.5x what I get from computing the amplitudes from the Kilosort templates -- even in situations where I can clearly see that it's the same unit the two algorithms are finding. The SNR looks similar between the two so I assume this is a bug in one or the other, and the JRCLUST amplitudes look more similar to what I see by eye during recording...

Let me know what you think. Hopefully at some point all of this will be combined into one standard method for the field...
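One way to put a number on the "lack of correspondence" between two sorters is to compare spike times cluster by cluster. The following is a minimal sketch, not the method used by anyone in this thread: the function names, the tolerance of 15 samples, and the agreement score (matched spikes divided by the union of both spike trains) are all assumptions chosen for illustration, and the matching is deliberately rough (it is not one-to-one).

```python
import numpy as np

def count_matches(times_a, times_b, tol):
    """Count spikes in times_a that have at least one spike in times_b
    within +/- tol samples. Rough matching: not strictly one-to-one."""
    # Cast to signed integers so subtraction cannot wrap around
    # (spike times are often stored as unsigned integers).
    times_a = np.sort(np.asarray(times_a).astype(np.int64))
    times_b = np.sort(np.asarray(times_b).astype(np.int64))
    if times_a.size == 0 or times_b.size == 0:
        return 0
    # For each spike in A, find its neighbours in B on either side.
    idx = np.searchsorted(times_b, times_a)
    left = times_b[np.clip(idx - 1, 0, times_b.size - 1)]
    right = times_b[np.clip(idx, 0, times_b.size - 1)]
    nearest = np.minimum(np.abs(times_a - left), np.abs(times_a - right))
    return int(np.sum(nearest <= tol))

def agreement(times_a, times_b, tol=15):
    """Hypothetical agreement score: matched spikes / union of both trains."""
    n_match = count_matches(times_a, times_b, tol)
    n_union = len(times_a) + len(times_b) - n_match
    return n_match / n_union if n_union else 0.0
```

Computing this score for every pair of clusters (one from each sorter) gives a matrix in which well-matched units show up as entries close to 1, which makes it easier to see whether the disagreement is concentrated in low amplitude units.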

nsteinme commented 6 years ago

I don't want to make this more complicated for you, but there are many new algorithms that various people think are the best thing since sliced bread:

Lee J, Carlson D, Shokri H, Yao W, Goetz G, Hagen E, Batty E, Chichilnisky E, Einevoll G, Paninski L: YASS: Yet Another Spike Sorter. bioRxiv 2017, doi:10.1101/151928.

Chung JE, Magland JF, Barnett AH, Tolosa VM, Tooker AC, Lee KY, Shah KG, Felix SH, Frank LM, Greengard LF: A Fully Automated Approach to Spike Sorting. Neuron 2017, 95:1381–1394.e6.

Yger P, Spampinato GL, Esposito E, Lefebvre B, Deny S, Gardella C, Stimberg M, Jetter F, Zeck G, Picaud S, et al.: A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo. Elife 2018, 7:e34518.

Dhawale AK, Poddar R, Wolff SB, Normand VA, Kopelowitz E, Ölveczky BP: Automated long-term recording and analysis of neural activity in behaving animals. Elife 2017, 6:1–40.

Hilgen G, Sorbaro M, Pirmoradian S, Muthmann JO, Kepiro IE, Ullo S, Ramirez CJ, Puente Encinas A, Maccione A, Berdondini L, et al.: Unsupervised Spike Sorting for Large-Scale, High-Density Multielectrode Arrays. Cell Rep 2017, 18:2521–2532.

Maybe one of them is the best thing since sliced bread!! It's getting out of control and I don't know what to tell you. The best place to start is to have a clear idea of what characteristics you want and what measure you want to use to decide what's best. If you do any comparison, please do let us know what you find (thanks Will for sharing observations! Very helpful to hear. It would also be helpful if you opened issues on the phy-contrib GitHub describing which GUI features you think phy is lacking).

Will: how do you calculate amplitudes from kilosort? The numerical values of the template waveforms are not going to match the numerical values of the spikes in the recording both because they are in a whitened space and because each spike is fit by the template multiplied by the amplitude from amplitudes.npy. I recommend extracting the spikes from the raw data to compute a mean waveform. In that case, you can't get a different answer between the two algorithms because your calculation used only the spike times and the original raw data.
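The recommendation above (use only the spike times and the raw data) can be sketched as follows. This is an illustrative example under stated assumptions, not the function from the spikes repository: `mean_waveform`, `peak_to_peak_uv`, the window lengths, and the `uv_per_bit` parameter are hypothetical names and values, and `raw` is assumed to be an array (or memory-mapped file) of shape `(n_samples, n_channels)`.

```python
import numpy as np

def mean_waveform(raw, spike_samples, n_before=20, n_after=40):
    """Average raw-data snippets around each spike time.

    raw: array of shape (n_samples, n_channels), e.g. a memory-mapped
         recording (assumed layout).
    spike_samples: spike times in samples for one cluster.
    Returns an array of shape (n_before + n_after, n_channels).
    """
    spike_samples = np.asarray(spike_samples).astype(np.int64)
    # Drop spikes whose window would run past either end of the recording.
    ok = (spike_samples >= n_before) & (spike_samples + n_after <= raw.shape[0])
    spike_samples = spike_samples[ok]
    if spike_samples.size == 0:
        raise ValueError("no spikes with a complete window in the recording")
    offsets = np.arange(-n_before, n_after)
    idx = spike_samples[:, None] + offsets[None, :]
    # Fancy indexing loads all snippets into memory: (n_spikes, n_t, n_channels).
    # For very large clusters, subsample the spikes first.
    snippets = raw[idx].astype(np.float64)
    return snippets.mean(axis=0)

def peak_to_peak_uv(waveform, uv_per_bit):
    """Peak-to-peak amplitude on the largest channel, scaled to microvolts.
    uv_per_bit depends on the gain setting used for the recording."""
    ptp = waveform.max(axis=0) - waveform.min(axis=0)
    return ptp.max() * uv_per_bit
```

Because the calculation never touches the templates, the same spike times fed through this function give the same amplitude regardless of which sorter produced them.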


weallen commented 6 years ago

I guess the other thing that I've noticed is that Phy and Kilosort no longer seem to be under active development, whereas JRCLUST is -- is this true Nick? I've suggested things there, as have other people, but I haven't seen any changes.

Good to know about the amplitudes -- I meant using your MATLAB function from spikes. I do use the raw spikes to calculate amplitude for computing quality control metrics.

nsteinme commented 6 years ago

Ah. Well if it is off by a factor of 2.34, that is because the gain in spikeglx-generated files is 2.34µV per bit with default gain settings and my function doesn't know this number by default (since it depends on the gain setting you choose - see here - https://github.com/cortex-lab/neuropixels/wiki/Gain_settings).
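As a rough illustration of where a figure like 2.34 µV per bit can come from, the sketch below assumes a 10-bit ADC spanning 1.2 V and an amplifier gain of 500. These hardware values are assumptions for the example (check the Gain_settings wiki page linked above for your probe and settings), and the function names are hypothetical.

```python
import numpy as np

def uv_per_bit(gain, v_range=1.2, n_bits=10):
    """Microvolts per ADC count.

    Assumed defaults: 1.2 V full ADC span and 10-bit resolution.
    The gain is whatever was selected when recording.
    """
    return v_range / (2 ** n_bits) / gain * 1e6

def to_microvolts(raw_counts, gain=500):
    """Scale raw integer samples to microvolts for a given gain setting."""
    return np.asarray(raw_counts, dtype=np.float64) * uv_per_bit(gain)

# With the assumed defaults, a gain of 500 gives about 2.34 uV per bit.
print(round(uv_per_bit(500), 2))
```

Any amplitude computed from unscaled data (or from templates that were fit to unscaled data) needs this factor applied before it can be compared with amplitudes reported in microvolts.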

Cyrille (phy) and Marius (kilosort) are both still alive and very much interested in tools for spike sorting. Cyrille is mostly now working on database stuff for IBL and Marius is mostly now working on all sorts of crazy science. So it's not that the projects are abandoned, that's for sure, but they are at the stage of roughly "finished products" - I believe both Cyrille and Marius are under the impression that there are no major flaws in their software and that their tools are still the best available, claims against which I have as yet seen no compelling evidence (though I have not tried other GUIs personally). On the contrary, we felt there was solid evidence that kilosort was doing much better than alternatives, though our tests (http://phy.cortexlab.net/data/sortingComparison/) are getting quite out of date now and it may well be the case that competing algorithms have improved. Also, drift correction in kilosort is now progressing due to some newly available data, and I anticipate there will be an update to kilosort in the next few months that introduces some version of this feature, if it can work for some probes (remains to be seen).


ajuavinett commented 6 years ago

Yikes, thanks @nsteinme. What I want is the closest thing to ground truth without spending weeks sorting one experiment, hah. Oh, and drift correction, because we're recording in freely moving mice.

weallen commented 6 years ago

@nsteinme Hmm, no I take that into account. In any case, it's not a big deal because I compute it correctly from the raw data anyways.

@ajuavinett Does it really take weeks to sort a single experiment with kilosort? With a little bit of automation, I feel like I can manually clean up an experiment in ~1-2 hours. (Maybe I'm less careful than you all...)

ajuavinett commented 6 years ago

Ha my mistake, @weallen I didn't mean to imply that. :) I just meant that manual curation time is a factor. For kilosort, I agree, it's about 1-2 hours per experiment!

weallen commented 6 years ago

Good to hear :-) We actually built a neural network system that can discriminate noise vs real neurons with ~99% accuracy (on held out data from a set of ~15 sorted experiments), which speeds things up quite a lot. Working on using a similar system to automate some of the easy merges now.

rossant commented 6 years ago

> Cyrille (phy) and Marius (kilosort) are both still alive and very much interested in tools for spike sorting. Cyrille is mostly now working on database stuff for IBL and Marius is mostly now working on all sorts of crazy science. So it's not that the projects are abandoned, that's for sure, but they are at the stage of roughly "finished products" - I believe both Cyrille and Marius are under the impression that there are no major flaws in their software and that their tools are still the best available, claims against which I have yet seen no compelling evidence (though I have not tried other guis personally).

There are many many things to improve in phy but, as Nick said, I'm currently busy with IBL database stuff. However I should have the opportunity to get back to phy in a few weeks.