cortex-lab / KiloSort

GPU code for spike sorting
GNU General Public License v2.0
172 stars 100 forks source link

ops.splitT and ops.mergeT and other parameters #68

Open petersenpeter opened 7 years ago

petersenpeter commented 7 years ago

Hey

I am recording from the hippocampus and using 5-6 shank 64 channel Buzsaki staggered probes. After running Kilosort I end up with many units that likely consists of multiple neurons. I played around with the ops.Nfilt option to improve the output, but even at 6 times the channel count, I still have many units that are merged from what appears to be multiple units (yet I end up with many units with a single spike in them).

Could you provide more information on other relevant parameters in this context? E.g. ops.mergeT and ops.splitT?

Best Peter Petersen

marius10p commented 7 years ago

1) There is no way to recluster a subset of units. @rossant can pitch in, whether he thinks running a fast clustering algorithm underneath the hood in Phy could be feasible. To clarify, you are interested in the situation where a template is a merge of multiple units, and you want to generate the best split for that template? I can consider that as a step directly inside Kilosort, it's an interesting idea.

2) Try ops.initialize = 'fromData', in the config file. This will run a more complicated initialization procedure, that can help avoid local minima. If you are left with many clusters with 1 spike, it means you can shift the balance between splitting and merging, by increasing ops.splitT to increasing splitting. You can also try decreasing ops.mergeT, if your problems happens due to this step unnecessarily merging templates (though it would be hard to know if this was the problem). Note also that at some point, for small amplitude units, resolving single units will be impossible, since they will be merged into the background spikes.

rossant commented 7 years ago

Extending the GUI to implement a "recluster" feature should certainly be possible. The question is what clustering algorithm would be sensible? I'm not sure it would make sense to run klustakwik.

marius10p commented 7 years ago

If it's a matter of just splitting a single cluster into two, that can certainly be implemented in a fast way, using a simpler algorithm without the bells and whistles of klustakwik.

nsteinme commented 7 years ago

Here is one example way "recluster" could be implemented:

1) The user draws a lasso around their guess at the cluster, but presses R instead of K. Call points within the lasso A and those outside B. 2) Using all available PC dimensions, the fisher linear discriminant between A and B is computed. 3) A criterion is selected along this dimension where the proportion of "misses" (user labeled A but given this criterion they are called B) equals the proportion of false alarms (given this criterion they are A but user didn't have them lasso'ed). Split is made at that point.

If there's an automatic way to get a good starting point for #1 then even better. Perhaps just k-means with two clusters would be just as well here, though this method has the advantage of not assuming gaussian shape.

Nick

On Wed, Jun 7, 2017 at 10:22 AM, Marius Pachitariu <notifications@github.com

wrote:

If it's a matter of just splitting a single cluster into two, that can certainly be implemented in a fast way, using a simpler algorithm without the bells and whistles of klustakwik.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cortex-lab/KiloSort/issues/68#issuecomment-306740237, or mute the thread https://github.com/notifications/unsubscribe-auth/AHPUP0OLv9kIyMMHe_BYxAs6FDQ_jXdeks5sBmvUgaJpZM4Ncn22 .

petersenpeter commented 7 years ago

Hi guys

Regarding reclustering, I have attached a screenshot from Phy to help explain the issue. All the selected units have been cleaned, are from the same shank, and have fairly good autocorrelations and waveforms. Using the correlation matrix, I am considering which units that should be merged. A subset of these units (three thereof are shown in second screenshot) don't have clean cross correlations with other units, which likely indicates that they should be merged or that they are contaminated from a shared unit. From their firing rate one might say that merging them is not the solution (as they all appear to be pyramidal cells and they rarely fires this fast) and that they likely are co-contaminated (like the three units highlighted in the second screenshot, which also don't have clean autocorrelations). Yet from the FeatureView there is no obvious contamination or splitting point. I believe that a recluster feature would be very helpful - an automatic way to forcefully split only a subset of units, which appears to be contaminated/underclustered units.

I don't know if a simple K-mean clustering would be sufficient for this. Regarding Nicks idea, a way to get a starting point could be to look for density-peaks in the PCA space - A multi-modality test perhaps?

Marius: Thanks for the suggestions, I tried to change ops.initialize = 'fromData'. yet I received the error below. I will try to adjust the mergeT/splitT settings and see if that will improve the spike sorting. What is the sensitive range of both mergeT and splitT?


Time 769.57, batch 601/656, NTOT 0 Index exceeds matrix dimensions.

Error in fullMPMU (line 205) [~, isort] = sort(st3(:,1), 'ascend');


Best, Peter

kilosortphyreclustering kilosortphyreclustering2

nikolaskaralis commented 7 years ago

The option to re-run PCA on a given cluster would be also helpful, since after merging/splitting, the previously computed components are less informative.

On Mon, Jun 12, 2017 at 3:34 PM, Peter C Petersen notifications@github.com wrote:

Hi guys

Regarding reclustering, I have attached a screenshot from Phy to help explain the issue. All the selected units have been cleaned, are from the same shank, and have fairly good autocorrelations and waveforms. Using the correlation matrix, I am considering which units that should be merged. A subset of these units (three thereof are shown in second screenshot) don't have clean cross correlations with other units, which likely indicates that they should be merged or that they are contaminated from a shared unit. From their firing rate one might say that merging them is not the solution (as they all appear to be pyramidal cells and they rarely fires this fast) and that they likely are co-contaminated (like the three units highlighted in the second screenshot, which also don't have clean autocorrelations). Yet from the FeatureView there is no obvious contamination or splitting point. I believe that a recluster feature would be very helpful - an automatic way to forcefully split only a subset of units, which appears to be contaminated/underclustered units.

I don't know if a simple K-mean clustering would be sufficient for this. Regarding Nicks idea, a way to get a starting point could be to look for density-peaks in the PCA space - A multi-modal test perhaps? Marius: Thanks for the suggestions, I tried to change ops.initialize = 'fromData'. yet I received the error below. I will try to adjust the mergeT/splitT settings and see if that will improve the spike sorting. What is the sensitive range of both mergeT and splitT?

Time 769.57, batch 601/656, NTOT 0 Index exceeds matrix dimensions. Error in fullMPMU (line 205) [~, isort] = sort(st3(:,1), 'ascend');

Best, Peter

[image: kilosortphyreclustering] https://user-images.githubusercontent.com/14089401/27036268-4f9fb184-4f52-11e7-9210-ce5ae48e793d.png [image: kilosortphyreclustering2] https://user-images.githubusercontent.com/14089401/27036267-4f9bdbcc-4f52-11e7-8961-f56fa00e9d14.png

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cortex-lab/KiloSort/issues/68#issuecomment-307790772, or mute the thread https://github.com/notifications/unsubscribe-auth/AEG2VuDpyF2NgbXjvz6dhcJfcjLE9iVSks5sDT51gaJpZM4Ncn22 .

-- Nikolas Karalis - http://www.nikolaskaralis.gr http://nikolaskaralis.gr

Doctoral Candidate in Neuroscience - Ludwig-Maximilians Universität München M.Sc. in Neuroscience (Charité, Berlin & Université de Bordeaux II) B.Sc, M.Sc in Applied Mathematics and Physics (NTUA)

petersenpeter commented 7 years ago

Marius, could you provide more input on ops.initialize = 'fromData'.

marius10p commented 7 years ago

I think it is (briefly) described somewhere in the paper. We take threshold crossings, find a subset of spikes that are most different from each other, and initialize k-means with that. The result of k-means is then used to initialize Kilosort.

petersenpeter commented 7 years ago

Thanks, I was actually referring to the error I had, running with the setting as stated in my previous reply: I tried to change ops.initialize = 'fromData'. yet I received the error below.

Time 769.57, batch 601/656, NTOT 0 Index exceeds matrix dimensions.

Error in fullMPMU (line 205) [~, isort] = sort(st3(:,1), 'ascend');

This error only occur with ops.initialize = 'fromData'.

marius10p commented 7 years ago

Sorry, I missed that. This means no spikes made it over the threshold. Try reducing that threshold only for that first step (the options are in the cfg file in a separate paragraph just for this initialization routine).

petersenpeter commented 7 years ago

Thanks again, while I have your attention, is there anyway to limit templating on large amplitude artifacts? After Kilosort has been run, almost all clusters have large outliers in their feature space. When removing the outliers (manually cutting them out), it is obvious that they are included in the cluster, because the algorithm believes that the unit had spiked at the time of an artifact. Yet, often this is hard to verify from the waveforms, so for each cluster, I remove them manually. Is there any way that this can be minimized?

marius10p commented 7 years ago

I think it depends on what the artifacts look like. You could remove them automatically by using the spike amplitudes perhaps? There shouldn't be many of these artifacts anyway (relative to neural spike rates), and you should probably discard those periods of the recording anyway, so they don't confound your further analysis.

I am also considering whether to replace the continuous penalty on spike amplitude with a penalty that does not allow amplitudes larger than a certain % of the mean. I can let you know when that is implemented.

petersenpeter commented 7 years ago

The penalty percentage sounds great. Often the artifacts arises as the animal bangs its head into something, typically in the homecage, or when he is chewing or something similar, events that, unfortunately, are hard to remove from the recording.

And yes, I have used the amplitude of the spikes together with the amplitude in the feature view to remove them, but so far only manually. They are a small percentage of the spikes in a given cluster, yet they often appear excessively in the auto correlation histogram within the refractory period.

petersenpeter commented 7 years ago

After lowering the threshold (ops.spkTh = 0.25) I still receive the same error:

Time 1193.00, batch 901/986, NTOT 0 Index exceeds matrix dimensions.

Error in fullMPMU (line 205) [~, isort] = sort(st3(:,1), 'ascend');

marius10p commented 7 years ago

I see. Indeed, the current penalty does not deal with artifacts correctly, because it trades off in a continuous manner between explaining the variance of the data and having an amplitude close to the mean amplitude. For artifacts, the penalty from the amplitude will be large, but the penalty for not explaining the variance of the data is even larger, so the artifact is categorized as a spike.

marius10p commented 7 years ago

ops.spkTh should in principle not be larger than -3 (else you will get a lot of noise in the thresholded events). But that doesn't seem to be what's happening, and I cannot tell what's going on. Try running the eMouse simulation to make sure that option works (maybe some built-in Matlab function is screwed up). Also try stopping the processing in preprocessData after line 262, where the first batch of thresholded spikes is found. Make sure there are events in there and it's not just empty. Also check if anything becomes NaN or Inf at some point, maybe in-between the main functions (preprocessData, fitTemplates, fullMPMU).

petersenpeter commented 7 years ago

The problem was the ops.spkTh setting. Lowering it to -6 did the trick.

marius10p commented 7 years ago

Strange... it should have been -6 by default. Let me know how it goes, good luck.