Suggestions On How to Handle Ambigious Spike Sorting

Bilalgp commented 4 years ago

For spike sorting, sometimes there are multiple different clusterings that look appropriate. For example the two figures below. Do you have any recommendations on how you would classify spikes in this situation? I am using the L-ratio metric test to determine if the clustering is valid but it passes in both cases.

ferchaure commented 4 years ago

Personally I always visually check that the std in each sample is more or less constant. Especially when the waveform amplitude has a low absolute value (you could expect more variation generated by an alignment error or the firing rate of the cell in its peak). For example in the second figure look at the grey lines in the cluster 1, around the sample 35, they separate, that doesn't make any sense then in the cluster 1 you have more than one neuron.

Another good practice is to use only one type of threshold, positive or negative. I'm worried that on figure 1 some spikes of the cluster 3 are been detected late or aligned in the negative peak and by its similarity forced into cluster 2.

Bilalgp commented 4 years ago

Thank you for the prompt response. I am using spike sorting on in-vitro data so I was generally finding the best peak analysis with both-sided threshold but I understand the potential issue. I will try one-sided with a lower threshold magnitude.

So for something like the following data set, both a 4 clustering and 2 clustering algorithm gives a good L-ratio and the variation of cluster 1 is mostly at the peak. The gray lines for cluster one is narrow for both 2 and 4 cluster outputs. How would you normally analyze this?

2 cluster ouput:

4 cluster output:

ferchaure commented 4 years ago

I can't say, too much noise in the recording. Can you try in a highest threshold?

Bilalgp commented 4 years ago

Just to clarify, what do you mean by noise? The Vrms is pretty low, < 5 uV. There is a lot of overlapping peaks for sure in some parts of the data trace.

I used to have the double-sided threshold with 5.5 times the median stdev (used in some in-vitro papers). But for the one-sided threshold, I changed it to 4 times the median stdev, since in some cases, there is a significant negative peak that doesn't have a large positive-end and the lower threshold could potentially detect the signal. Are you suggesting I change this magnitude to be higher like 5.5 times?

ferchaure commented 4 years ago

Yes, I would like to see the results of the threshold 5.5 times. In the case of the negative peaks, you can run again the same file with a high negative threshold

Bilalgp commented 4 years ago

This is the result for 5.5 x with positive threshold. My concern with setting a large negative threshold to find the negative peaks is that some are weaker than say cluster 2 or 3 or 4 which has a strong positive side and negative compontent to it as well. I am assuming this will just be a limitation with wave_clus?

ferchaure commented 4 years ago

Another way to use both thresholds is by selecting the well-isolated clusters in each solution and then manually emerging the results.

Remember that is the peak is too small, it can be just noise or far away and no very reliable multiunit activity.

How is the automatic solution? too many clusters?

Maybe setting par.detect_order = 2; will help to reduce the number of spikes that have a very low peak, can you try with this setting?

I am assuming this will just be a limitation with wave_clus?

The bipolar spikes are quite messy for almost all the detection algorithms.

Bilalgp commented 4 years ago

Thank you for the prompt response. My results do vary sometimes when inputting the data with the automatic solution. For the pars.detect_order = 2, the automatic sorting for the data set mentioned above with positive detection criteria has 7-8 clusters output, with a few clusters that fail the L-ratio test (clusters with waveforms < .05 L-ratio value). Is there a guide that describes more of what par.detect_order does?

For my work, I removed the filtering process manually. I don't notice this being an issue. I get a warning when saving the cluster (or an error, but the cluster output saves) and this has been also there in the unmodified wave_clus.

Sampling rate is 50 kHz.

ferchaure commented 4 years ago

with a few clusters that fail the L-ratio test

Ok, what did you do with those clusters? an option could be merging them to the closest template and check again the L-ratio test. Choosing one temperature usually is not the best idea.

I removed the filtering process manually

You could use a low order filter, there is always some noise. However, if you set the par.detect_order = 0 and/or par.sort_order = 0 these filters will be disabled.

Is there a guide that describes more of what par.detect_order does?

Not really. If defines the order of the filters used to:

Detect the spikes (par.detect_order): usually, we use a higher-order filter because we don't want to detect noise
Do the sorting (par.sort_order): usually is lower than thepar.detect_order (we don't want to remove frequencies that could help to describe the waveform).

I get a warning when saving the cluster (or an error

If you want, just tell me the message and we can check what is happening.

Bilalgp commented 4 years ago

I apologize. I should clarify that I already pre-filter the dataset before placing it in Wave_clus for consistency with other tests. I will set par.detect_order and par.sort_order = 0 when using filtered data instead of modifying the wave_clus source code then.

How is the automatic solution? too many clusters?

My previous comment about the automatic solution was there were too many clusters in the automatic solution (in terms of they were not well isolated with the L-ratio test).

Choosing one temperature usually is not the best idea.

I usually do merge manually after the automatic result. I guess my original question was when should I stop merging? From the comment from above (attached below on this comment for clarity) two mergings both passed the L-ratio test. I understand you said the output is pretty noisy. I can try to use the unfiltered data and filter with the par.detect_order = 2 as you said. When I do that, I still get 8-9 clusters (varies run to run with the same data set) that have clusters that fail the L-ratio test. If I begin to merge them, they look similar to that of the previous comment. Data acquisition is 50 kHz.

Unmerged Automatic Results with unfiltered data with the par.detect_order = 2.

These are the warnings and errors I get when pressing the save clusters button to output the clusters to the .mat file. First time it's the warning second time (when it's just segment plotting or making a new times_xx.mat file) it gives the error. Although the file gets generated and there is usually nothing wrong with the output file. This error persists when working with the unmodified wave_clus code from github.

Thank you again for the prompt response.

Thank you for the prompt response. I am using spike sorting on in-vitro data so I was generally finding the best peak analysis with both-sided threshold but I understand the potential issue. I will try one-sided with a lower threshold magnitude.

So for something like the following data set, both a 4 clustering and 2 clustering algorithm gives a good L-ratio and the variation of cluster 1 is mostly at the peak. The gray lines for cluster one is narrow for both 2 and 4 cluster outputs. How would you normally analyze this?

2 cluster ouput:

4 cluster output:

ferchaure commented 4 years ago

About the results: I would totally discard the cluster 1, It doesn't look like a neuron, just noise aligned in a local peak. Even the cluster 4 is a mix of things. I recommend: unforce, delete cluster 4, force again, and delete cluster 1. and then probably make 1 or 2 merges.

About the error/warnings:

You are right, the warning doesn't matter.
The error will be important if you close wave_clus and try to edit the clusters again. Try removing the whitespace on the folder name.

Bilalgp commented 4 years ago

Just so I understand correctly, do you think cluster 1 and part of cluster 4 are noise due to: the peak being low, low ISI values in the histogram with high numbers?

Would this be considered a multiunit or just noise? Is there a source in the literature you would recommend to look into this further? Reading some of the past work with Wave_clus, I see that typically 60 uV is used as a cutoff for a single unit.

Just so I understand your steps right, I performed your steps and got this output (the wave_clus automatic sorting clusters were slightly different): 1, 2, 3 and 5 are likely from separate neurons and cluster 4 is noise (cluster 4 fails the l-ratio test as well (>.05)).

Would there be any way to ensure that cluster 1 and 5 should stay separated and not merged. If I merge them, they look like this (and pass the L-ratio test and do have low ISI below 3 ms). There is a small deviation in the gray lines around 50 ms but it's not that large.

Regarding the error with the whitespace in the folder name: There isn't any whitespace. I tried to put the files in a separate folder and try again but the error still persisted.

ferchaure commented 4 years ago

the peak being low, low ISI values in the histogram with high numbers?

exactly

Would this be considered a multiunit or just noise?

A multiunit just crossing the threshold plus probably a big amount of noise. You can try to use it but remember that probably is just part of a multiunit and part of the "spikes" are just noise

Is there a source in the literature you would recommend to look into this further? Reading some of the past work with Wave_clus, I see that typically 60 uV is used as a cutoff for a single unit.

It depends on the type of electrode, animal etc. Search for literature with a setup similar to yours. That could help you even to distinguish neurons from some types of artefacts.

cluster 4 is noise

It has some spikes from cluster 5 and 3 and probably some mu. You can try: unforce and check the waveform. with a bit of luck that 40 spikes are just a specific subgroup of these clusters and you can merge them and then force again.

Would there be any way to ensure that cluster 1 and 5 should stay separated and not merged.

To be honest, I don't like that merge soo much, but It could be right. Maybe you have some drifting,... If you really want to look at the relationship between time and amplitude of these clusters you can use my little project celestina

About the error Look in the folder where your data is, do you see a data_wc.dg_01.lab file there? For some reason, Matlab can't change the name of that file. Matlab can have issues with some folder names. You can work around this using the batch files: Get_spikes and Do_clustering these do the processing on the Matlabs's terminal and you just use the GUI to check results (it doesn't need to rename files).

Bilalgp commented 4 years ago

I see. Thank you for the information. The file name is e10.mat and I do have data_e10.dg_01.lab in the folder but it was not updated when the error gets displayed. The times_e10.mat generates just fine despite the error.

ferchaure commented 4 years ago

Interesting, maybe the operative system is not allowing Matlab to overwrite the older version of data_e10.dg_01.lab? A solution could remove that file if you are making a new sorting solution from scratch.

The times_e10.mat file will be always fine, just the temperature map will be corrupted if you open that solution again with the GUI.

Bilalgp commented 4 years ago

A multiunit just crossing the threshold plus probably a big amount of noise. You can try to use it but remember that probably is just part of a multiunit and part of the "spikes" are just noise

One more question to confirm I am understanding this correctly, would cluster 1 below possibly fit the criteria of being a multi-unit?

ferchaure commented 4 years ago

Yes, That cluster is a multiunit and a part of them look like just noise that crosses the threshold

ckfaber commented 2 years ago

@Bilalgp I know I'm 2 years late to this discussion, but I'm wondering if you would be open to sharing example code on how you process your cluster results using L-ratio? I'm new to spike sorting (and e-phys in general), and would really appreciate some guidance on how to approach this problem! Thanks very much!

csn-le / wave_clus

Suggestions On How to Handle Ambigious Spike Sorting #171