Different clustering result in different runs

AHEsmaeili commented 2 years ago

Hi Fernando!

I've reformatted my data based on your previous advice, and am currently trying to use the GUI to run some pilot tests. I have separate files for each electrode (since they are single-channel and centimeters apart), each containing a data variable and sr = 25000.

When I load the data in the GUI, the clustering results sometimes vary widely from run to run, so I wanted to ask whether this is within the bounds of the ML algorithm, or whether the results should remain somewhat similar across different runs of the same file.

In addition, I've been reading up on the issues section to find which parameters are advised to tune for cleaner clustering results, but I'm still unsure which ones are suitable for my particular dataset (Alpha Omega single channel tungsten electrodes: FHC Instrument, 0.8–1.2 MΩ impedance).

I would greatly appreciate it if you could inform me on these parameters, or the optimal approach to finding them (based on cluster/waveform characteristics).

Here are two runs of the same file, with default parameters except par.detect_order = 2.

I've also attached the figures of a run with all default parameters (which seemingly does not vary as much as with par.detect_order = 2).

Data.zip

Many thanks in advance for your insight.

ferchaure commented 2 years ago

> When I load the data in the GUI, the clustering results are sometimes widely vary from run to run, so I thought to ask if this is within the bounds of the ML alg., or should the results remain somewhat similar in different runs of the same file?

It is within the bounds of the random components. You can reduce the variation by increasing par.max_spk (the number of spikes used for clustering).
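To see why using more spikes tends to steady the result, here is a small Python sketch. It is not wave_clus code (wave_clus is MATLAB) and it does not reproduce its clustering algorithm. It assumes made-up spike amplitudes from two units and uses a toy 1-D 2-means split as a stand-in for the real clustering; `two_means_boundary` and `boundary_spread` are hypothetical names. It only illustrates the general effect: a split computed on a random subset moves less from run to run when the subset is larger.

```python
import random
import statistics


def two_means_boundary(values, iters=15):
    """Toy 1-D 2-means: return the midpoint between the two cluster centres."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        left = [v for v in values if v <= mid]
        right = [v for v in values if v > mid]
        lo, hi = statistics.fmean(left), statistics.fmean(right)
    return (lo + hi) / 2


def boundary_spread(population, subset_size, runs=25):
    """Run-to-run standard deviation of the boundary over random subsets."""
    boundaries = []
    for seed in range(runs):
        rng = random.Random(seed)  # a different random subset on every run
        subset = rng.sample(population, subset_size)
        boundaries.append(two_means_boundary(subset))
    return statistics.pstdev(boundaries)


# Synthetic amplitudes (assumption): two units with different mean amplitude.
gen = random.Random(12345)
population = [gen.gauss(50, 10) for _ in range(12000)]
population += [gen.gauss(90, 10) for _ in range(8000)]

spread = {n: boundary_spread(population, n) for n in (200, 3200)}
for n, s in spread.items():
    print(f"subset of {n:5d} spikes: boundary std across runs = {s:.3f}")
```

The spread for the larger subset should come out clearly smaller. How strongly this carries over to a real recording depends on the data and on the actual algorithm, so treat it as intuition only.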

> I would greatly appreciate it if you could inform me on these parameters, or the optimal approach to finding them (based on cluster/waveform characteristics).

I only have a few recommendations. The clusters shouldn't have alignment issues like the ones in cluster 2 in the first figure: trying to find the peak in a part of the spike with a small slope is really noisy. Setting the detection to 'both' (negative and positive) can help, or maybe just negative. I don't know if cluster 3 is a real neuron; it is quite short in time and its number of spikes is really low. But I don't have experience with that particular acquisition setup.
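The alignment point can be illustrated with a short Python sketch (again not wave_clus code). It assumes two synthetic waveforms, a sharp bump and a broad bump with a small slope around its maximum, adds the same Gaussian noise to both, and measures how much the detected peak sample moves between trials. The shapes, the noise level, and the names `bump` and `peak_jitter` are all assumptions for illustration.

```python
import math
import random
import statistics


def bump(width, n=64, centre=32):
    """Gaussian-shaped bump of unit height; larger width means a flatter peak."""
    return [math.exp(-0.5 * ((i - centre) / width) ** 2) for i in range(n)]


def peak_jitter(waveform, noise_sd, trials, rng):
    """Standard deviation (in samples) of the argmax position under noise."""
    peaks = []
    for _ in range(trials):
        noisy = [v + rng.gauss(0, noise_sd) for v in waveform]
        peaks.append(max(range(len(noisy)), key=noisy.__getitem__))
    return statistics.pstdev(peaks)


rng = random.Random(0)
sharp_jitter = peak_jitter(bump(width=2), noise_sd=0.05, trials=400, rng=rng)
broad_jitter = peak_jitter(bump(width=10), noise_sd=0.05, trials=400, rng=rng)

print(f"sharp peak jitter: {sharp_jitter:.2f} samples")
print(f"broad peak jitter: {broad_jitter:.2f} samples")
```

With the flatter peak, neighbouring samples are almost as high as the true maximum, so noise decides which one wins and the alignment point wanders. Aligning on a steeper deflection of the spike, for example by changing the detection polarity as suggested above, avoids this.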

AHEsmaeili commented 2 years ago

Then the variation is expected. I thought it was due to my acquisition setup.

I also believe that cluster 3 (and maybe cluster 2 in the 2nd figure) is not a real neuron; it seems more like a peak in the noise. But I could be mistaken, given that this is my first time sorting.

Thanks a lot for the recommendations Fernando! I will try using par.detection as "both" or "negative", alongside varying par.max_spk, and par.detect_order (or maybe filtering before clustering and setting the detect_order and sort_order to zero) to see if I can get better clustering results.
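When comparing parameter settings, it may help to put a number on how much two runs agree instead of judging by eye. Below is a minimal Python sketch of the Rand index (the fraction of spike pairs that both runs treat the same way, either together or apart). It assumes you have exported one cluster label per spike from each run and that both runs cover the same detected spikes in the same order; the label lists here are invented and `rand_index` is a hypothetical helper, not part of wave_clus.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of spike pairs on which two labelings agree (1.0 = identical)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same spikes")
    if len(labels_a) < 2:
        raise ValueError("need at least two spikes to form a pair")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        agree += same_in_a == same_in_b  # both together or both apart
        total += 1
    return agree / total


# Invented labels for eight spikes (0 = unassigned).
run_1 = [1, 1, 1, 2, 2, 2, 0, 0]
run_2 = [2, 2, 2, 1, 1, 1, 0, 0]  # same grouping, cluster numbers swapped
run_3 = [1, 1, 2, 2, 2, 2, 0, 0]  # one spike moved to the other cluster

same_grouping = rand_index(run_1, run_2)
one_moved = rand_index(run_1, run_3)
print(same_grouping, one_moved)
```

The index ignores how clusters are numbered, which matters because cluster numbers can change between runs. The pairwise loop is quadratic in the number of spikes, so for long recordings it is more practical to compute it on a random subset of spikes.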

csn-le / wave_clus

Different clustering result in different runs #212