csn-le / wave_clus

A fast and unsupervised algorithm for spike detection and sorting using wavelets and super-paramagnetic clustering

'Force membership' tools and parameters #74

Closed perczelgy closed 6 years ago

perczelgy commented 6 years ago

Dear Fernando!

I am wondering whether I could improve my results by first obtaining small but 'pretty nice and clear' clusters and then using 'Force'. :) For that reason I tried to understand the role of the available parameters in the 'force membership parameters' section. As far as I understood,

Did I get the ideas right? Are the preset parameters the 'best ones' (though I suppose there isn't a best one anyway), or is it possible to fine-tune them for better performance (e.g. if we can assume that our clusters are very nice but not too large)? Do you have any ideas on when to use which type of parameter setting?

Thank you very much in advance, Gyurka

ferchaure commented 6 years ago

The dots are perfect, I couldn't explain them better. Do you want to write new docs for wave_clus?

Yes, the whole spike shapes are the data points of the waveform you can see in the GUI. Sometimes it is better to use the spike shapes, because a spike can have noise or an overlap in the wavelet coefficients you chose while its distance to the template is still small.

The parameters aren't the best ones for all cases. For your case, check that par.max_spk is bigger than your total number of spikes, to be sure that all the spikes in your small classes are used for the clustering. I also suggest using the 'center' option, but with a smaller par.template_sdnum (~1.5).

perczelgy commented 6 years ago

Dear Fernando!

Thanks for your kind and quick reply. I have extended my notes, so I hope they are better now (I also found some mistakes that I had to correct), though I'm afraid there might still be some errors in them. Furthermore, I added my questions to each section.

FORCE MEMBERSHIP PARAMETERS:

I am not sure I could write whole new docs for wave_clus, but if you can make use of these notes in new documentation you are free to do so; I would be happy to help.

Thank you very much, György

ferchaure commented 6 years ago

Hi György, sorry for the late answer, we are working on a lot of new improvements. I like the explanations; I will add a few comments.

par.template_type: type of clustering algorithm used to distribute unsorted spikes into the predefined clusters when hitting the 'Force' button

This method will be used automatically when you use the batch file, Do_clustering.

i) Did I get the limiting distance right? Is it true then that a very small 'par.template_sdnum' should be used to make use of this parameter, since with 'par.template_sdnum' of about 2-3 practically all clustered spikes are included?

Yes, that's right. The value of par.template_sdnum depends on your data. Sometimes you have a noisy signal but well-separated clusters (or very small clusters in the clustering solution); in those cases you want a bigger value of template_sdnum.
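The limiting distance discussed above can be sketched as follows. This is a minimal Python illustration, not wave_clus code: the names `build_template` and `force_center`, and the choice to pool the per-dimension variances into a single SD per cluster, are assumptions made for the example. Only the idea of a cutoff at template_sdnum times the cluster SD comes from the discussion.

```python
import math

def build_template(spikes):
    """Return (center, sd) for one cluster of feature vectors.

    center is the per-dimension mean. sd is one pooled value per cluster
    (square root of the summed per-dimension variances); this pooling is
    an assumption of the sketch.
    """
    n = len(spikes)
    dims = len(spikes[0])
    center = [sum(s[d] for s in spikes) / n for d in range(dims)]
    total_var = sum(
        sum((s[d] - center[d]) ** 2 for s in spikes) / n for d in range(dims)
    )
    return center, math.sqrt(total_var)

def force_center(spike, templates, sdnum):
    """Assign spike to the nearest center closer than sdnum * sd, else 0 (unsorted)."""
    best_label, best_dist = 0, math.inf
    for label, (center, sd) in templates.items():
        dist = math.dist(spike, center)
        if dist < sdnum * sd and dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Two small, clean clusters in a 2-D feature space (made-up numbers).
templates = {
    1: build_template([[0, 0], [1, 0], [0, 1], [1, 1]]),
    2: build_template([[10, 10], [11, 10], [10, 11], [11, 11]]),
}
outlier = [2.5, 0.5]  # 2.0 away from the center of cluster 1
print(force_center(outlier, templates, sdnum=3.0))  # accepted into cluster 1
print(force_center(outlier, templates, sdnum=1.5))  # stays unsorted (0)
```

With the larger value the outlying spike is pulled into cluster 1; with the smaller value it is left unsorted, which is the trade-off behind choosing template_sdnum for a given data set.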

ii) Did I get the SD part right? Wouldn't it be better not to use a spherical distance metric but an ellipsoid-like one or do some cluster- and dimension-wise normalization of the features' variances?

You got it right. That distance tries to compensate for the fact that the clustering solution could have low noise in some segment of the waveform, by assuming that all the samples have the same SD. I know that this assumption doesn't hold for wavelet coefficients, but in general an ellipsoid-like distance doesn't work for small clusters, and normalization could cause problems if you have a lot of artefacts or big multi-units.
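The difference between the two metrics can be illustrated with a small Python sketch. It is not wave_clus code: both function names and the numbers are hypothetical, and the per-dimension scaling stands in for an axis-aligned "ellipsoid-like" distance.

```python
import math

def spherical_distance(spike, center, pooled_sd):
    """Euclidean distance in units of a single SD shared by all samples."""
    return math.dist(spike, center) / pooled_sd

def normalized_distance(spike, center, sd_per_dim):
    """Distance with every dimension scaled by its own SD (axis-aligned ellipsoid)."""
    return math.sqrt(
        sum(((x - c) / s) ** 2 for x, c, s in zip(spike, center, sd_per_dim))
    )

# A small cluster whose first sample happened to show almost no noise.
center = [0.0, 0.0]
sd_per_dim = [0.05, 1.0]
pooled_sd = math.sqrt(sum(s ** 2 for s in sd_per_dim))

spike = [0.3, 0.0]  # modest deviation, but in the "quiet" sample
print(spherical_distance(spike, center, pooled_sd))    # well below 1 SD
print(normalized_distance(spike, center, sd_per_dim))  # about 6 SD
```

With only a handful of spikes per cluster, a per-dimension SD such as 0.05 may be an accident of the sample, so the normalized metric would reject a spike that the spherical one accepts comfortably.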

iii) Wouldn't it be reasonable to use some kind of lower threshold not to sort 'ugly spikes' into the clusters?

Probably, and it would have to be a minimum probability. But again, these methods use a lot of information from the SPC solution, and the clusters there are usually small, so the models built from them are quite noisy. For that reason we don't use this one.

iv) Did I get it right?

Yes

Thanks, György. I can make a doc using this issue. To be honest, the force step needs a review, and probably a few improvements. Some time ago we added the forced variable to the times file, in part because we wanted to give users the possibility of writing their own force methods.
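A custom force method built on that flag could look roughly like the Python sketch below. Everything here is an assumption for illustration: the per-spike lists `classes`, `forced`, and `features` stand in for whatever is read from the times file (loading is left out), label 0 is taken to mean unsorted, and `redo_force` and `nearest_mean` are hypothetical names.

```python
import math

def nearest_mean(feat, members, max_dist=1.0):
    """Example rule: nearest cluster mean within a fixed cutoff, else 0."""
    best, best_d = 0, math.inf
    for label, feats in members.items():
        center = [sum(col) / len(feats) for col in zip(*feats)]
        d = math.dist(feat, center)
        if d < max_dist and d < best_d:
            best, best_d = label, d
    return best

def redo_force(classes, forced, features, assign):
    """Keep the originally clustered spikes; re-assign forced and unsorted ones."""
    # Build the cluster membership only from spikes that were NOT forced.
    members = {}
    for label, was_forced, feat in zip(classes, forced, features):
        if label != 0 and not was_forced:
            members.setdefault(label, []).append(feat)
    new_classes = []
    for label, was_forced, feat in zip(classes, forced, features):
        if was_forced or label == 0:
            new_classes.append(assign(feat, members))
        else:
            new_classes.append(label)
    return new_classes

classes = [1, 1, 2, 2, 1, 0]
forced = [False, False, False, False, True, False]
features = [[0, 0], [0.2, 0], [5, 5], [5.2, 5], [4.9, 5.1], [0.1, 0.1]]
print(redo_force(classes, forced, features, nearest_mean))
```

The point of the design is that the templates are rebuilt only from spikes the clustering itself assigned, so a different rule can be swapped in through `assign` without the earlier forced assignments contaminating the templates.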

perczelgy commented 6 years ago

Hi Fernando!

Thank you very much for the detailed explanations!

ferchaure commented 6 years ago

Hi György, I will add these explanations to the Parameters wiki page.