possible clustering issue

aboharbf commented 5 years ago

I am under the impression a single line changing in size represents the same cluster, and most of the lines on a particular cluster should end up X's - the red line above has 3 clusters which end up used - is this a sign of the same spikes being in multiple clusters?

ferchaure commented 5 years ago

> I am under the impression a single line changing in size represents the same cluster,

Not really, that is only a way to represent the dynamics of the clustering results. The blue line represents the size of the largest cluster, the red one the size of the second largest, etc.

For example, in a case like the one you show, as the system goes to a higher temperature:

The red cluster consists of spikes that split off from the blue (main) cluster (you can't see the change in the blue line because of the log scale).
That cluster shrinks at the next temperature (and probably disappears later).
But you can't see that clearly because a new large group of spikes is discriminated over the next few temperatures (the green dot).
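
A toy sketch of the same idea, in Python with made-up labels (the data and the `ranked_sizes` helper are hypothetical, not wave_clus code). Each plotted line is the k-th largest cluster size at a given temperature, so the cluster behind the "red" line can change identity from one temperature to the next:

```python
from collections import Counter

def ranked_sizes(labels, n_lines=3):
    """Return cluster sizes sorted largest-first, padded with zeros to n_lines."""
    sizes = sorted(Counter(labels).values(), reverse=True)[:n_lines]
    return sizes + [0] * (n_lines - len(sizes))

# Hypothetical labels for 10 spikes at three increasing temperatures.
labels_per_temperature = [
    ["a"] * 10,                          # a single main cluster
    ["a"] * 7 + ["b"] * 3,               # "b" splits off from "a"
    ["a"] * 5 + ["c"] * 4 + ["b"] * 1,   # "b" shrinks, new group "c" appears
]

curves = [ranked_sizes(labels) for labels in labels_per_temperature]

# The "red" line is the second entry of each row. It keeps rising even
# though cluster "b" shrank, because "c" took over as second largest.
second_largest = [row[1] for row in curves]
print(second_largest)
```

Here the second-largest curve rises at every step, while the cluster that produced its first rise ("b") has almost vanished by the last temperature.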

Did you understand the idea?

aboharbf commented 5 years ago

I do better now - You're saying that the red line will always represent the 2nd largest cluster at a particular temperature, and a dip and rise suggests a cluster dissolved and a new one took over as the 2nd largest.

An additional question - I'd like to dissolve clusters based on proximity to the threshold, but it seems threshold values aren't saved anywhere - How would you recommend going about this? I am between doing it after clustering results are read into my pipeline, so outside of a wave_clus function, or finding where the continuous signal is processed and a threshold is set, and then attempting to append this to the parameters somehow - I don't know how successful that will be since I notice there are many parameter checks and I fear such a variable will be cleared before it is saved.

aboharbf commented 5 years ago

So I figured this out - I just saved the variable in the relevant function of get_spikes and loaded it before passing it to do_clustering. In a few cases I'm getting variable clustering results which either drive the cluster below the threshold, and lead to it being moved back into cluster 0, or two clusters emerge, one lower one higher, which get unsorted and remain, respectively. Do you think it may be wise (given my desire to make spikes up to a certain threshold MUA) to somehow modify do_clustering to exclude these spikes? Or is there a better way to go about it?

ferchaure commented 5 years ago

Hi, sorry for the late response. Your solution to the first question is right.

> In a few cases I'm getting variable clustering results which either drive the cluster below the threshold, and lead to it being moved back into cluster 0, or two clusters emerge, one lower one higher, which get unsorted and remain, respectively.

If I understood correctly, you run the clustering with all the spikes, and sometimes the spikes under the threshold stay in cluster 0 and sometimes they form two clusters. Is that right?

To be clear: cluster 0 contains all the spikes that are too far away from the mean waveform of every cluster.

Sometimes the amplitude of a cluster is not amazing, but the waveform is different enough to discriminate that neuron from the SUA.

> Do you think it may be wise (given my desire to make spikes up to a certain threshold MUA) to somehow modify do_clustering to exclude these spikes? Or is there a better way to go about it?

It is always better to give all the spikes to the clustering; defining SUA from the amplitude alone doesn't sound right. I haven't found a good way to discriminate MUA from single units; it is not only about the amplitude. You could probably use a test for unimodality on the waveforms.
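
As a rough illustration of the unimodality idea, here is a minimal Python sketch using Sarle's bimodality coefficient. This is a crude moment-based heuristic swapped in for simplicity, not a formal test such as Hartigan's dip test, and it is not part of wave_clus. Applying it to one feature per spike (for example the peak amplitude of each spike in a cluster) is an assumption, and the sample values are made up:

```python
def bimodality_coefficient(values):
    """Sarle's bimodality coefficient from population moments:
    (skewness**2 + 1) / kurtosis, with non-excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a Gaussian
    return (skewness ** 2 + 1) / kurtosis

# A uniform distribution gives 5/9; larger values hint at more than one mode.
UNIFORM_REFERENCE = 5 / 9

# Hypothetical per-spike features for two clusters.
unimodal_peaks = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6]
bimodal_peaks = [0, 0, 0, 1, 1, 9, 9, 10, 10, 10]

for name, peaks in [("unimodal", unimodal_peaks), ("bimodal", bimodal_peaks)]:
    bc = bimodality_coefficient(peaks)
    print(name, round(bc, 3), "suspicious" if bc > UNIFORM_REFERENCE else "ok")
```

A cluster flagged this way would only be a candidate for MUA; the coefficient can be fooled by skewed or heavy-tailed unimodal data, so it is best treated as a screening step.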

Note that wave_clus usually puts the SUA in cluster 1 (because they are a big part of the spikes detected).

aboharbf commented 5 years ago

This picture may help. Once wave_clus is completely finished with the results, I go back, take the threshold I mentioned earlier, and check whether each unit is within some multiple of that threshold (1.2 in my current case). So I'm trying to say "I want spikes within 4 SD's, but I want only units which are 5 SD's or more away from baseline". In the picture, the red is the threshold, the blue is the threshold for non-cluster-0 spikes, which means that blue unit is being resorted to cluster 0.
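
For readers following along, a minimal Python sketch of this kind of post-hoc step. The function name, the use of the mean absolute peak amplitude per cluster, and the numbers are all assumptions for illustration; the actual pipeline described here works on wave_clus output in MATLAB:

```python
def demote_weak_units(cluster_ids, peak_amplitudes, threshold, factor=1.2):
    """Reassign to cluster 0 every cluster whose mean absolute peak
    amplitude is below factor * |threshold|."""
    totals = {}  # cluster id -> (sum of |peak|, spike count)
    for cid, amp in zip(cluster_ids, peak_amplitudes):
        if cid == 0:
            continue  # cluster 0 is already unsorted
        total, count = totals.get(cid, (0.0, 0))
        totals[cid] = (total + abs(amp), count + 1)

    cutoff = factor * abs(threshold)
    weak = {cid for cid, (total, count) in totals.items() if total / count < cutoff}
    return [0 if cid in weak else cid for cid in cluster_ids]

# Hypothetical example: detection threshold of 40 (arbitrary units).
ids = [1, 1, 1, 2, 2, 2, 0]
peaks = [90, 100, 110, 42, 45, 47, 41]
new_ids = demote_weak_units(ids, peaks, threshold=40.0)
print(new_ids)
```

With these numbers the cutoff is 48, so cluster 2 (mean peak about 44.7) is moved to cluster 0 while cluster 1 (mean peak 100) is kept.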

You're right that it is an ugly way to exclude units, I definitely hope to improve it. The unimodality metric seems like it would be a good start. And you're right, typically the most distinct units do end up in Cluster 1.

ferchaure commented 5 years ago

> So I'm trying to say "I want spikes within 4 SD's, but I want only units which are 5 SD's or more away from baseline". In the picture, the red is the threshold, the blue is the threshold for non-cluster-0 spikes, which means that blue unit is being resorted to cluster 0.

Perfect, I understand now. I have tried simple approaches like that without any luck. Let me know if you find a better way.

> And you're right, typically the most distinct units do end up in Cluster 1.

I have to say that the comment is more valid if you compare the results against the old wave_clus, where the MUA and the poorly discriminated SUA went to cluster 1. Just to be clear: the current cluster numbers are not meaningful by themselves.

csn-le / wave_clus

possible clustering issue #116