Issue with cluster ids for split and merged clusters

Charles-Elliott commented 4 years ago

Hi!

I have been making various computations on ephys data from kiloSort, such as plotting the mean waveform for a given cluster. To do that, I'm using a template.npy which isn't modified by using phy. The resulting issue is that the clusters that were merged or split in phy have now cluster ids that go beyond what the template file contains, making it impossible to do anything with those clusters.

Is there any way to solve this problem? It's not a phy issue strictly speaking, but it does come from using it; perhaps you have the answer?

PS: I've tested my code on unsorted data and it works fine.

rossant commented 4 years ago

This should really be added to the phy documentation, but you can use the API as follows (untested):

# Get the mean waveforms of a cluster.
from phylib.io.model import load_model

model = load_model(params_path)  # load the TemplateModel instance
spike_waveforms = model.get_cluster_spike_waveforms(cluster_id)
mean_waveforms = np.mean(spike_waveforms, axis=0)

Another approach is to go from cluster indexing to template indexing, as follows (this should also be added to the TemplateModel API, PR welcome...):

spike_ids = model.get_cluster_spikes(cluster_id)  # spikes in the cluster
st = model.spike_templates[spike_ids]  # the template assignments of those spikes
template_ids, counts = np.unique(st, return_counts=True)  # find the most frequent template assignment
ind = np.argmax(counts)
template_id = template_ids[ind]

Once you have the corresponding template, you can use the TemplateModel API. You could also use the different templates from which the cluster comes from to compute weighted average (the weights being the relative number of spikes assigned to each template, for all spikes in the cluster).

Charles-Elliott commented 4 years ago

Thank you for the advice! Although since I'm using matlab for my code (forgot to say that, sorry), I may take some time to test it. I'll post an update when I have.

ClaraBesserer commented 2 years ago

Hello,

I don't know if this is still a relevant issue, but I was in the same situation. Out of your 2 suggestions, the 1st one didn't work (returned an empty array) but the 2nd one did. Here is the code I am using if it can be of any help :

    cluster_ind = np.load( output_folder/'phy'/'spike_clusters.npy')
    cluster_ind=np.unique(cluster_ind)
    from phylib.io.model import load_model
    model = load_model(output_folder/'phy'/'params.py')    

    for rel_unit, abs_unit in enumerate(cluster_ind):

        count = model.get_template_counts(abs_unit)
        best_template = np.argmax(count)
        template_ids=np.nonzero(count)[0]   
        count=count[template_ids]

        template = model.get_template(best_template)
        channel_ids = template.channel_ids
        templates = [model.get_template(template_id) for template_id in template_ids]             

        for i , b in enumerate(templates):
            data[i][:, b.channel_ids] = b.template

        waveforms = data[..., channel_ids]
        assert waveforms.shape == (len(template_ids), ns, len(channel_ids))
        mean_waveforms = np.average(waveforms, axis=0, weights=count)
        assert mean_waveforms.shape == (ns, len(channel_ids))

neurojak commented 1 year ago

Just came across this issue myself and I agree it would be a good idea to add this to the documentation. FWIW, as a newbie I assumed that Phy would update the templates for downstream analyses after split/merge

cortex-lab / phy

Issue with cluster ids for split and merged clusters #976