AllenInstitute / ecephys_spike_sorting

Modules for processing extracellular electrophysiology data from Neuropixels probes

PCmetrics not being calculated for some clusters #74

Open JRicardo24 opened 2 years ago

JRicardo24 commented 2 years ago

Hi guys, I was looking at the function in the metrics module that calculates the principal component (PC) metrics; the following screenshot is taken from it: [image: Github_askJosh] The condition on line 286 seems to work fine: I tested it on my own dataset, and the metrics are not calculated for clusters with 20 spikes or fewer (I also tested a threshold of 100 spikes). However, my dataset has other clusters with more than 20 spikes, including one with roughly 30k spikes, for which the metrics are still not calculated (their rows in the DataFrame are filled with NaN values). I was wondering whether this has to do with the conditions on lines 284 or 285, since I haven't fully understood what they accomplish. Any help is appreciated :) @jsiegle

jsiegle commented 2 years ago

Just to clarify – have you done any manual curation on this dataset? If not, then do you know which of the four conditions (all_pcs.shape[0] > 10, not (all_labels == cluster_id).all(), etc.) is causing it to skip the calculation for the units with lots of spikes?
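
One way to answer this is to evaluate each condition separately just before the combined check. The following is a minimal sketch, not code from the repository: the helper name `check_pc_conditions` and the `min_spikes` parameter are hypothetical, and only the three conditions quoted in this thread are covered (the fourth is not spelled out here, so it is left out).

```python
import numpy as np

def check_pc_conditions(all_pcs, all_labels, cluster_id, min_spikes=20):
    """Report each quoted condition on its own (hypothetical helper)."""
    # Number of subsampled spikes labelled with the cluster being scored.
    n_in_cluster = int(np.sum(all_labels == cluster_id))
    return {
        "all_pcs.shape[0] > 10": all_pcs.shape[0] > 10,
        "not (all_labels == cluster_id).all()":
            not bool((all_labels == cluster_id).all()),
        "sum(all_labels == cluster_id) > min_spikes": n_in_cluster > min_spikes,
    }

# Synthetic data only: 50 spikes, none labelled with the target cluster.
all_pcs = np.zeros((50, 6))
all_labels = np.full(50, 7.0)
report = check_pc_conditions(all_pcs, all_labels, cluster_id=240)
for name, passed in report.items():
    print(f"{name}: {passed}")
```

Calling the helper with the real `all_pcs`, `all_labels` and `cluster_id` for an affected unit shows which condition evaluates to False.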

JRicardo24 commented 2 years ago

Sorry for the delay, @jsiegle. No manual curation has been done on the dataset. After some tests I found that the condition causing the calculation to be skipped for units with many spikes, such as our cluster with 30917 spikes, is (sum(all_labels == cluster_id) > 100). In this case the cluster is 240, but when I print sum(all_labels == cluster_id) the result is 0. I truly don't know why all_labels does not contain 240 among its elements, but that is what prevents the PC metrics from being calculated.

jsiegle commented 2 years ago

Can you print the values in relative_counts for this cluster? It's possible that the count scaling is causing there to be zero PCs included in the calculation.
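
A quick way to inspect both sides of this is sketched below. It is not taken from the repository. It assumes, as a labelled guess, that each entry of relative_counts is truncated to an integer before it is used as a subsample size, so an entry below 1 would contribute zero spikes. The helper names `label_counts` and `zero_after_truncation` are hypothetical.

```python
import numpy as np

def label_counts(all_labels):
    """Tally how many subsampled spikes each unit contributed."""
    ids, counts = np.unique(all_labels, return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}

def zero_after_truncation(relative_counts):
    """Indices whose scaled count becomes 0 when cast to int (assumed behaviour)."""
    scaled = np.asarray(relative_counts, dtype=float)
    return np.flatnonzero(scaled.astype(int) == 0)

# Synthetic values only, not the numbers from the screenshot.
example_labels = np.array([7.0, 7.0, 12.0])
example_counts = [2000.0, 0.4, 13.9, 0.0]
print(label_counts(example_labels))
print(zero_after_truncation(example_counts))
```

If the target cluster's id is missing from the tally, or its position in relative_counts is flagged, the subsampling step is the place to look next.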

JRicardo24 commented 2 years ago

[image: Git_resposta] These are the relative_counts for cluster 240. The value printed below them, 41, is just the length. This test was run with a max_spikes_for_unit value of 2000. Here's more info that might be helpful, @jsiegle: [image: aditional_info]

jsiegle commented 2 years ago

I'm not sure what could be causing this. Let me know if you're able to gain any more insight into the problem.