Julie-Fabre / bombcell

Automated quality control, curation and neuron classification of spike-sorted electrophysiology data
GNU General Public License v3.0
124 stars 30 forks source link

Question with bc_getDistanceMetrics #91

Closed chrisan97 closed 5 months ago

chrisan97 commented 5 months ago

Hi! Thanks for this great package that you have written.

I just had a question about bc_getDistanceMetrics.

Near the beginning of the code (line 48) you check for

currentID = uniqueIDs(iID);

 % Skip if current ID matches the unit of interest
    if currentID == thisUnit
        continue;
    end

where 'thisUnit' is the ID of the unit (cluster number) and currentID is the first column from the pc_features.npy input. I'm kind of confused what you are comparing here. Isn't the first column from the pc_features.npy a channel number? So you would be comparing channel number with the ID of the cluster here? Maybe I am misunderstanding something here. Any help with the interpretation would be great. Thanks a lot!

Julie-Fabre commented 5 months ago

Hi Chrisan,

Bombcell uses pc_feature_ind to define the uniqueIDs and currentID. (see line 39: uniqueIDs = unique(pc_feature_ind(:, 1));). pc_feature_ind stores the kilosort output file pc_feature_ind.npy, and is in the format [nTemplates × nPCFeatures]. I think you might have gotten confused with pc_feature that stores the kilosort output file pc_feature.npy and is in the format [nSpikes × nFeaturesPerChannel × nPCFeatures].

chrisan97 commented 5 months ago

Hi Julie,

Thanks for the response. I think my confusion is coming from the description of these .npy files on the phy website.

It says here: pc_features.npy - [nSpikes, nFeaturesPerChannel, nPCFeatures] single matrix giving the PC values for each spike. The channels that those features came from are specified in pc_features_ind.npy. E.g. the value at pc_features[123, 1, 5] is the projection of the 123rd spike onto the 1st PC on the channel given by pc_feature_ind[5].

Since pc_feature.npy is [nSpikes x nFeaturesPerChannel x nPCFeatures] (in my case this is nSpikes x 3 x 32) and pc_feature_ind.npy is [nTemplates x nPCFeatures] (in my case this is nTemplates x 32), I thought each row of pc_feature_ind corresponds to each cluster in phy in which the columns represents the channel locations where the nPCFeatures in pc_feature.npy is from. Is this interpretation correct?

If the above is true, I thought you would want to skip over the matching channels by doing this, but then in which currentID is the unit cluster # In my understanding uniqueIDs is equivalent to the peak channel of the unit and so here we would be checking the unit cluster # with the channel position... which didn't make sense to me.

currentID = uniqueIDs(iID);

    % Skip if current ID matches the unit of interest
    if currentID == thisUnit
        continue;
    end

I think here you want to take out any rows from pc_feature_ind that holds the channel positions for thisUnit right?

Please let me know if my explanation doesn't make sense, I can try to provide some specific examples as to why I think this way. Thanks!!

Chris

Julie-Fabre commented 5 months ago

Oh yes, sorry! You're absolutely right, I got confused there myself. Indeed, that part of the code was wrong (!!) - it probably never got spotted because it isn't used or encouraged. It should be fixed now. Thank you very much for bringing it up.

btw - if you set param.plotDetails to 1 (do this only for one or a couple units a time in the for loop, otherwise you will get way too many plots for matlab to handle) you can take a look at plots of the metrics for each unit, including the distance metrics.

chrisan97 commented 5 months ago

Cool :) glad the issue is fixed now. Thanks for the tip about the plotting!

Chris