PC-based quality metrics on data from Kilosort 1 fail

plodocus commented 4 years ago

Hello,

I'm trying to run the quality metrics module on data that was sorted with Kilosort 1 (about 2 years ago if that matters). Unfortunately, computation stops at PC-based metrics:

(ecephys_spike_sorting) bash-3.2$ python -m ecephys_spike_sorting.modules.quality_metrics --input_json params.json --output_json output.json

ecephys spike sorting: quality metrics module
Loading data...
Calculating isi violations
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
Calculating presence ratio
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
Calculating firing rate
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
Calculating amplitude cutoff
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 100% 
Calculating PC-based metrics
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/__main__.py", line 77, in <module>
    main()
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/__main__.py", line 67, in main
    output = calculate_quality_metrics(mod.args)
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/__main__.py", line 30, in calculate_quality_metrics
    metrics = calculate_metrics(spike_times, spike_clusters, amplitudes, channel_map, pc_features, pc_feature_ind, args['quality_metrics_params'])
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/metrics.py", line 81, in calculate_metrics
    params['n_neighbors'])
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/metrics.py", line 228, in calculate_pc_metrics
    peak_channels[cluster_id] = pc_feature_ind[cluster_id, pc_max]
IndexError: index 513 is out of bounds for axis 0 with size 512

cluster_ids has non-consecutive numbers which seems to be the cause of the problem. I'm a bit confused that enumerate is used as the generator but the variable idx is not used as the index. Could cluster_id be switched with idx?

jsiegle commented 4 years ago

I believe this discrepancy results from the fact that KS1 doesn't do any merging of units, so the cluster IDs are always consecutive. However, after manual merging/splitting in phy, you'll get new cluster IDs that are larger than the first dimension of the pc_feature_ind matrix, hence the indexing error.

pc_feature_ind.npy and templates.npy are actually indexed by spike_templates.npy. After manual refinement there may not be a 1:1 relationship between the cluster ID and the template ID, which is what this code assumes.

Is this something you'd like to work on fixing? Either way, I will take a look at it when I'm back in lab next week.

plodocus commented 4 years ago

Yes, I can try to fix it.

plodocus commented 4 years ago

Well, I got it to run without an error but all I get are nans. I think I don't entirely understand what exactly happens in the code and what some of KS's output represents, so it's best if someone else works on this.

One potential difficulty if the indexing is changed to use the spike_templates is that there might be multiple templates for merged clusters and shared templates for split clusters.

plodocus commented 4 years ago

@jsiegle Did you have a chance to take a look at this?

jsiegle commented 4 years ago

I made a little bit of progress...I think the code in the ks1-compatibility branch is almost working, except for the PC feature extraction step. The functions that need to be updated are on lines 280-285 of metrics.py:

channel_mask = make_channel_mask(cluster_id2, pc_feature_ind, channels_to_use)

subsample = int(relative_counts[idx2])
index_mask = make_index_mask(spike_clusters, cluster_id2, min_num = 0, max_num = subsample)

pcs = get_unit_pcs(pc_features, index_mask, channel_mask)

The tricky part is that the relevant rows in the PC matrix depend on the spike_template ID for each spike, and so you can't simply use the intersection of the channel_mask and index_mask. I think there needs to be a for loop that checks the template IDs before pulling out the PCs. I'll try to write this next week, unless you have a chance to look at it before then.

jsiegle commented 4 years ago

Hi @DanBenHa, I just pushed some code to the ks1-compatibility branch that should fix the problem in principle, but which isn't working with the particular KS1-sorted data that I have. For some reason, the values in the pc_feature_ind matrix jump all over the place, which makes it almost impossible to find consecutive matching channels across templates.

Can you test what I've written on your datasets? If it doesn't work, you can send me the post-curation Kilosort outputs for one session, and I can test it here.

plodocus commented 4 years ago

Hi @jsiegle,

apologies for the delayed response. I tried the code in the ks1-compatibility branch, but encountered this error:

ecephys spike sorting: quality metrics module
Loading data...
Calculating PC-based metrics
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/__main__.py", line 77, in <module>
    main()
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/__main__.py", line 67, in main
    output = calculate_quality_metrics(mod.args)
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/__main__.py", line 30, in calculate_quality_metrics
    metrics = calculate_metrics(spike_times, spike_clusters, spike_templates, amplitudes, channel_map, pc_features, pc_feature_ind, args['quality_metrics_params'])
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/metrics.py", line 75, in calculate_metrics
    params['n_neighbors'])
  File "/Users/agjacob06/Documents/Git/ecephys_spike_sorting/ecephys_spike_sorting/modules/quality_metrics/metrics.py", line 243, in calculate_pc_metrics
    cluster_peak_channels[idx] = np.median(template_peak_channels[templates_for_unit])
IndexError: index 488 is out of bounds for axis 0 with size 487

Could you provide me with an email address to which I could send a link to the Kilosort output?

P.S.: I realised that this toolbox was written for Neuropixels probes. My data was acquired with Neuronexus probes. Is that a problem for the quality metrics code?

jsiegle commented 4 years ago

It should work for Neuronexus, but the code may contain some Neuropixels-specific assumptions that are causing problems. If you can send me the data, it will be very helpful for debugging. My email address is joshs [at] alleninstitute [dot] org

jsiegle commented 4 years ago

The latest changes I merged into the master branch should fix this issue. Please let me know if you're still having problems!

AllenInstitute / ecephys_spike_sorting

PC-based quality metrics on data from Kilosort 1 fail #45