Can't correspond waveforms to spike times

rsnevers commented 1 month ago

When computing cluster metrics, there is currently no way (that I'm aware of) to match the waveforms in each cluster to their spike times. Fetching the waveforms dataframe from the MetricCuration table generally returns only a subset of spike waveforms: the maximum number of waveforms to extract and save per cluster is set in a parameters table and is generally less than the number of spikes belonging to the cluster. The dataframe also has a column named 'spike_times' that can be used to index into an array of all detected spikes, regardless of cluster, and get back just the spike times for the current cluster. The problem is that these spike time indices cover all spikes in the cluster regardless of how many waveforms were extracted, so it's not possible to determine which waveforms correspond to which spike times. In the V0 spikesorting pipeline, this array of spike time indices covered just the extracted waveforms, so matching spike times to waveforms was possible. We need this correspondence to compute some of the burst merge metrics, because they compare spike amplitudes (a feature computed from waveform data) for spikes that occur close to each other in time (within a threshold on the spike time difference).

Please find attached a code chunk that can be used to look at the two pieces of data I'm describing:

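from spyglass.spikesorting.v1 import MetricCuration

# Keys identifying the session, interval, and sort group of interest
nwb_file_name = 'BS2820231107_.nwb'
interval_list_name = '02_r1'
sort_group_id = '34'

# Restrict MetricCuration to that sort group and interval
query = MetricCuration() << f"nwb_file_name = '{nwb_file_name}' AND interval_list_name = '{interval_list_name}' AND sort_group_id = '{sort_group_id}'"
# Fetch the first matching entry; its 'object_id' item is the waveforms dataframe
waveforms_df = query.fetch_nwb()[0]['object_id']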

This code chunk fetches the waveforms dataframe object from the MetricCuration table. The dataframe contains spike time indices for all spikes in a cluster, but only a subset of the waveforms (in this case no more than 20000). Cluster 2, for example, is problematic.

khl02007 commented 1 month ago

@rsnevers I understand. What do you want as the solution? We could save the indices of the spikes corresponding to the waveforms when extracting fewer than all waveforms (as is the case here). But if your goal is to compare waveforms between spikes that occur near each other in time, this may not help you, as there is no guarantee that the randomly subsampled spikes whose waveforms are extracted will be near each other in time. In your case, you probably want to extract all the waveforms, which you can do by changing the parameters. If you still think that the index is required, we should add that to the analysis NWB file containing the waveforms.

khl02007 commented 1 month ago

We can get the indices of subsampled spikes via waveforms.get_sampled_indices(unit_id) (can be added here) and then modify _write_metric_curation_to_nwb to save them as a column in the units table of the analysis NWB file.
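
A minimal sketch of how such saved indices could be used, with synthetic data standing in for the real objects. It assumes the indices are positions into the unit's own spike train and that the waveforms are stored in the same order as the indices. All array names are hypothetical; in practice sampled_idx would be derived from waveforms.get_sampled_indices(unit_id).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (hypothetical names, not Spyglass objects).
# All spike times of one unit, in seconds, sorted in time.
unit_spike_times = np.sort(rng.uniform(0.0, 600.0, size=1000))
max_spikes_per_unit = 200  # cap on the number of extracted waveforms

# Stand-in for the saved sampled indices: assumed to be sorted positions
# into this unit's spike train, one per extracted waveform.
sampled_idx = np.sort(
    rng.choice(unit_spike_times.size, size=max_spikes_per_unit, replace=False)
)

# Stand-in for the extracted waveforms: (n_sampled, n_samples, n_channels).
sampled_waveforms = rng.normal(size=(max_spikes_per_unit, 30, 4))

# Under the assumptions above, row i of sampled_waveforms belongs to the
# spike at sampled_spike_times[i].
sampled_spike_times = unit_spike_times[sampled_idx]
```

With the indices saved alongside the waveforms, the per-waveform spike times follow from a single indexing step, so no extra bookkeeping would be needed downstream.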

rsnevers commented 1 month ago

Hi @khl02007, thanks for getting back to me. I think the solution you're suggesting would actually work for me. In the past, we've computed these metrics on subsampled spikes in each cluster and felt we were able to get a representative sample using ~20000 spikes. Do you know if the get_sampled_indices method in SpikeInterface gives back indices into just that cluster's spike times, or do the indices correspond to the entire unsorted spike train?

khl02007 commented 1 month ago

It returns the indices of the spikes that were sampled for waveform extraction for that particular unit (note that unit_id is an input).
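
Given per-unit indices, the kind of comparison described in the original report could be sketched roughly as follows. This is only an illustration, not the burst merge metric itself: the function name, the amplitude-ratio measure, and the 10 ms threshold are all assumptions.

```python
import numpy as np

def close_pair_amplitude_ratios(spike_times, amplitudes, max_isi):
    """Amplitude ratio (later / earlier) for consecutive sampled spikes
    that occur within max_isi seconds of each other."""
    spike_times = np.asarray(spike_times, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)
    # Sort by time so that "consecutive" means adjacent in time.
    order = np.argsort(spike_times)
    t, a = spike_times[order], amplitudes[order]
    # Keep only pairs whose time difference is within the threshold.
    close = np.diff(t) <= max_isi
    return a[1:][close] / a[:-1][close]

# Toy data: spike times (s) and amplitudes for the sampled spikes of one unit.
times = np.array([0.100, 0.104, 0.500, 0.900, 0.903])
amps = np.array([100.0, 80.0, 95.0, 110.0, 88.0])
ratios = close_pair_amplitude_ratios(times, amps, max_isi=0.010)
```

Because the spikes are subsampled, adjacent entries in the sample are not necessarily adjacent in the full spike train, so the number of usable pairs depends on how dense the sample is.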

LorenFrankLab / spyglass

Can't correspond waveforms to spike times #1007