Waveforms - Githubissues

micheleacox commented 8 years ago

I have a few questions about plotting waveforms in MATLAB after the data is sorted. I am pasting Figure 2 from Pachitariu M. et al., 2016, for reference.

(1) Does KiloSort output waveforms that correspond to each spike? If so, which .npy file and/or rez field contains these? If not, is it best to use the timestamps and retrieve the data directly from the source data file? (2) Can the PC and template outputs of KiloSort be used to create the red traces in the figure below? If so, how?

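[image: untitled] https://cloud.githubusercontent.com/assets/7706858/19781667/eb17c9d2-9c4f-11e6-9b33-2a1e1190dcd0.png

Pachitariu M, Steinmetz NA, Kadir S, Carandini M and Harris KD (2016). Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels. bioRxiv dx.doi.org/10.1101/061481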

nsteinme commented 8 years ago

Hi Michele,

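Yes, you'll have to read the waveforms out of the raw file directly using the spike times. Here's a code snippet in MATLAB that should work:

% Assumes params.filename, params.dataType (e.g. 'int16') and params.chanMap
% (zero-indexed) are already defined for your recording.
nChInFile = 64; % total number of channels stored in the raw file (adjust to your data)
d = dir(params.filename);
nSamp = d.bytes / (nChInFile * 2); % samples per channel, assuming 2-byte (int16) data
mmf = memmapfile(params.filename, 'Format', {params.dataType, [nChInFile nSamp], 'x'});

st = double(readNPY('spike_times.npy')); % these are in samples, not seconds
clu = readNPY('spike_clusters.npy');
theseST = st(clu==19); % spike times for cluster 19
extractST = theseST(1:min(100,length(theseST))); % extract at most the first 100 spikes
nWFsToLoad = length(extractST);
nCh = numel(params.chanMap); % number of channels to keep (64 in this example)
wfWin = -30:30; % samples around the spike times to load
nWFsamps = length(wfWin);
theseWF = zeros(nWFsToLoad, nCh, nWFsamps);
for i=1:nWFsToLoad
    % read the window for all channels in the file, then keep the mapped channels
    tempWF = mmf.Data.x(1:nChInFile,extractST(i)+wfWin(1):extractST(i)+wfWin(end));
    theseWF(i,:,:) = tempWF(params.chanMap+1,:);
end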
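
One caveat with the loop above: a spike within 30 samples of the start or end of the recording would make the window index outside the file. A minimal sketch of a guard, using made-up spike times and a made-up recording length (st and nSamp here are hypothetical stand-ins, not values from any real dataset):

```matlab
% Hypothetical example values; in practice st would come from spike_times.npy
% and nSamp would be the number of samples per channel in the raw file.
st = double([5; 40; 1000; 29990]);  % spike times in samples
nSamp = 30000;                      % assumed recording length in samples
wfWin = -30:30;                     % same window as in the snippet above

% Keep only spikes whose full window lies inside [1, nSamp]
inRange = (st + wfWin(1) >= 1) & (st + wfWin(end) <= nSamp);
validST = st(inRange);
```

Here the first and last spikes are dropped because their windows would run past the edges of the file; validST can then be used in place of theseST.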

For #2, the red waveforms should be the contents of templates.npy. See here for some more documentation about the npy files produced: https://github.com/kwikteam/phy-contrib/blob/master/docs/template-gui.md

linussun commented 7 years ago

Thanks Nicholas & Kilosort team for this wonderful tool! I am trying to reconstruct waveforms, as Michele is doing above, on real-world 32-channel array data and on the eMouse example in Kilosort. I've been through the documentation and the code you provided above. I have a few questions about this:

(1) template_feature_ind.npy in my Kilosort files is a 16x64 matrix. If I understand correctly, each column of 16 values is associated with one template used to extract spikes. The documentation states that these 16 values represent "other features", and in my dataset the values range from 1 to 64. Where are these "other features" defined or specified, and what exactly are they?

(2) The templates.npy data are generic template waveforms that capture a spike, and they have very low values. The real-world raw waveform data from the mmf variable have much larger values. Is there a matrix or .npy file that stores the relative gain needed to match the template to the actual spike, as is done in the phy GUI? In addition, when using phy to review the Kilosort data, clusters with large waveforms have a single waveform highlighted in blue, coming from one channel, while other clusters (especially those with low-amplitude waveforms across many channels) appear to have more than one channel highlighted in blue. What does this represent, and is it stored in any of the Kilosort files?

(3) What is the best way to recreate Figure 2, where red traces overlay the raw waveforms? And what is the best way to recreate the average waveform as seen in the waveform view in phy?

Thanks. LDS

marius10p commented 7 years ago

Hi,

The other features are the "template features", described in figure 4 of the paper. These are simply projections of each spike onto all templates. The feature values themselves are in template_features.npy, which in your case would be 16 by the number of spikes. The file you pointed out contains indexing information, which is used to sparsify the encoding of these features: for a spike assigned to template N, we only compute the projections onto the 16 templates most similar to N. For large datasets, this sparsification is crucial for saving disk space.
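
To make the indexing concrete, here is a rough sketch of looking up one spike's projection onto a given template. It uses small made-up arrays in place of the real files and assumes the layout described in the phy documentation: the index array has one row per template with zero-indexed template ids, and the feature array has one row per spike (transpose if yours loads the other way around). The variable names are hypothetical.

```matlab
% Made-up stand-ins for template_feature_ind.npy, template_features.npy
% and spike_templates.npy (2 features per template instead of 16).
featInd = [0 1; 1 0; 2 1];     % nTemplates x nTempFeatures, zero-indexed template ids
feats = [0.9 0.2; 0.8 0.1];    % nSpikes x nTempFeatures
spikeTemplates = [0; 2];       % zero-indexed template assigned to each spike

iSpike = 2;                    % spike to look up (one-indexed)
queryTemplate = 1;             % template id to project onto (zero-indexed)

% Which templates do this spike's feature columns refer to?
row = featInd(spikeTemplates(iSpike) + 1, :);
col = find(row == queryTemplate, 1);

if isempty(col)
    proj = NaN;                % projection onto this template was not stored
else
    proj = feats(iSpike, col);
end
```

A NaN result simply means the queried template was not among the neighbours stored for that spike's template.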
The templates in templates.npy have unit norm across channels and timepoints. The values in amplitudes.npy scale these templates. Further rescaling is necessary to unwhiten the templates, by multiplying with the inverse of the whitening matrix, available in whitening_mat_inv.npy. Both of these operations have already been applied in the MATLAB results variable: look for rez.Wraw.
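
For anyone working from the .npy files rather than rez, the two rescaling steps might look roughly like this. It is only a sketch on synthetic arrays: it assumes templates are stored as nTemplates x nTimePoints x nChannels and that spike_templates.npy holds zero-indexed template ids, as the phy documentation describes. All values and variable names below are made up.

```matlab
% Synthetic stand-ins for templates.npy, whitening_mat_inv.npy,
% amplitudes.npy and spike_templates.npy.
nTemplates = 2; nT = 5; nCh = 3;
temps = zeros(nTemplates, nT, nCh);
temps(1, 3, 1) = 1;            % template 1: unit deflection on channel 1
temps(2, 3, 2) = -1;           % template 2: unit deflection on channel 2
winv = 2 * eye(nCh);           % stand-in for the inverse whitening matrix
amps = [10; 5];                % one scaling amplitude per spike
spikeTemplates = [0; 1];       % zero-indexed template per spike

iSpike = 1;
t = spikeTemplates(iSpike) + 1;                  % one-indexed template number

% Unwhiten: (nT x nCh) template times (nCh x nCh) inverse whitening matrix
wUnwhitened = squeeze(temps(t, :, :)) * winv;

% Scale by this spike's amplitude
wScaled = amps(iSpike) * wUnwhitened;            % nT x nCh
```

The result is still in the units of the data that went into the sorting, so any conversion to microvolts depends on your acquisition system's gain.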

The highlights are a different issue and are computed in Phy. I believe for each template, only the channels with a large enough magnitude (relative to the max channel) are colored in blue.
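
As an illustration of that kind of rule only (this is not Phy's actual code, and the 0.5 threshold is an arbitrary value chosen for the example), selecting channels by magnitude relative to the max channel could be sketched as:

```matlab
% Made-up template: rows are timepoints, columns are channels.
template = [0.0  0.0  0.0;
            0.1 -1.0  0.6;
            0.0  0.2 -0.3];

chanAmp = max(abs(template), [], 1);   % peak magnitude on each channel
relThresh = 0.5;                       % hypothetical fraction of the max channel

% Channels whose magnitude is at least relThresh times the largest one
highlighted = find(chanAmp >= relThresh * max(chanAmp));
```

With these numbers, channels 2 and 3 pass the threshold, while channel 1 does not. A template concentrated on one channel would yield a single highlighted channel, which matches the behaviour described in the question.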

The waveform view in Phy displays the templates, not the average waveform. templates.npy also contains the rank-3 templates, not the average waveform. If you want the true average waveforms, after processing (and while variables are still in your workspace), you could run the function gather_mean_spikes. We should add more documentation to this function, but it should just work.
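
If the workspace variables are no longer available, a simpler alternative is to average snippets read from the raw file. This is not what gather_mean_spikes does internally, just a plain mean. The sketch below uses a tiny made-up theseWF array (waveforms x channels x samples, the same layout as in the extraction snippet earlier in the thread):

```matlab
% Made-up snippets: 2 waveforms, 2 channels, 3 samples each.
theseWF = zeros(2, 2, 3);
theseWF(1, :, :) = [0 -4 0; 0 -1 0];
theseWF(2, :, :) = [0 -6 0; 0 -3 0];

% Average across waveforms, giving an nCh x nSamples matrix
meanWF = squeeze(mean(theseWF, 1));

% Channel with the largest peak-to-peak amplitude in the mean waveform
[~, peakCh] = max(max(meanWF, [], 2) - min(meanWF, [], 2));
```

Plotting meanWF(peakCh, :) on top of the individual snippets for that channel gives a mean-over-raw overlay, with the mean computed from the raw data rather than from the templates.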