Cluster quality and waveform (basic)

SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.

https://spikeinterface.readthedocs.io

MIT License

486 stars 187 forks source link

Cluster quality and waveform (basic) #2257

Closed kshtjkumar closed 2 months ago

kshtjkumar commented 9 months ago

Hi, I have a question regarding the quality of my sorting, while i run this code (i use tetrode, and mountainsort5 for sorting):

fig, ax = plt.subplots()
num_samples = we.nbefore + we.nafter
timestamps_ms = np.arange(num_samples) / we.sampling_frequency * 1000
for i, unit_id in enumerate(sorting_KS.unit_ids[:1]):
    template = we.get_template(unit_id)
    color = colors[i]

    ax.plot(timestamps_ms,wf[:, :, 0].T, color=color, lw=0.3, alpha = 0.2)
    ax.plot(timestamps_ms,template[:, 0].T, color='red', lw=3) 
plt.xlabel("Time (mS)")
plt.ylabel("Amplitude (micro Volt)")
plt.title("Avg_Waveform")

is my understanding correct here: The below line give me all the waveforms of the spikes in the 1st cluster coming from channel number 0: ax.plot(timestamps_ms,wf[:, :, 0].T, color=color, lw=0.3, alpha = 0.2) and this line gives the average of the clusters of the channel 0: ax.plot(timestamps_ms,template[:, 0].T, color='red', lw=3)

even tho while recording we see good spiking activity after sorting the wave forms look deformed and skewed.

Upon evaluating the quality matrix the ISI violation of the cluster comes to be 0 or less than 0.5. How to interpret this? If waveform is good then the isi ration is bad and vice versa, in most of my cases. How to be sure that the quality of my recording is good?

zm711 commented 9 months ago

@kshtjkumar,

instead of trying to pull things out yourself I would suggest starting with some of the internal widgets, for example shows how to pull out waveforms For waveforms:

unit_ids = sorting.unit_ids[:4]

sw.plot_unit_waveforms(we, unit_ids=unit_ids)

for templates

unit_ids = sorting.unit_ids

sw.plot_unit_templates(we, unit_ids=unit_ids, ncols=5)

I'm not sure where you are getting wf from in your example above? Are you reading the raw file yourself or using an SI based function. Is your wf filtered or not? I'm wondering if you could try the same plotting using the internal SI tools I've listed and then post those so we can see what is happening there. Otherwise, it would be beneficial to have your whole script so we know where each variable is coming from.

To summarize:

1) Could you try with the widgets module to redo your plotting 2) Could you post your whole script so we can see where things are coming from

kshtjkumar commented 9 months ago

hi, tried the above widgets code!

kshtjkumar commented 9 months ago

@zm711

here is my full code:

import spikeinterface.full as si  # import core only
import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss
import spikeinterface.postprocessing as spost
import spikeinterface.qualitymetrics as sqm
import spikeinterface.comparison as sc
import spikeinterface.exporters as sexp
import spikeinterface.widgets as sw
import pyintan
import spikeinterface as si
import spikeinterface.extractors as se
import spikemetrics as st
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.simplefilter('ignore') 
%matplotlib widget

file = "17_feb_lko_230217_155836_full_merged.rhs"
reader = se.read_intan(file,stream_name ='RHD2000 amplifier channel',use_names_as_ids=True)
recording_rhs = reader
print(reader.channel_ids)
recording_rhs.annotate(is_filtered = False)
channel_ids = recording_rhs.get_channel_ids()
fs = recording_rhs.get_sampling_frequency()
num_chan = recording_rhs.get_num_channels()
num_segments = recording_rhs.get_num_segments()
recording_rhs_1 = recording_rhs

print("Channel_ids = ", channel_ids)
print("Sampling_frequency = ", fs)
print("Number of Channels = ", num_chan)
print("Number of segments = ", num_segments)
print('Total_rec_duration = ', recording_rhs.get_total_duration())
ecog = ['B-006', 'B-007', 'B-008' ,'B-009']
recording_rhs = recording_rhs.channel_slice(ecog)

import probeinterface as pi
number_of_channel = 4
posn = np.zeros((number_of_channel,2))
radius = 50
for i in range(number_of_channel):
    theta = i/number_of_channel*2*np.pi
    x = np.cos(theta)* radius
    y = np.sin(theta)* radius
    posn[i] = [x,y]
    print(i,theta,x,y)
probe = pi.Probe(ndim = 2)
probe.set_contacts(posn, shapes = 'square', shape_params={'width' : 20 , 'height' :20 })
probe.set_contact_ids([f"Ch{i}" for i in range (number_of_channel)])
probe.create_auto_shape()
pi.plotting.plot_probe(probe, with_contact_id=True)
probe.to_dataframe(complete=True)
probe.set_device_channel_indices(np.random.permutation(number_of_channel))
probe.to_dataframe(complete=True)
recording_rhs.set_probe(probe, in_place = True)
probe_loaded = recording_rhs.get_probe()
probe_loaded.to_dataframe(complete=True)
recording_rhs.get_channel_locations()
#sw.plot_probe_map(recording_rhs, with_channel_ids = True)
recording_rhs.get_property('contact vector')

recording_ecog = spre.bandpass_filter(recording_rhs, freq_min=300, freq_max = 6000 ) #bandpass filter
recording_notch_ecog = spre.notch_filter(recording_ecog,q = 50)  #notch_filter 
rec_ecog = spre.common_reference(recording_notch_ecog, operator="median", reference="global")  #rereferencing the data 
sorting_KS = ss.run_mountainsort5(rec_ecog)
print("KS found", len(sorting_KS.get_unit_ids()), "units")
sorting_KS = sorting_KS.remove_empty_units()
print("KS found", len(sorting_KS.get_unit_ids()), "non empty units")

rec_ecog.annotate(is_filtered=True)
we = si.extract_waveforms(rec_ecog, sorting_KS, folder="waveforms_kshtj094416596", load_if_exists=False, overwrite=True )
print(we)
spost.compute_spike_amplitudes(we)

for unit in sorting_KS.get_unit_ids():
    waveforms = we.get_waveforms(unit_id= unit)
    spiketrain = sorting_KS.get_unit_spike_train(unit)
    print('Unit', unit, "num_waveforms:" ,(waveforms.shape[2]), "num spikes:", (len(spiketrain)))

num_samples = we.nbefore + we.nafter
timestamps_ms = np.arange(num_samples) / we.sampling_frequency * 1000

unit_ids = sorting_KS.unit_ids
for unit_id in unit_ids[:]:
    fig, ax = plt.subplots()
    template = we.get_template(unit_id=unit_id, mode='average')
    print(template.shape)
    ax.plot(timestamps_ms,template)
    ax.set_title(f'{unit_id}')

plt.show()
plt.xlabel("Time (ms)")
plt.ylabel("Amplitude (micro volt)")

colors = ['green', 'red', 'blue','yellow','Olive', 'Teal', 'Fuchsia','black','orange', 'purple',"brown"]
fig, ax = plt.subplots()
for i, unit_id in enumerate(sorting_KS.unit_ids[:]):
    wf = we.get_waveforms(unit_id)
    color = colors[i]
    ax.plot(timestamps_ms,wf[:, :, 0].T, color=color, lw=0.4)
    #plt.legend(["1","2","3"])

plt.xlabel("Time (mS)")
plt.ylabel("Amplitude (micro Volt)")
plt.title("Waveform")

qm = sqm.compute_quality_metrics(we)#, sparsity=sparsity_radius[0], verbose=True)
display(qm)

pc= spost.compute_principal_components(we, n_components= 3)
colors = ['Olive', 'Teal', 'Fuchsia','black', "blue",'red','green','yellow','orange','magenta']
fig, ax = plt.subplots()
for i, unit_id in enumerate(sorting_KS.unit_ids[:]):
    comp = pc.get_projections(unit_id)
    print(comp.shape)
    color = colors[i]
   ax.scatter(comp[:, 0, 3], comp[:, 1, 3], color=color)
    plt.legend(['1', '2','3','4'])
    plt.title("PCA Clusters")

kshtjkumar commented 9 months ago

wf is taken from this page: https://spikeinterface.readthedocs.io/en/0.92.0/modules/toolkit/plot_2_postprocessing.html

zm711 commented 9 months ago

@kshtjkumar

Thanks for all the code (I have some more questions/comments). Why are you doing:

probe.set_device_channel_indices(np.random.permutation(number_of_channel))

Creating the probe correctly is super important. The device_channel_indices are the values for your headstage matrix. They need to be directly related to the matrix of your raw data. See the probeinterface docs. Note that in those docs the wiring is given as random values, but your actual recording needs the correct wiring to work appropriately (also see this issue). Are you using a home-made probe? The probeinterface library has a variety of Cambridge Neurotech and Neuronexus probes along with some automatic wiring paths to prevent probe errors (docs here). Often with tetrodes it is better to sort based on by_group (some examples here).

Does all that make sense? I think we need to start by figuring out what is happening at the level of making your probe before the sorting even happens to make sure the probe is correct. Once the probe is correct we can check the rest of your code. Could you just walk me through your probe generation code. Thanks!

kshtjkumar commented 9 months ago

thank you for the resources @zm711 yes we are making our home made probes, so based on probe diameters I built the custom probe. I used the probe code form a tutorial, maybe you are right, but what should be the replacement of this ?

My headstage matrix gives me channel id like this: Channel_ids = ['B-000' 'B-006' 'B-007' 'B-008' 'B-009' 'B-011' 'B-029' 'B-030' 'B-031']

zm711 commented 9 months ago

This is a common point of confusion. (it happens to all of us). It is probably better in this case for you to start by putting the probe onto the recording and then slicing (that way spikeinterface/probeinterface keep track of everything). So with your home-made probe you need to figure out how it is wired up to the Intan headstage chip (ie are you using omnetics or wiring yourself etc). The headstage matrix are the device name so for example 'B-000' would likely be the B port matrix channel 0. But you need to figure how that value is attached to your probe. With a homemade probe I have no way to know how that works for your set-up, but you just need to figure out what a contact on your probe is equal to in your final data matrix. In the issue I linked above there there is a verbal explanation (here- as well as pictures trying to explain how this wiring concept works). In addition at the bottom a user demonstrates how they make their own excel sheet for making the wiring mapping, which hopefully helps you a but.

You could use

plot_probe(probe, with_contact_id=True, with_device_index=True)

In order to see how you have things labeled. Could you post that for us to take a look at?

kshtjkumar commented 9 months ago

we use omnetics connector and connect it to the intan head stage. The channels are identified and only those channels are active while recording. So We have a mapping of what omnetics connector is mapped to the intan headstage. The raw channel ids are the ones that I shared above. Let me try the above code line.

kshtjkumar commented 9 months ago

Based on the documentations i edit the probe setting up code to this @zm711 , does this look good ?:

probe = generate_multi_columns_probe(num_columns=3,
                                     num_contact_per_column=[1, 2, 1],
                                     xpitch=100, ypitch=100, y_shift_per_column=[0, -50, 0],
                                     contact_shapes='circle', contact_shape_params={'radius': 12})

plot_probe(probe, with_contact_id=True)
channel_indices = np.arange(4)
probe.set_device_channel_indices(channel_indices)
print(probe.device_channel_indices)
plot_probe(probe, with_contact_id=True, with_device_index=True)
recording_rhs.set_probe(probe, in_place = True)
recording_rhs.get_channel_locations()

zm711 commented 9 months ago

That looks like a nice tetrode, but all we need to confirm is that those are the actual mappings between the device and the data matrix. So you know for that the left contact (dev0) is actually on your tetrode in that location? For example your data should be a matrix (num_samples, num_channels). So to get the 1st channel I would do matrix[:, 0] (because python is 0 indexed). And based on your plot above the 0's channel is the left most point, medium height point on the tetrode. If that is true for your recording set-up then your tetrode is fine, but if that left contact is actually the 4th channel then your set-up would be wrong.

Often the mapping for omnetics means that the numbers aren't a simple np.arange but often it is something more like [1, 3,2, 0]. So really we just need to confirm that you know which contact on your probe goes to which place via the omnetics connector.

For some example mappings for Cambridge Neurotech see here. But what you'll likely notice right away is that these mapping appear to be a random order of numbers but they are not. They are the exact order of electrode contacts and data matrix. Since you're not using a commercial setup you have to make the wiring between contact to data matrix and this is really the most crucial step. (And this is why I really want to make sure you do this right, so that you analyze your data correctly).

As another note, is there a reason you want to start at -50? You could just have your deepest contact be 0 and have your top contact be 100 instead? It's all rather arbitrary, but when you use plotting functions if you're trying to think about the depth of your probes it might be nicer to not have to remember an offset of 50 and just have the offset be 0.

kshtjkumar commented 9 months ago

Hi @zm711 for the tetrode this seems fine, the channel indicated here as dev0 is connected to the channel B-000 and then dev1 is to B-006 etc. However could you guide me if this was not the case then whats to do ( this will help in the future) this is what the mapping for our 32 channel look like :

kshtjkumar commented 9 months ago

@zm711 could you please help with the rest of the code if the probe code looks fine?

zm711 commented 9 months ago

@kshtjkumar, sorry this fell of my radar. I think for deeper probe based questions it is probably better to open the issue on the probeinterface issue tracker. So if you want more help on the probe feel free to open an issue on that tracker so we can go through the finer working of probes. But if the probe generation is wrong the data will look bad (because the sorter likely won't sort correctly).

As far as the rest of the code:

There are some naming issues. You call it sorting_ks, but some of this is Kilosort and some is MountainSort5. For lower channel count I believe Mountainsort4 would be better so you may want to go to those repos and read up to decide if they are better sorters for your purposes.

Also some of these sorters already run their own preprocessing internally so although it is recommended to do things like bandpass filters before waveform extracting is there a reason you are feeding the bandpassed filtered data to the sorter? I'm in the US where the line is 60Hz just want to make sure you are in a part of the world where the line is 50Hz for your notch filter. Also some versions of the intan software already apply a notch filter. Do you know if your data was already notch filtered by the machine.

run_mountsort5 is deprecated. You should use the run_sorter function in general and specify the sorter from there.

I'm just having a hard time parsing your script since some of the variables are named the same. But could you try

sw.plot_amplitudes(we, plot_histograms=True)

and as well as

w_ts = sw.plot_traces(rec_ecog) #make sure to have the correct backend looks like you're on juypter

Could we start with those graphs and see if they produce anything?

kshtjkumar commented 8 months ago

Hi @zm711, I am sorry for the confusion, I have cleared some variables here:

probe = generate_multi_columns_probe(num_columns=3,
                                     num_contact_per_column=[1, 2, 1],
                                     xpitch=100, ypitch=100, y_shift_per_column=[0, -50, 0],
                                     contact_shapes='square', contact_shape_params={'width' : 20 , 'height' :20 })
channel_indices = np.arange(4)
probe.set_device_channel_indices(channel_indices)
print(probe.device_channel_indices)
recording_rhs.set_probe(probe, in_place = True)
recording_rhs.get_channel_locations()
plot_probe(probe, with_contact_id=True, with_device_index=True)

recording_ecog = spre.bandpass_filter(recording_rhs, freq_min=300, freq_max = 6000 ) #bandpass filter
recording_notch_ecog = spre.notch_filter(recording_ecog,q = 50)  #notch_filter 
rec_ecog_ref = spre.common_reference(recording_notch_ecog, operator="median", reference="global")  #rereferencing the data 
ss.run_sorter("mountainsort5", rec_ecog_ref)
print("Sorter found", len(sorting_rec.get_unit_ids()), "units")
sorting_rec = sorting_rec.remove_empty_units()
print("Sorter found", len(sorting_rec.get_unit_ids()), "non empty units")

rec_ecog_ref.annotate(is_filtered=True) # since we filtered the rec 
we = si.extract_waveforms(rec_ecog_ref, sorting_rec, folder="waveforms_kshtj0494416596", load_if_exists=False, overwrite=True )
spost.compute_spike_amplitudes(we)
sw.plot_amplitudes(we, plot_histograms=True)

w2 = sw.plot_timeseries({"Filtered recording": rec_ecog_ref}, time_range=(0,recording_rhs.get_total_duration()),clim = (-50, 50),channel_ids=recording_rhs.channel_ids[:], order_channel_by_depth= True, show_channel_ids=False,backend='ipywidgets' )

For my version there was no sw.plot_traces, so i used sw.plot_timeseries.

I am filtering the signals so as it removes the noises before pushing it in the mountainsort5 sorter. In my country the line noise is at 50Hz, my intan version records raw data, so filtering makes it better i guess.

@zm711 Can you tell me if untill here we are ok ? and how to go further to evaluate every parameter of our recorded data ?

zm711 commented 8 months ago

So for generating your probe, there's one small thing you can change for square you only need to specify width, because the square as the same values. The height keyword is for rect shape.

For the notch filter that makes sense to prenotch the data, especially if you have bad line noise. The bandpass is already done by ms5, but if your recording is super noisy than I can understand trying to clean it up as much as possible. So yeah up to point it looks like you're in okay shape (again, I think opening an issue in probeinterface to really carefully go through the wiring could be beneficial, but it seems okay--the issue being if you link up the wrong channels then a spike detected on one of your channels, might not actually be close to any of your other channels and so the waveforms would look bad even though other metrics would look okay--this happened in your first image which was my original concern).

The other thing that might be helpful for you is to got the ms5 repo and read about how the sorting works. ms5 currently gives you 3 scheme options (spikeinterface defaults to scheme2), but for your data it may be better to use a different scheme to get better sorting results. If you read the ms5 docs or open an issue there they could potentially help you figure out the best way to use their sorter ms5.

Moving on during your extract_waveforms I would need to know your version of SI, because the default was switched to automatically do sparse, but for your tetrode you likely don't want to be sparse so we would want to make sure your extractor is dense. Could you do:

we.is_sparse()

and see what that returns.

Otherwise if all those other steps worked then you just have to pick what your cutoffs are (ie you should read the qc docs and see which work well with tetrodes and then create cutoffs based on your preferred metrics).

tl;dr

The code looks okay for sorting as long as the correct tetrode channels are being fed in. Not all qc will work great for tetrodes to read up on the qc metrics you actually want to use. Your waveforms look a little strange, but your amplitude distribution looks fine (albeit a little small).

kshtjkumar commented 8 months ago

@zm711

we.is_sparse()
False

By qc docs do you mean the quality metrics doc? if not Can you please point me to it.

" you just have to pick what your cutoffs are" by this do you imply which clusters to reject based on isi_violations and other things ?

My si version is 0.98.0.dev0

zm711 commented 8 months ago

You can start with the qc docs associated with spikeinterface here, but it is probably actually best to go to the ref section of that and read the original papers to see how they conceptualized their analyses to make sure they are appropriate for tetrodes.

Unfortunately qc metrics don't have any hard and fast rules (other than you should apply some qc metrics). So what I meant was pick some of the qc metrics that SI offers (like isi_violations) and determine the cutoffs that fit with your data. For instance people have used isolation distance as a metric in the literature, but I have seen values range from cutoff of 10-100 depending on the paper, so it is less that there is a hard cutoff and more that you need to decide what cutoff you will use. I think the sortingQuality has a nice paragraph explanation on how to use qc metrics in general.

Okay cool thanks for the version number. We are onto 0.100.0.dev0, but one big change that happened in 0.99.0 was a shift of having defaults be sparse for extraction rather than dense. But whether you want sparse or dense is going to be analysis specific as well.

Check out those qc docs and if more questions come up feel free to post them.

kshtjkumar commented 8 months ago

hi @zm711 , so i cudnt install mountainsort4, the issues is still open in the relevant repository. I updated spikeinterface to '0.99.1' Doing the same steps give me two issues, 1st while running; sorting_rec = ss.run_sorter("mountainsort5", rec_ecog_ref, output_folder='mountainsort5_output1' ) I have to mention each time the output folder, earlier i didnt do this , is there a way to over write it? I t wud be nice not to make so many folders and rename each in every run. second , now once its all sorted, i run : we.is_sparse() it returns True, and i dont think we want that for tetrode right? Can I make it dense ? Other than that when i run :

unit_ids = sorting.unit_ids
sw.plot_unit_templates(we, unit_ids=unit_ids, ncols=5)

the output is weird:

I am not getting waveforms on all channels as i got before!

Can you please guide where am I going wrong here ?

zm711 commented 8 months ago

I have to mention each time the output folder, earlier i didnt do this , is there a way to over write it?

So unfortunately yes for the moment. The code base is working on adding overwrite options to most of the functions, but not everything has an overwrite yet. Sorry!

Can I make it dense ?

You may or may not want want sparse waveforms for your tetrode. For instance if you had a tetrode where each contact was 100 um away then it might be more similar to 4 monotrodes rather than a true tetrode. If your tetrode contacts are all within 10 um then even with the default parameters the sparsity shouldn't change too much in that case. But you can run your si.extract_waveforms(...., sparse=False) and that will return dense waveforms. It used to be that sparse had a default of False, but this default was switched to True for more recent versions since it is more common for people to use high-density MEAs these days vs tetrodes.

Because you have sparse waveforms your figures might look different. I would do redo with sparse=False and see what that figure looks like when you go back to dense waveforms with the more recent version of spikeinterface (0.99.1).