csn-le / wave_clus

A fast and unsupervised algorithm for spike detection and sorting using wavelets and super-paramagnetic clustering
124 stars 65 forks source link

Question about non compatible data #132

Closed ASfst closed 4 years ago

ASfst commented 5 years ago

Hi, I am a new to spike sorting and I would really appreciate if you could answer some questions that I have.

I use the Multiwell MEA system from Multichannel Systems for neuronal recording in vitro (24 wells and each one has 12 electrodes). I have already extracted the spikes in file with the extention .h5. Is it possible to use only the 'do_clustering' algorthim? If yes which is the compatible type of file that my data should be.

ferchaure commented 5 years ago

I don't know the particular implementation of your .h5 files, are they just the detected spikes?

ASfst commented 5 years ago

Yes, they are the waveforms of the detected spikes. Each waveform contains 31 samples of the voltage signal (μV) (amplitudes of signal). The 11th sample is the amplitude of the spike that overpassed a predefined threshold.


Από: Fernando J. Chaure notifications@github.com Στάλθηκε: Τετάρτη, 26 Ιουνίου 2019 2:31 πμ Προς: csn-le/wave_clus Κοιν.: ASfst; Author Θέμα: Re: [csn-le/wave_clus] Question about non compatible data (#132)

I don't know the particular implementation of your .h5 files, are they just the detected spikes?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/csn-le/wave_clus/issues/132?email_source=notifications&email_token=AMOK6QQ3WAY5PHKLESA4CELP4KTFRA5CNFSM4H3JZLHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYR4EIQ#issuecomment-505659938, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AMOK6QSRLIK2UCXD5UJISS3P4KTFRANCNFSM4H3JZLHA.

[https://ipmcdn.avast.com/images/icons/icon-envelope-tick-round-orange-animated-no-repeat-v1.gif]https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail Απαλλαγμένο από ιούς. www.avast.comhttps://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail

ferchaure commented 5 years ago

are they well aligned?

You could make a *_spikes.mat file with a variable spikes (nr. of spikes x length of the spike shape) and a vector index with the spike times (in ms).

ASfst commented 5 years ago

Yes, they are. I use the matlab toolbox provided by multichannel system (the company that sales the whole recording system) in order to extract the cutouts of spikes.

Thank you so much for your help. I ll try to do what you suggested me.

ferchaure commented 5 years ago

Probably you could get better results extracting using that software 36 samples long spikes (32 and extra 2 in each side) making a similar file (without the _spikes in the name) , and then calling Get_spikes and the Do_clustering. But I guess that what you did could get you some approximate result.

ASfst commented 5 years ago

I have this erros "Wave_clus didn't find a sampling rate in file. It will use the input parameter or set_parameters.m" I saved both the variable 'spikes' and the vector containing spike times in the mat file. Am I doing correctly?

ferchaure commented 5 years ago

Did you added _spikes.mat at the end of the filename? In any case that is not a error just a warning that the sampling rate in the set_paramaters() function will be used.

ASfst commented 5 years ago

What did you mean at the end of the filename?

This is the warning : 'Wave_clus didn't find a sampling rate in file. It will use the input parameter or set_parameters.m Warning: File: _spikes.mat doesn't include spikes

In Do_clustering>do_clustering_single (line 504) In Do_clustering (line 170) 0 of 1 'times' files finished. Computations Done (0.02s). Creating figures... Wave_clus didn't find a sampling rate in file. It will use the input parameter or set_parameters.m

Elapsed time is 0.185598 seconds.'

ferchaure commented 5 years ago

The name of the file must be: something_spikes.mat

ferchaure commented 5 years ago

If you want, upload a little file with some samples and I will see exactly what is the issue with the files

ASfst commented 5 years ago

Yes ofcourse. In the .mat file you can find the spikes detected by one electrode and their timestamps. _spikes.zip

ferchaure commented 5 years ago

a few things to fix:

ASfst commented 5 years ago

Thanks for helping me. I would like to ask you something more. Although there is an explanation about the new temperature plot, I could not find any information about the ISI(interspike interval distribution) plots. Could you please explain me what I exactly see in this graps under of each cluster. Moreover, does the cluster 0 contain waveforms of spikes that are not matched to anyone cluster from 1 to 4? Finally the parameter N_inc (paper) is 20??

fig2print_SR50 fig2print_SR50a

ferchaure commented 5 years ago

The isi plot is the histogram of t_n - t_(n-1) been t_n the times of the spikes number n in that class. The clustering doesn't look right. Probably you could get better results if you reduce the number of samples of the spikes the first 20 and a few at the end are useless. And specially they don't look well aligned, change the name of the file with the spikes (remove the _spikes part) and in the set_parameters file change :

par.w_pre = 40; 
par.w_post = 88; 
par.detection = 'neg';

and call Get_spikes(filename). It will create a new filename_spikes file with the spikes well aligned. Finally call Do_clustering with that file as the input.

About the questions:

ASfst commented 5 years ago

The file that I ll insert into Get_spikes(filename) what should include? The spikes and their timestamps with the name (index)?

ASfst commented 5 years ago

Could you also explain me what is the par.detection? The spikes waveforms that I extract, sometimes they are positive and some other are negative. Is the par.detection releated to this?

ferchaure commented 5 years ago

The new file should include index, sr (sampling rate in Hz, optional), and spikes. The par.detection parameter is needed for the final alignment step (after interpolate set the positive/negative/absolute peak to the sample par.w_pre ), if you have spikes with both type of polarities just use par.detection = 'both'

ASfst commented 5 years ago

fig2print_SR50 I think in cluster one something is going wrong. What's your opinion? Additionally, how did you understand that I have overlapping spikes?

ferchaure commented 5 years ago

Did you changed the parameters as well? Another method is create the struct par, define the parameters you want to use inside and then call: Get_Spikes(filename, 'par', par). You should change as well: par.alignment_window = 40 (the alignment of the spikes are a mess)

You can see overlapping spikes as waveform with two separated peaks

ASfst commented 5 years ago

Yes I did it. Let me explain you what I would like to achieve by using your clustering algorithm. You might give me a better suggestion after my explanation.

I have recorded neuronal activity with 3 different sampling rates (10K, 25K and 50K). Every time that a spike is detected, there is time window of 3 ms that no other spike is detected (this might be an issue as if another spike appears during this 3ms, the system recognises this event, as the detected spike still increases its amplitude and not as overlapped spikes).

The waveform of each spike is composed by the samples of signal 1 ms before the spike event, the spike event and the samples after 2ms. For example recording with 50K, you collect a voltage sample every 20μs, so the waveform is composed by 50 numbers of voltage signal (μV) before spike event, the amplitude of spike (μV) and 100 numbers of voltage signal (μV) after spike event. So 151 samples of voltage signal in μV.

What I want to do is whether I can find an omptimal sampling rate based on the limitations of the system that I use (overlapping issue). After finding the optimal sampling rate, I will continue recoridng with this sampling rate and see the progress of my neuronal cultures by applying the wave_clust algorithm and compare the extracted clusters over time.

Considering tha parameters that you suggested me before : for 10k: par.alignment_window = 10, par.w_pre=10, par.w_post=20 for 25k: par.alignment_window = 25, par.w_pre=25, par.w_post=50 for 50k: par.alignment_window = 50, par.w_pre=50, par.w_post=150
Am I right? Thanks one more time for helping me.

ferchaure commented 5 years ago

It will be better if par.w_pre + par.w_post is a power of 2 (8,16,32,64,128,256,etc). The par.w_post values look a bit higher but I think that the values are more or less fine. The values should be small enough that after alignment you have all the samples needed in the spike segment you get from the acquisition system.

par.alignment_window depends of the detection method you are using, if you see a cluster with a peak in a sample that is not par.w_pre then increase that value.

For my experience, I would say that 10k is too low to a get a decent clustering.

ASfst commented 5 years ago

The par.w_pre is the number of pre-event data points stored, so why should I expect to see a peak in this window? I would expect to see it immediatly after this window. No?

ferchaure commented 5 years ago

The par.w_pre is the number of samples before (and counting) the peak of the spike that are included in the window. For that reason if everything is fine, the peak of the spike is in the sample par.w_pre

ASfst commented 5 years ago

Now it is clear. Thanks :)

ASfst commented 5 years ago

Good Morning Fernando, I have one more question and I would really appreciate if you can help me one more time. I would like to extract in a table the mean waveform of each cluster. Is it possible or should I calculate it on my own. I assume that as the mean waveforms appear in the png file, there is a table that keeps them until the final results are extracted.

Thank you one more time for your help!

ferchaure commented 5 years ago

sorry, we just calculate that on the go. You could check the code and save a table as you want, but probably it's better if you make your own code to load the times file and make the table

ASfst commented 5 years ago

Thanks Fernando!

I already found a way to do that .

Have a nice day!

ASfst commented 5 years ago

Hi, I have one more question about the par.alignment_window. How does it work exactly. Lets assume that the value of par.alignment_window is 40. This means that is looking 40 samples before the event of the spike and 40 after? I do not understand the comment in the code "number of points around the sample expected to be the maximum"

ferchaure commented 5 years ago

par.alignment_window samples in each side of the detection (threshold crossing) are used for searching the local extreme (that later is used to align the spike).

ASfst commented 5 years ago

Hi Fernando, I would like to ask you one more question. During one recording day, one of the ectrodes has detected only 50 spikes (a small number of spikes). When I try to cluster them, the results is the following: fig2print_SR25 I dont understand why the algorithm classifies wevaforms with different peaks (positive and negative) ito the same cluster. I have set the paratemeter par.detection = 'both'; and I also tried to decrease the parameter par.min_clus from 20 to 5, in case that this is the issue. Thanks for helping me.

ASfst commented 5 years ago

2 more questions that I have are the following: What do I exactly see when I press the button of plot all projections? I can understand that it is the projection from the spike shapes into the feature space but are they all the possible clusterings of the waveforms?

The second question is why you have told me that it will be better if par.w_pre + par.w_post is a power of 2 (8,16,32,64,128,256,etc). Which is the mathematical reason?

ferchaure commented 5 years ago

I dont understand why the algorithm classifies wevaforms with different peaks (positive and negative) ito the same cluster.

The fast answer is that the clustering algorithm doesn't have enough spikes to say that the diference in peak is relevant or not. In that cases a manual sorting or using two detection types is probably better. Using par.detection = 'both' is never a good idea, the alignment is better in the other options.

are they all the possible clusterings of the waveforms?

They are just different pairs of projections, the clustering use all of them.

Which is the mathematical reason?

For that cases the discrete wavelet transform doesn't have a border issue (and the zero padding is not necessary) and some wavelets have more "information" about the waveform. Furthermore in that cases we can compute faster the wavelets using a few computational tricks.

ASfst commented 5 years ago

Hi Fernando,

just to be sure that I understood it correclty about the pairs of projections. Are the different pairs of projections of the spikes calculated during the process of automatically selection of wavelet coefficients? Am I right?

ferchaure commented 5 years ago

Not really, the algorithms uses all the dimensions together, not the set of projections.

ASfst commented 5 years ago

Can you please let me know from what the number of projection pairs is depended (when I click the button 'plot all projections' then all the pairs of projections appear. Therefore what is the factor/parameter that would define how many pairs of projections will be displayed/ will appear)?

ferchaure commented 5 years ago

The amount of projections depends of the number of features selected. The best way to understand the feature selection criteria is to read the paper of the current wave_clus

ASfst commented 5 years ago

Hello Fernando,

I have read you paper several times and maybe I do not understand a crucial point. Let me explain you one more time what I am asking.

Lets say that I have an electrode with 2283 detected spikes and each spike has 76 samples. By using the four scale multiresolution decomposition with a Haar wavelet, I will end up with 76 wavelet coefficients for each spike. Then by applying the Lilliefors test I will take the most significant coefficients for each spike.

In my case, when I ran the function "wave_features" in Matlab I ended up with a table called "inspk" which contained 12 coeffecients for each spike. Thus, based on what I have understood so far, I am waiting to see 12 pairs of projections. However when I click the button "plot all projections" I saw 78 pairs of projections. Attached you may see the result.

I am sure that I have missed something improtant. Could you please give me a guidance. image

ferchaure commented 5 years ago

You have 12 coefficient and you want to see the relationship between each pair of them (because you can only see easy 2D graphics). Then, you can plot 144 two dimensional plots (one coefficient in the x-axis and the other in the y-axis) like a matrix of 12x12 as you see. But the diagonal is the same coefficient , and the matrix is symmetrical, etc. You only need to plot the upper left of the matrix.

By the way the plot looks like you have 13 coefficients.

ASfst commented 5 years ago

Thank you Fernando!! You are really helpful! Yes indeed, the coef. were 13.