csn-le / wave_clus

A fast and unsupervised algorithm for spike detection and sorting using wavelets and super-paramagnetic clustering

Selecting parameters to mitigate 'Error: Less than 15 spikes detected' #211

Open Pablo-GDT opened 2 years ago

Pablo-GDT commented 2 years ago

Hi,

I am trying to feed my synthetic data of bursting neurons into wave_clus. A demo simulation of two neurons, which clearly shows more than 15 spikes, is shown below: image

However, I get this error:

Error using wave_clus>load_data_button_Callback
Less than 15 spikes detected

Error in gui_mainfcn (line 95)
        feval(varargin{:});

Error in wave_clus (line 63)
    gui_mainfcn(gui_State, varargin{:});

Error while evaluating UIControl Callback.

Here are my current parameters:

% LOAD PARAMS
par.segments_length = 2; % length (in minutes) of the segments into which the data is cut (default 5 min).
par.sr = 7066; % sampling rate (in Hz). Only used if the data file doesn't include a sampling rate.

% PLOTTING PARAMETERS
par.cont_segment = true;
par.max_spikes_plot = 1000; % max. number of spikes to be plotted
par.print2file = true; % If not true, print the figure (only for batch scripts).
par.cont_plot_samples = 100000; % number of samples used in the one-minute (maximum) sample of continuous data to plot.
par.to_plot_std = 1; % # of std from mean to plot
par.all_classes_ax = 'mean'; % 'mean'/'all'. If it's 'mean', only the mean waveforms will be plotted in the axes with all the classes
par.plot_feature_stats = false;

% SPC PARAMETERS
par.mintemp = 0.00; % minimum temperature for SPC
par.maxtemp = 0.251; % maximum temperature for SPC
par.tempstep = 0.01; % temperature steps
par.SWCycles = 100; % SPC iterations for each temperature (default 100)
par.KNearNeighb = 11; % number of nearest neighbors for SPC
par.min_clus = 20; % minimum size of a cluster (default 60)
par.randomseed = 0; % if 0, random seed is taken as the clock value (default 0)
%par.randomseed = 147; % If not 0, random seed
%par.temp_plot = 'lin'; % temperature plot in linear scale
par.temp_plot = 'log'; % temperature plot in log scale

par.c_ov = 0.7; % Overlapping coefficient to use for the inclusion criterion.
par.elbow_min = 0.4; %Thr_border parameter for regime border detection.

% DETECTION PARAMETERS
par.tmax = 'all'; % maximum time to load
%par.tmax= 180; % maximum time to load (in sec)
par.tmin= 0; % starting time for loading (in sec)
par.w_pre = 20; % number of pre-event data points stored (default 20)
par.w_post = 44; % number of post-event data points stored (default 44)
par.alignment_window = 10; % number of points around the sample expected to be the maximum
par.stdmin = 3.5; % minimum threshold for detection
par.stdmax = 500; % maximum threshold for detection
par.detect_fmin = 300; % high pass filter for detection
par.detect_fmax = 3000; % low pass filter for detection (default 1000)
par.detect_order = 4; % filter order for detection. 0 to disable the filter.
par.sort_fmin = 300; % high pass filter for sorting
par.sort_fmax = 3000; % low pass filter for sorting (default 3000)
par.sort_order = 2; % filter order for sorting. 0 to disable the filter.
par.ref_ms = 1.5; % detector dead time, minimum refractory period (in ms)
par.detection = 'both'; % type of threshold ('pos','neg','both')
% par.detection = 'neg';
% par.detection = 'both';

% INTERPOLATION PARAMETERS
par.int_factor = 5; % interpolation factor
par.interpolation = 'y'; % interpolation with cubic splines (default)
% par.interpolation = 'n';

% FEATURES PARAMETERS
par.min_inputs = 10; % number of inputs to the clustering
par.max_inputs = 0.75; % number of inputs to the clustering. If < 1, that proportion of the maximum will be used.
par.scales = 4; % number of scales for the wavelet decomposition
par.features = 'wav'; % type of feature ('wav' or 'pca')
%par.features = 'pca'

% FORCE MEMBERSHIP PARAMETERS
par.template_sdnum = 3; % max radius of cluster in std devs.
par.template_k = 10; % # of nearest neighbors
par.template_k_min = 10; % min # of nn for vote
%par.template_type = 'mahal'; % nn, center, ml, mahal
par.template_type = 'center'; % nn, center, ml, mahal
par.force_feature = 'spk'; % feature used for forcing (whole spike shape)
%par.force_feature = 'wav'; % feature used for forcing (wavelet coefficients).
par.force_auto = true; %automatically force membership (only for batch scripts).

% TEMPLATE MATCHING
par.match = 'y'; % for template matching
%par.match = 'n'; % for no template matching
par.max_spk = 40000; % max. # of spikes before starting templ. match.
par.permut = 'y'; % for selection of random 'par.max_spk' spikes before starting templ. match.
% par.permut = 'n'; % for selection of the first 'par.max_spk' spikes before starting templ. match.

% HISTOGRAM PARAMETERS
par.nbins = 100; % # of bins for the ISI histograms
par.bin_step = 1; % percentage number of bins to plot

The sampling rate and segment length are inferred from the simulation: one bursting neuron's period lasts approximately 20 seconds, so with 6 periods for the first neuron above and 875000 samples, the number of samples scales to 875000/120 = 7291.6 Hz. I don't know if this somewhat arbitrary way of defining these values is too imprecise.
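As a quick Python check of that arithmetic (the 6 periods of roughly 20 s each and the 875000-sample total are my estimates from the plot):

```python
# Estimate the sampling rate from the total sample count and estimated duration.
n_samples = 875_000      # total samples in the simulation
duration_s = 6 * 20      # ~6 bursting periods of ~20 s each
sr_hz = n_samples / duration_s
print(round(sr_hz, 1))   # -> 7291.7
```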

It potentially has to do with other parameters. For example, how does one establish adequate values for par.stdmin and par.stdmax? Also, how do these thresholds directly contribute to spike detection: are they a time measure, in ms, of a spike's duration? Apologies, I am just desperate to get this working on my data because my dissertation depends on it.

ferchaure commented 2 years ago

It looks like the sampling rate is a bit too low. Suppose the really informative part of a spike is around 1.5 ms; at this sampling rate that is only 10-12 samples out of the 64 stored per spike (par.w_pre + par.w_post). You have more noise than information in each spike, so wave_clus won't work well.

par.stdmin and par.stdmax are expressed in standard deviations of the Gaussian noise; under that hypothesis you can estimate the number of false positives you would detect for a given par.stdmin. par.stdmax is a bit more complicated: if you don't have artifacts or high-amplitude noise, I would use a massive number (like you did) or just inf. In your case, with only bursting neurons, this could be complicated, because it is impossible to approximate the noise level. Maybe you could reduce par.stdmin a bit more.
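To make that concrete, here is a small Python sketch of how the detection thresholds follow from par.stdmin and par.stdmax under the Gaussian-noise hypothesis. wave_clus estimates the noise standard deviation robustly as median(|x|)/0.6745 (as in Quiroga et al. 2004); the signal and numbers below are illustrative, not wave_clus code:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)   # synthetic Gaussian "recording", sigma = 1

# Robust noise estimate: for Gaussian noise, median(|x|) ~ 0.6745 * sigma,
# so dividing by 0.6745 recovers sigma without spikes inflating the estimate.
sigma_n = np.median(np.abs(x)) / 0.6745   # close to 1.0 for this synthetic noise

stdmin, stdmax = 3.5, 500.0
thr_low = stdmin * sigma_n    # events must exceed this threshold to be detected
thr_high = stdmax * sigma_n   # events above this are rejected as artifacts
```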

How much time do you have between spikes? wave_clus won't handle overlapping spikes, and if that happens all the time it will reduce the detections significantly (try increasing par.ref_ms).

Pablo-GDT commented 2 years ago

Hi Fernando, thanks for your timely response! I am adding Gaussian noise to the signal after calculating the electrode potential. I know this isn't physiologically equivalent to real recordings, as Rodrigo points out, but for simplicity I have implemented it this way. Right now the noise is set to one standard deviation of the signal. This is something I can relax as required. If there is a heuristic for choosing an appropriate Gaussian distribution that would challenge wave_clus but still produce detectable results, I can apply it.

If I remove noise completely from the same simulation (as shown below) I get the same 'Less than 15 spikes detected' error, and since each burst alone contains more than 15 spikes, something is clearly going wrong.

image Would a dramatic increase in par.w_pre and par.w_post give a better chance of detection, or do these need to be fine-tuned separately? I've tried the noiseless signal with these parameters, decreasing par.stdmin to 0.1, 0.5, 0.8, 0.9, 1, and 1.5, and increasing par.ref_ms.

% LOAD PARAMS
par.segments_length = 2; % length (in minutes) of the segments into which the data is cut (default 5 min).
par.sr = 7291; % sampling rate (in Hz). Only used if the data file doesn't include a sampling rate.

% PLOTTING PARAMETERS
par.cont_segment = true;
par.max_spikes_plot = 1000; % max. number of spikes to be plotted
par.print2file = true; % If not true, print the figure (only for batch scripts).
par.cont_plot_samples = 100000; % number of samples used in the one-minute (maximum) sample of continuous data to plot.
par.to_plot_std = 1; % # of std from mean to plot
par.all_classes_ax = 'mean'; % 'mean'/'all'. If it's 'mean', only the mean waveforms will be plotted in the axes with all the classes
par.plot_feature_stats = false;

% SPC PARAMETERS
par.mintemp = 0.00; % minimum temperature for SPC
par.maxtemp = 0.251; % maximum temperature for SPC
par.tempstep = 0.01; % temperature steps
par.SWCycles = 100; % SPC iterations for each temperature (default 100)
par.KNearNeighb = 11; % number of nearest neighbors for SPC
par.min_clus = 20; % minimum size of a cluster (default 60)
par.randomseed = 0; % if 0, random seed is taken as the clock value (default 0)
%par.randomseed = 147; % If not 0, random seed
%par.temp_plot = 'lin'; % temperature plot in linear scale
par.temp_plot = 'log'; % temperature plot in log scale

par.c_ov = 0.7; % Overlapping coefficient to use for the inclusion criterion.
par.elbow_min = 0.4; %Thr_border parameter for regime border detection.

% DETECTION PARAMETERS
par.tmax = 'all'; % maximum time to load
%par.tmax= 180; % maximum time to load (in sec)
par.tmin= 0; % starting time for loading (in sec)
par.w_pre = 20; % number of pre-event data points stored (default 20)
par.w_post = 44; % number of post-event data points stored (default 44)
par.alignment_window = 10; % number of points around the sample expected to be the maximum
par.stdmin = 1.5; % minimum threshold for detection
par.stdmax = inf; % maximum threshold for detection
par.detect_fmin = 300; % high pass filter for detection
par.detect_fmax = 3000; % low pass filter for detection (default 1000)
par.detect_order = 4; % filter order for detection. 0 to disable the filter.
par.sort_fmin = 300; % high pass filter for sorting
par.sort_fmax = 3000; % low pass filter for sorting (default 3000)
par.sort_order = 2; % filter order for sorting. 0 to disable the filter.
par.ref_ms = 1.5; % detector dead time, minimum refractory period (in ms)
par.detection = 'both'; % type of threshold ('pos','neg','both')

% INTERPOLATION PARAMETERS
par.int_factor = 5; % interpolation factor
par.interpolation = 'y'; % interpolation with cubic splines (default)
% par.interpolation = 'n';

% FEATURES PARAMETERS
par.min_inputs = 10; % number of inputs to the clustering
par.max_inputs = 0.75; % number of inputs to the clustering. If < 1, that proportion of the maximum will be used.
par.scales = 4; % number of scales for the wavelet decomposition
par.features = 'wav'; % type of feature ('wav' or 'pca')
%par.features = 'pca'

% FORCE MEMBERSHIP PARAMETERS
par.template_sdnum = 3; % max radius of cluster in std devs.
par.template_k = 10; % # of nearest neighbors
par.template_k_min = 10; % min # of nn for vote
%par.template_type = 'mahal'; % nn, center, ml, mahal
par.template_type = 'center'; % nn, center, ml, mahal
par.force_feature = 'spk'; % feature used for forcing (whole spike shape)
%par.force_feature = 'wav'; % feature used for forcing (wavelet coefficients).
par.force_auto = true; %automatically force membership (only for batch scripts).

% TEMPLATE MATCHING
par.match = 'y'; % for template matching
%par.match = 'n'; % for no template matching
par.max_spk = 40000; % max. # of spikes before starting templ. match.
par.permut = 'y'; % for selection of random 'par.max_spk' spikes before starting templ. match.
% par.permut = 'n'; % for selection of the first 'par.max_spk' spikes before starting templ. match.

% HISTOGRAM PARAMETERS
par.nbins = 100; % # of bins for the ISI histograms
par.bin_step = 1; % percentage number of bins to plot

The time between spikes would be small: assuming peaks at samples 3286 and 3265 (in the image below), the crest-to-crest duration would be 21/(7291/60) * 1000, approximately 172.8 ms (I think), if that is at all helpful.
image Do you have any suggestions?

ferchaure commented 2 years ago

Hi, using a signal without noise is tricky because the threshold will be close to zero, but a really high stdmin could help.

Neither par.w_pre nor par.w_post affects the detection.

It looks like the time between spikes is around 3 ms, so the refractory period could be problematic.

And sorry, my mistake: you have to reduce par.ref_ms, not increase it.
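For reference, the sample-to-milliseconds conversion is just samples / sr * 1000, with no extra factor of 60. A quick Python sketch with the 21-sample crest-to-crest distance and sr = 7291 Hz from this thread:

```python
# Convert a distance in samples to milliseconds: samples / sr * 1000.
samples_between_peaks = 21   # crest-to-crest distance (from the thread)
sr_hz = 7291
isi_ms = samples_between_peaks / sr_hz * 1000
print(round(isi_ms, 2))      # -> 2.88, i.e. "around 3 ms"
```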

Pablo-GDT commented 2 years ago

Since neither too little noise nor too much noise helps detection, I have set the Gaussian noise below to half a standard deviation of the signal.

image

I think I made a mistake earlier in estimating the spike duration. There are 100 samples within each unit of time pictured above. Taking a whole spike to equate to 12 units, as in the image below, gives 12 * 100 / 7291 * 1000 = 164.58 ms, so a spike should take a lot longer than previously stated. However, I've set par.ref_ms = 100 and par.stdmin = 1.5, and it's still not detecting spikes. Could the refractory time still be the issue?

Since I have never used this software before, I do not know what counts as reasonable input to wave_clus for successful output. I also don't know how sensitive the software is to parameter values. Has wave_clus demonstrated success with synthetic data, and are there any other resources that may help me?

ferchaure commented 2 years ago

I've set par.ref_ms = 100; and par.stdmin = 1.5; and its still not detecting spikes. Could it still be that refractory time is an issue?

I noted my mistake above: you have to reduce the refractory period to par.ref_ms = 1 or something like that. And par.stdmin = 1.5 may work for high noise, but for low noise you need a higher value.

I also don't know how sensitive the software is to parameter values

If you have used another spike sorter, the detection is quite similar in all of them, with par.stdmin being the main parameter; for real cases it goes from 3.5 to 4.5, depending on the number of false positives you are willing to handle.

Has wave_clus demonstrated success with synthetic data and is there any other resources that may help me out?

Yes, for example the Easy, Difficult, etc. simulations made by Rodrigo, and a software package called Neurocube to create new ones; both have been used with wave_clus a lot.

I would recommend adding a breakpoint in the function Batch_files/amp_detect.m to see how the signal looks after filtering and what threshold is used, or even manually forcing a threshold there for the unrealistic examples.

Pablo-GDT commented 2 years ago

Hi Fernando,

I have a couple of questions about wave_clus that I was hoping you could help resolve. Firstly, do you happen to know why I may be getting this error when I run wave_clus (I'm using MATLAB R2022a)?

Wave_clus didn't find a sampling rate in file. It will use the input parameter or set_parameters.m
Unable to perform assignment because the indices on the left side are not compatible with the size of the right side.

Error in amp_detect (line 75)
    index(nspk) = iaux + xaux(i) -1;

Error in wave_clus>load_data_button_Callback (line 221)
    [new_spikes, temp_aux_th, new_index] = amp_detect(x, handles.par);

Error in gui_mainfcn (line 95)
    feval(varargin{:});

Error in wave_clus (line 63)
    gui_mainfcn(gui_State, varargin{:});

Error while evaluating UIControl Callback.

Another question I have is whether it's possible for wave_clus to return the time/array element at which a spike is detected, and which cluster it corresponds to.

Thanks

ferchaure commented 2 years ago

Hello Pablo,

That error is really odd. What type of file are you loading? Is waveclus the latest package added to your path?

Another question I have is whether it's possible for wave_clus to return the time/array element in which a spike is detected and which cluster it corresponds to?

You are probably asking about the times_NN file created when you save the sorting? Check the output files page in the Wiki.

Pablo-GDT commented 2 years ago

Hi Fernando.

thanks for getting back to me so soon!

I am uploading .mat files like those attached in the zip folder, each consisting of a single 1xN sample vector named data. I thought this was the only constraint when loading the data.

test_data_file.zip

Am I missing something?

ferchaure commented 2 years ago

The format is fine; it looks like a conflict with another package. I would recommend adding a breakpoint in amp_detect.m and checking what is happening with the standard functions.

The line index(nspk) = iaux + xaux(i) -1; extends the vector index using a counter nspk (an integer), assigning a value equal to the second output of the max function (iaux, another integer) plus an element of another vector (xaux, the output of the find function) indexed by a counter (i). If anything behaves differently, check whether one of those functions has been replaced.
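A rough Python sketch of that detection pattern may help when stepping through amp_detect.m: find the threshold crossings (the role of find/xaux), then take the argmax inside a short search window after each crossing (the role of max/iaux). The function name and window handling here are illustrative assumptions, not the wave_clus source:

```python
import numpy as np

# Illustrative sketch: threshold-crossing detection followed by a peak search,
# mirroring the index(nspk) = iaux + xaux(i) - 1 pattern described above.
def detect_peaks(x, thr, window):
    # samples where the signal crosses thr from below
    crossings = np.flatnonzero((x[1:] >= thr) & (x[:-1] < thr)) + 1
    index = []
    for start in crossings:
        seg = x[start:start + window]
        if seg.size == 0:       # an empty search window is what triggers the error
            continue
        index.append(int(start) + int(np.argmax(seg)))  # absolute sample of the peak
    return index
```

For example, a signal with a single bump rising above the threshold at sample 10 and peaking at sample 11 is reported as a peak at sample 11.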

Pablo-GDT commented 2 years ago

As seen in the screenshot below, the amp_detect function seems to run, but it looks like iaux and i are empty vectors. Could this be why the problem is occurring? I don't recall ever looking through amp_detect.m, much less changing anything inside it.

image

ferchaure commented 2 years ago

OK, then the issue arises because par.ref_ms is less than two samples. For your sampling rate it should be at least 0.28. Otherwise, when waveclus searches for half the refractory period, it gets an empty vector.
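In other words, the lower bound follows directly from the sampling rate (sr = 7291 Hz here, taken from the thread):

```python
# Minimum par.ref_ms so the search window spans at least two samples:
# ref_ms >= 2 * 1000 / sr.
sr_hz = 7291
min_ref_ms = 2 * 1000 / sr_hz   # ~0.274 ms; rounding up gives the 0.28 above
```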

Pablo-GDT commented 2 years ago

Hi Fernando

For clarification, must par.ref_ms be set to at least 2*1000/par.sr to satisfy the Nyquist frequency? Because wave_clus worked when I set par.ref_ms to 28.

ferchaure commented 2 years ago

No, that's not related to the Nyquist frequency. It's just the way waveclus searches for the peaks after the threshold crossing.

Pablo-GDT commented 2 years ago

Greetings, Fernando!

Sorry, I hadn't noticed that you are from Argentina! I'm very glad there is a good community of Spanish-speaking scientists in this research area. I'll write in English, since it comes more easily to me for these things; I hope you don't mind.

I don't know how much experience you have working with synthetic data, and this may not be the appropriate forum to ask, so forgive me. However, I wanted to ask whether you know how time resolution is defined in dynamical systems like the Morris-Lecar model I am using, as given by Ermentrout and Terman:

image

With a constant applied current, I_app, the system is autonomous, but it's not clear to me what timescale to assign to the signal produced; should it be ms, s, ... or is this undefined and up to the discretion of the user?

Thanks

ferchaure commented 2 years ago

Hello, Pablo

Synthetic data is outside my expertise. I would say that the units of the solution are given by the units of the parameters, and that you should discretize to at least 30k samples per second, but maybe these aren't your questions.

I would recommend checking MEArec; it is my favorite simulation package. Reading about it may help you, or Alessio (its developer) could even give you a hand.

Pablo-GDT commented 2 years ago

Hi Fer,

You said par.w_pre and par.w_post don't affect the detection, but do I have to vary them if I change anything like my sampling rate? They don't seem to show a full waveform in the cluster template: image But running the same parameters while tweaking par.w_pre and par.w_post now shows the full waveform: image

These are the 64 samples that are 'saved for further analysis', right? So they're pretty important to tweak. Is there any overall method you use to adjust them?

ferchaure commented 2 years ago

I would increase par.ref_ms, and check that the spikes are not aligned in such a way that the peak falls at sample par.w_pre. The par.w_pre and par.w_post parameters aren't important for detection, but they are fundamental for feature extraction.

If I understand the plots correctly, a positive threshold should be better for you, because the spikes have higher positive than negative values. If you change that, you will have to change the par.w_pre and par.w_post parameters as well.

Pablo-GDT commented 2 years ago

Hi Fer, sorry for the constant messaging.

Firstly, I wanted to understand why you said the following in your previous comment:

I would increase par.ref_ms, check that each spike is not aligned in a way that the peak is at sample par.w_pre.

Secondly, I wanted to get your thoughts on the sorting seen below, and whether you think the parameters are appropriate for this case of real data: image

and on this case of synthetic data with what looks like too much pink noise: image Could it be that the pink noise adds too much nonstationary information, which makes it hard to find reasonable templates?

ferchaure commented 2 years ago

About par.ref_ms: waveclus will search for a peak in the waveform for around half of par.ref_ms. For that reason, if the value is too small, it won't search enough samples for the peak and won't align the waveform to the peak well.

I would use a negative threshold for the real data (I don't think those are spikes; they are too short, at least for 25-30 kHz).

In the synthetic data you have too much noise. In a real case it should be possible to distinguish some spikes visually, but here the noise reaches a similar amplitude.

Pablo-GDT commented 2 years ago

Hi Fer,

I was wondering how the times for each spike in the times_[filename].mat file are aligned. Are they aligned on the spike peaks, or could they be affected by par.w_pre?

Also, I'm experiencing a strange phenomenon where changing the sampling rate drastically affects the signal seen in the GUI. For example, here with an sr of 9900 the spikes are well defined against the noise and have a much greater amplitude: image

Here, with a 10x sampling rate of 99000, the spikes are indistinguishable from the noise: image Note this is unrelated to the number of plotted samples, since I have set par.cont_plot_samples = 8000000, which is much higher than the 873675 samples used in this 8 s recording, and the amplitude of the noise has not been changed.

Do you know why this might be happening?

ferchaure commented 2 years ago

I was wondering how the times for each spike in the times_[filename].mat file are aligned?

To the peak

Is it aligned on the spike peaks or could it be affected by par.w_pre?

It is aligned to the peak; visually, it should be sample number par.w_pre in the extracted spikes.

About the other question: the parameter par.cont_plot_samples is used to downsample the one-minute example of the signal. I'm not sure what you are trying to do, but notice that 64 samples of a recording done at 99 kHz is only about 0.65 ms (a small fraction of a real spike waveform).
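The duration covered by the spike window scales inversely with the sampling rate; a quick Python check with the default 64-sample window and the two sampling rates from this thread:

```python
# Duration covered by the spike window (par.w_pre + par.w_post samples)
# at the two sampling rates discussed above.
w_total = 20 + 44                       # default 64-sample window
ms_at_9900 = w_total / 9900 * 1000      # ~6.46 ms
ms_at_99000 = w_total / 99000 * 1000    # ~0.65 ms
```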

Pablo-GDT commented 2 years ago

Hi Fer

Thanks for that; yes, I was wondering about the sample number, as you rightly indicated.

I wanted to know if 'pca' as the feature extraction method is still supported, because I get the error: ''princomp' has been removed. With appropriate code changes, use 'pca' instead.' I read that princomp has been deprecated in newer versions of MATLAB like my R2022a. Is the only solution to download an earlier MATLAB version?

Thanks!

ferchaure commented 2 years ago

Hi Pablo, I just remembered this issue. I guess you don't have the required toolbox with Principal Component Analysis.

[post edited]