SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.

https://spikeinterface.readthedocs.io

MIT License

532 stars 187 forks source link

sortingcomponents.peak_selection.select_peaks doesn't catch when n_peaks is too large #1312

Open pauladkisson opened 1 year ago

pauladkisson commented 1 year ago

Current Behavior

When select_peaks(peaks, n_peaks=n_peaks, ...) receives argument n_peaks >= peaks.shape[0], it returns peaks.

Expected behavior

When select_peaks(peaks, n_peaks=n_peaks, ...) receives argument n_peaks >= peaks.shape[0], it should throw an error stating "Number of selected peaks greater than or equal to number of input peaks" (or something along those lines).

Example Code to Reproduce Error

from pathlib import Path
import spikeinterface.core as sc
import spikeinterface.extractors as se
from spikeinterface.sortingcomponents.peak_detection import detect_peaks
from spikeinterface.sortingcomponents.peak_selection import select_peaks

# Load the Data
basepath = Path("/Volumes/T7/CatalystNeuro")
filepath = basepath / "TutorialDatasets/sub-npI1_ses-20190413_behavior+ecephys.nwb"
recording = se.read_nwb(filepath)
rec10s = recording.frame_slice(start_frame=0, end_frame=recording.sampling_frequency * 10)
print(rec10s)

# Detect Peaks
noise_levels = sc.get_noise_levels(rec10s, return_scaled=False)
job_kwargs = dict(n_jobs=1, progress_bar=True)
peaks = detect_peaks(rec10s, method='by_channel', noise_levels=noise_levels, **job_kwargs)

# Select Peaks
num_detected = peaks.shape[0]
print("# of peaks detected:", num_detected)
selected_peaks = select_peaks(peaks, n_peaks=(num_detected+1), method='uniform',
                              noise_levels=noise_levels, by_channel=False)
num_selected = selected_peaks.shape[0]
print("# of peaks selected:", num_selected)

alejoe91 commented 1 year ago

Thanks @pauladkisson

Actually I think that the behavior is fair..the select peaks is used to limit computation time, so in my mind the n_peaks is a maximum number of peaks.. Otherwise, one might need to change the parameter (sometimes deeply nested in the sorter params), in case of recordings shorter than usual, or with fewer peaks. I think it's good to avoid it. Maybe changing the naming slightly or specify the behavior in the docs would be better.

pauladkisson commented 1 year ago

@alejoe91 I think that kind of use case is best handled with a try/except block around the select_peaks function if one anticipates the number of peaks in peaks to be less than n_peaks. The main use case of select_peaks is to actually select a subset of peaks, so if n_peaks is improperly specified, I think it's appropriate to raise an error (or at the very least a warning).

alejoe91 commented 1 year ago

Yeah I see your point. I would actually vote for a warning, since the goal of this function is to limit the number of peaks to a certain amount and if you have less than you're probably happy with it :)

What @h-mayorquin @samuelgarcia and @yger think?

h-mayorquin commented 1 year ago

@alejoe91 I think that kind of use case is best handled with a try/except block around the select_peaks function if one anticipates the number of peaks in peaks to be less than n_peaks. The main use case of select_peaks is to actually select a subset of peaks, so if n_peaks is improperly specified, I think it's appropriate to raise an error (or at the very least a warning)

Am I understanding well that you are want to use an assertion to inform the user that he has unintendedly chosen a value of n_peaks that is larger than number of peaks that he has? What are some possible bad consequences of us not informing the user of this?

Fixing for this would block a possible use of the function where I run detect_peaks but I want just want to assure myself that downstream computations do not run in more than n_peaks=1000. That is, I want to determine the upper bound of how many peaks should all the other methods handle without knowing in advance how many the first method will get (specially if sometimes select_peaks is non-deterministic). If the detect_peaks method produce less than 1000 peaks, then I am fine with the pipeline movin

h-mayorquin commented 1 year ago

So I think I agree with @alejoe91 that we should probably change the semantics. Select peaks should have an argument like max_n_peaks which would be the upper bound of what the sampling returns.

pauladkisson commented 1 year ago

Summarizing the solution we discussed yesterday:

select_peaks should take two optional arguments to constrain the # of peaks returned/sampled:
- max_n_peaks : ensures that input_peaks has no more than max_n_peaks, if it does --> subsample down to max_n_peaks
- min_n_peaks : ensures that input_peaks has no fewer than min_n_peaks, if it does --> throw error
Potentially rename select_peaks to something more informative based on its function of constraining the number of peaks to some specified range

Something I just thought of while writing this:

different peak selection methods may want to handle the min_n_peaks problem differently, for example:
- Sampling input_peaks with replacement (bootstrapping)
- Performing some data augmentation to generate more peaks (adding low levels of noise, amplitude jitter, etc.)
So, we probably want to assert input_peaks.size >= min_n_peaks in each method separately

CodyCBakerPhD commented 1 year ago

The opinion of this humble outsider, if I may offer it - if you're changing this argument structure I would additionally vote for a more standardized naming scheme that also matches other usage across the repo

Current suggestion - Standardized suggestion max_n_peaks - max_num_peaks min_n_peaks - min_num_peaks

which is then more similar to calls such as

get_num_channels, get_num_units

zm711 commented 5 months ago

@h-mayorquin, this is also over a year old at this point. Is this still on the todo list or should we close for inactivity?

h-mayorquin commented 5 months ago

I don't know what is the current status of detect_peaks after all this year refactoring. I might have to work on peak detection later this summer so I can take a look and come back to this. If not, and no one wants to work on this then we can close it.