SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

ValueError: The sorting object has spikes exceeding the recording duration. #2573

Closed: taningh86 closed this issue 8 months ago

taningh86 commented 8 months ago

Hi, this error below is new and never happened before while trying to extract waveforms from sorted data. Below is the code:

```python
sorting_KS25 = ss.read_sorter_folder(base_folder / "results_KS25")
sorting_KS25
recording_saved = si.load_extractor(base_folder / "preprocessed")
sorting = sorting_KS25
print(sorting)

# Without allow_unfiltered=True it was giving an error saying that the waveforms
# are not filtered. This is because I did not run a band-pass filter earlier,
# since the CatGT output files are already band-pass filtered.
we = si.extract_waveforms(recording_saved, sorting, folder=base_folder / "waveforms",
                          allow_unfiltered=True, load_if_exists=False, overwrite=True,
                          **job_kwargs)
print(we)
```

and I get the following error: ValueError: The sorting object has spikes exceeding the recording duration. You have to remove those spikes with the `spikeinterface.curation.remove_excess_spikes()` function

Not sure why this is. Also, how can the number of spikes exceed the recording duration? Maybe I am not understanding it right, but at face value I do not see how a recording can have "excess spikes". First, I believe the source code sets no limit on how many spikes are allowed in a recording, and second, spike counts and duration are different units that cannot be compared. Unless the error means that, after sorting, the total duration of the recording somehow comes out longer than the actual duration of the raw data? That again should not be possible, unless this is a bug. Additionally, I searched for the `remove_excess_spikes()` function in the documentation and was not able to find it. I would very much appreciate a solution. This did not happen with recordings of the same length or longer from the same probe.

Thanks Jimmy

zm711 commented 8 months ago

Howdy @taningh86, this is a bug we see with Kilosort sometimes. It is not saying you have too many spikes; it is saying that you have spikes after the recording time has ended. For example, if your recording is 60 seconds, Kilosort for some reason returns a spike time of 61 seconds. There really doesn't seem to be a pattern for why Kilosort does this (out of all my recordings it has only happened to me once). The function is `spikeinterface.curation.remove_excess_spikes()`.
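If you want to confirm that is what happened here, a minimal sketch of a check (assuming a single-segment recording and the variable names from your snippet above):

```python
import numpy as np

# Compare each unit's spike frames against the recording length.
num_frames = recording_saved.get_num_frames()
for unit_id in sorting_KS25.get_unit_ids():
    spike_frames = sorting_KS25.get_unit_spike_train(unit_id)
    n_excess = int(np.sum(spike_frames >= num_frames))
    if n_excess > 0:
        print(f"Unit {unit_id}: {n_excess} spike(s) at or beyond frame {num_frames}")
```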

But you can do:

```python
import spikeinterface.curation as scur

sorting_wout_excess_spikes = scur.remove_excess_spikes(sorting_KS25, recording_saved)
```
taningh86 commented 8 months ago

Hi @zm711. Thanks for the quick response, I very much appreciate it. I am glad spikeinterface has a remedy for it. By the way, is there a reason I was unable to find `remove_excess_spikes()` in the SI documentation?

alejoe91 commented 8 months ago

We might have forgotten to add it! Thanks for noticing, I'll take a look at the docs!

h-mayorquin commented 8 months ago

I do believe that this:

> and I get the following error: ValueError: The sorting object has spikes exceeding the recording duration. You have to remove those spikes with the `spikeinterface.curation.remove_excess_spikes()` function
>
> Not sure why this is. Also, how can the number of spikes exceed the recording duration? Maybe I am not understanding it right, but at face value I do not see how a recording can have "excess spikes". First, I believe the source code sets no limit on how many spikes are allowed in a recording, and second, spike counts and duration are different units that cannot be compared. Unless the error means that, after sorting, the total duration of the recording somehow comes out longer than the actual duration of the raw data? That again should not be possible, unless this is a bug.

indicates that the error message is very ambiguous and unclear. The problem comes from the fact that the word "spikes" is overloaded to mean the physical object, the index of a vector, and the times of spike events. How could we improve this?

@zm711 @taningh86 What do you think of this:

> ValueError: The sorting object has spikes whose times go beyond the recording duration. This could indicate a bug. To remove those spikes, you can use the `spikeinterface.curation.remove_excess_spikes()` function

Would this be clearer?

zm711 commented 8 months ago

Maybe "This could indicate a bug in the sorter", just to add more clarity. Otherwise people will come to the issue tracker and report the "bug" as a spikeinterface bug rather than as a sorter issue with the recording and sorting objects.

But yeah, I think you're right that we are overloading terms. Your statement is clearer in that it is not the "spikes" that are the problem, but the "times".

h-mayorquin commented 8 months ago

All right. Let's see what OP and @alejoe91 say.

alejoe91 commented 8 months ago

Agree, making a PR!

alejoe91 commented 8 months ago

Actually, the function is in the docs: https://spikeinterface.readthedocs.io/en/latest/api.html#spikeinterface.curation.remove_excess_spikes

I was able to find it from the search tab. @taningh86 not sure why you couldn't find it!

alejoe91 commented 8 months ago

Apparently Sam removed the check in the waveforms extension. @samuelgarcia was there a specific reason for that? Maybe this should be done at SortingAnalyzer instantiation by default. What do you guys think?

zm711 commented 8 months ago

I'm not opposed, but I guess if we deal with it under the hood then we will never track down the root cause (not that we've really searched for it). I've never actually checked how far out the excess spikes are... for example, if they are just one frame after the recording duration, then maybe this is a very slight rounding error. Since this basically never comes up for my datasets, I haven't felt the need to explore; I just ran the function and didn't worry about it.
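For anyone curious, a minimal sketch of how you could measure that overshoot (assuming a single-segment recording, with `sorting_KS25`/`recording_saved` as in the snippets above):

```python
# How far past the end of the recording does the latest spike fall?
num_frames = recording_saved.get_num_frames()
fs = recording_saved.get_sampling_frequency()

last_frame = -1
for unit_id in sorting_KS25.get_unit_ids():
    st = sorting_KS25.get_unit_spike_train(unit_id)
    if st.size > 0:
        last_frame = max(last_frame, int(st[-1]))  # spike trains are sorted ascending

overshoot = last_frame - (num_frames - 1)
if overshoot > 0:
    print(f"Latest spike is {overshoot} frame(s) (~{overshoot / fs * 1e3:.3f} ms) past the end")
else:
    print("No excess spikes found")
```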

taningh86 commented 8 months ago

> All right. Let's see what OP and @alejoe91 say.

Hi @h-mayorquin, I agree with Zach. Adding that it is likely a problem with the sorter would add more clarity. Thanks!

samuelgarcia commented 8 months ago

Yes, I can add it back in the `create` method. But we need to refactor `RemoveExcessSpikesSorting` to use the spike vector approach to be faster.

h-mayorquin commented 8 months ago

I opened a PR that improves the documentation: https://github.com/SpikeInterface/spikeinterface/pull/2604

I think we should close this after we merge that PR as the other points are related but independent.

My take on the other issues discussed:

Two questions:

1. Do you want to do the check at `__init__`? Do you want to call `remove_excess_spikes` by default? One conditional on the other?
2. Does this matter for all the extensions?

If the answer to the second is no, then I kind of feel that we don't want to over-burden the super-duper-general SortingAnalyzer with checks and complexity. Can we do it only for the extensions where it matters?

The opposite approach would be to raise the error when it actually happens, but that goes against the principle of failing at initialization, which I think is a good idea for long computations where the error would otherwise only appear at the end.
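To make the fail-early option concrete, a minimal sketch of what such a guard could look like (a hypothetical helper, not the actual SortingAnalyzer code):

```python
import numpy as np

def check_no_excess_spikes(sorting, recording):
    # Hypothetical fail-fast guard illustrating the idea above: raise at creation
    # time instead of deep inside a long computation.
    spike_vector = sorting.to_spike_vector()
    for segment_index in range(recording.get_num_segments()):
        num_frames = recording.get_num_frames(segment_index=segment_index)
        in_segment = spike_vector["segment_index"] == segment_index
        if np.any(spike_vector["sample_index"][in_segment] >= num_frames):
            raise ValueError(
                "The sorting object has spikes whose times go beyond the recording "
                "duration. This could indicate a bug in the sorter. You can remove "
                "those spikes with spikeinterface.curation.remove_excess_spikes()."
            )
```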

samuelgarcia commented 8 months ago

Hi Ramon,

1. The check must not be done in `__init__` but in `SortingAnalyzer.create()`.
2. I prefer to apply the clipping and warn the user, but `remove_excess_spikes()` is lazy and does not use the spike vector. In the SortingAnalyzer the representation is forced to NumPy shared memory (so a spike vector), so the cut should be very, very fast, maybe with `remove_excess_spikes()`.
3. Some extensions are not sensitive to this.

Thanks for checking this.
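A minimal sketch of the spike-vector-based clipping described in point 2 (illustrative only; the real refactor would live in `RemoveExcessSpikesSorting`):

```python
import numpy as np

# Keep only spikes whose sample index fits within its segment's duration.
spike_vector = sorting.to_spike_vector()
keep = np.ones(len(spike_vector), dtype=bool)
for segment_index in range(recording.get_num_segments()):
    in_segment = spike_vector["segment_index"] == segment_index
    num_frames = recording.get_num_frames(segment_index=segment_index)
    keep[in_segment & (spike_vector["sample_index"] >= num_frames)] = False
clipped_spike_vector = spike_vector[keep]
```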

h-mayorquin commented 8 months ago

Makes sense to me, @samuelgarcia. Thanks for clarifying. I think the purpose (or at least one of the purposes) of the `create` method on the WaveformExtractor and the SortingAnalyzer never clicked for me until now.

This is off topic, but I was wondering if it would make sense to cache the last spike frames of the sorting objects. This would make the check trivially easy.
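For illustration, a sketch of that caching idea (hypothetical; not an existing SpikeInterface feature):

```python
# Compute and store each segment's last spike frame once; the excess-spikes
# check then reduces to a single comparison per segment.
cached_last_frames = []
for segment_index in range(sorting.get_num_segments()):
    last_frame = -1
    for unit_id in sorting.get_unit_ids():
        st = sorting.get_unit_spike_train(unit_id, segment_index=segment_index)
        if st.size > 0:
            last_frame = max(last_frame, int(st[-1]))  # spike trains are sorted
    cached_last_frames.append(last_frame)

# Later: excess spikes exist in a segment iff
# cached_last_frames[seg] >= recording.get_num_frames(segment_index=seg)
```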

h-mayorquin commented 8 months ago

Should we close this?

taningh86 commented 8 months ago

@h-mayorquin from my end I am good with closing it. Thanks!