SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
527 stars 187 forks source link

run analyzer dependencies automatically #3246

Closed rat-h closed 3 months ago

rat-h commented 3 months ago

after the update to the new stable version, it became impossible to predict whether SorterAnalyser (aka WaveFormExtractor) will crash due to module dependencies or not. Moreover, I couldn't find any way to figure out how to find missing dependencies in the new system

is it possible to auto-run all required modules if they are missed instead of returning an error?

If not can you provide a code example of how to find out if there are any missing dependencies and run them before running a module?

zm711 commented 3 months ago

By dependencies you mean actual package dependencies? They should be covered by doing

pip install spikeinterface[full]

Are we missing something? Or do you mean something different?

zm711 commented 3 months ago

Or if you mean the analyzer dependency graph it is here:

https://spikeinterface.readthedocs.io/en/stable/modules/postprocessing.html

Basically I would run the core first

analyzer.compute(['random_spikes', 'waveforms', 'templates', 'noise_levels'])

Everything else should flow from that with a few other dependencies (for example you need PCA run in order to do PC-based qualitymetrics).

rat-h commented 3 months ago

@zm711 sorry I had to include this error message

   File "/home/spikeinterface/.local/lib/python3.10/site-packages/spikeinterface/core/sortinganalyzer.py", line 915, in compute
    return self.compute_one_extension(extension_name=input, save=save, verbose=verbose, **kwargs)
  File "/home/spikeinterface/.local/lib/python3.10/site-packages/spikeinterface/core/sortinganalyzer.py", line 991, in compute_one_extension
    assert ok, f"Extension {extension_name} requires {dependency_name} to be computed first"
AssertionError: Extension waveforms requires random_spikes to be computed first
rat-h commented 3 months ago

That means, for any extension, I must check the graph. That isn't useful.

Can it be done automatically? If something from this graph is missing - run it.

BTW. I even cannot implement such an "anticrash" code on my side. I couldn't find how to get a list of dependencies from an extension! At least something like this would be already helpfull

analyzer.get_dependencies(input='extension')
samuelgarcia commented 3 months ago

Hi. Did you have a look to this ?https://spikeinterface.readthedocs.io/en/stable/tutorials/waveform_extractor_to_sorting_analyzer.html#from-waveformextractor-to-sortinganalyzer

sortinganalyzer is pretty similar to waveformextractor but the steps 'random_spikes' and 'waveforms' used to be hidden and now must be explicit. Not that a big benefit with the new design is that 'waveforms' can be skip to compute templaes.

rat-h commented 3 months ago

Yes, I did! But before you could run WaveExtractor and it ran random spikes implicitly. Now it returns an error and all my scripts are dead :exploding_head:

Can an extension check if there are missing dependencies run them. The graph of dependencies doesn't have loops, so it is safe to run dependencies recursively.

zm711 commented 3 months ago

Each extension has a depend_on attribute. So you use the get_extension_class function would return the class and then you could check the dependencies yourself. I think this is mostly a learning curve situation.

from spikeinterface.core.sorting_analyzer import get_extension_class
Amplitude = get_extension_class('spike_amplitudes')
amplitude_dependencies = Amplitude.depend_on

But really for me you just start with what I typed:

analyzer.compute(['random_spikes', 'waveforms', 'templates', 'noise_levels']) 
# or as Sam said no waveforms
analyzer.compute(['random_spikes', 'templates', 'noise_levels'])

From there you can do all other metrics, except PC-based quality metrics which will require you to run

analyzer.compute('principal_components')

That's basically it. You can adapt scripts to do get those values in the future. Does that make sense? EDIT: to show where the get_extension_class function is from

zm711 commented 3 months ago

We also have a has_extension function, but that is only useful if you already have the dependencies memorized (like unfortunately all the developers do). So that's our preferred function because we know what to check for...

rat-h commented 3 months ago

what if I have a system where users select what they need? Why do not run dependencies implicitly?

### insed extension cass
for ext in self.dependencies:
    if not ext in self.analizer:
       self.analizer.compute(ext)
zm711 commented 3 months ago

Because every extension has parameters. We can run the default, but we want the user to explicitly choose to run the default with each extension. So by running compute for each extension we are having the user explicitly agree to use the defaults. I agree the implicit hidden is nice for people who use and trust the defaults, but explicit is more customizable and puts the burden on the end-user to have some say in what they are doing (even if it is just agreeing with the developer). I think @samuelgarcia has strong opinions about this design choice so he can also explain this design choice.

zm711 commented 3 months ago

what if I have a system where users select what they need?

And for this you can basically recreate the WaveformExtractor with the one line of code I included above. Then the end user can pick anything off of the extension list and you'll have already computed what is necessary for those extensions. for example,

analyzer.compute(['random_spikes', 'waveforms', 'templates', 'noise_levels'])

if user_input =='amplitudes':
    _  = analyzer.compute('spike_amplitudes')
elif user_input == 'spike_locations'
    _ = analyzer.compute('spike_locations')
etc
rat-h commented 3 months ago

Honestly, this change created a horrible headache here. There are scripts built upon spikeinterface, and how they are all broken. I agree, the new system is much nicer and cleaner, but it shouldn't break half a year of my work and tones of scripts and optimizations which were made. Interface in spikeinterface should be stable, spikes aren't.

rat-h commented 3 months ago

So you use the get_extension_class function would return the class and then you could check the dependencies yourself. I think this is mostly a learning curve situation.

Not in current version

version 0.101.0
Traceback (most recent call last):
  File "/home/spikeinterface/spikeinterface/runthepipe3.py", line 717, in <module>
    runanalyzer(last,preproc_saved,sorting_saved)
  File "/home/spikeinterface/spikeinterface/runthepipe3.py", line 399, in runanalyzer
    ext = si.get_extension_class(mm)
AttributeError: module 'spikeinterface.full' has no attribute 'get_extension_class'

Nothing in the documentation too

zm711 commented 3 months ago

Sorry for that!

I think the thought process was that moving forward people would analyze all data with the Analyzer and then load old data with the MockWaveformExtractor, but it sounds like you're really developing off of spikeinterface vs just analyzing with spikeinterface. That does make things tricky (we are still working on making the library work with Numpy 2.0 which were improvements, but definitely requires a lot of scrambling (headaches) on our part. If your scripts really are so locked into the WaveformExtractor you could pin to 0.100.8.

Not in current version

Sorry I wasn't clear.

from spikeinterface.core.sorting_analyzer import get_extension_class
Waveforms = get_extension_class('waveforms')

I edited above so it would be clear for people that might read this in the future.

EDIT: I'm wrong again I looked in source file.

rat-h commented 3 months ago

Sorry for that!

Thank you. I'm trying to build an optimization pipeline to find sorting parameters that are performing best on specific recordings.

optimization-pipline

But now all my postprocessing broken. Well, I just hope the library interface won't change in the future.

zm711 commented 3 months ago

Well, I just hope the library interface won't change in the future.

I'm pretty bought into stability here too. I'm definitely on the side of no more major changes :)

samuelgarcia commented 3 months ago

I am also really sorry for you. Normally we put some effort to get old codes still work (but with warnings).

we = si.extract_waveforms() should still works by creating a hidden MockWaveformExtractor and so indirectly a SortingAnalyzer

In short this king of legacy code should not be broken in 0.101.0 but only deprecated , so users could switch explicitly to SortingAnalyzer when they want and not just after update:


# we is a MockWaveformExtractor that behave as before
we = extract_waveforms(recording, sorting, folder='/XXX')
compute_principal_components(we)
compute_spike_amplitudes(we)
compute_quality_metrics(we)

if you need we can improve the MockWaveformExtractor

samuelgarcia commented 3 months ago

If it helps, here you can see that the extract_waveforms() backward compatibility is quite straigthforward https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/core/waveforms_extractor_backwards_compatibility.py#L37