SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
531 stars 188 forks source link

Only load `template_metrics` extension on compute if keeping some metrics #3478

Open chrishalcrow opened 1 month ago

chrishalcrow commented 1 month ago

Might fix #3471 (Could you try this out please @jonpedros)

When template_metrics is computed and delete_existing_metrics = False, any old metrics which aren't being recomputed are kept. To do this, the _run method loads the old template metric extension (if it exists). However, if this is the first time it has run, it has already created the extension folder. In this case _run sees the newly-created folder and tries to load it. Bu there's not much in it, so a no run_info warning appears.

To avoid this, this PR only loads the extension folder if there are metrics to be kept.

I think the previous implementation of load_run_info was sending two warnings if a run_info file didn't exist. I think I've made the logic a bit simpler. Could one of the run_info experts take a look please (@jonahpearl @alejoe91 ) - thanks!

jonahpearl commented 1 month ago

The run_info change looks fine. Incidentally, this is exactly the reason that I think it would be useful to save run_info before the run has occurred, and perhaps as soon as the extension is created (@alejoe91 and see #3451) — because then it's very simple to check if the extension has already been run. If it hasn't been, you can skip all the delete_existing_metrics logic, avoid loading an empty folder, etc.

As written here, I think it's this condition if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]) that's being used to check if the newly created folder is empty? I suspect it would be clearer to just check if not run_info["run_completed"] (if we can make run_info get saved before the run happens) but lmk if I'm misunderstanding.

chrishalcrow commented 1 month ago

Making the run_info.json file asap sounds reasonable to me.

As written here, I think it's this condition if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]) that's being used to check if the newly created folder is empty? I suspect it would be clearer to just check if not run_info["run_completed"] (if we can make run_info get saved before the run happens) but lmk if I'm misunderstanding.

To be honest, it's quite a good check to do anyway: it will now only try to load the template_metrics extension it needs propagate some already-existing metrics from an old run. So in this case I'm happy with the solution.

alejoe91 commented 1 month ago

Making the run_info.json file asap sounds reasonable to me.

As written here, I think it's this condition if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]) that's being used to check if the newly created folder is empty? I suspect it would be clearer to just check if not run_info["run_completed"] (if we can make run_info get saved before the run happens) but lmk if I'm misunderstanding.

To be honest, it's quite a good check to do anyway: it will now only try to load the template_metrics extension it needs propagate some already-existing metrics from an old run. So in this case I'm happy with the solution.

I think I removed it because it didn't play nice with loading old waveform extractor folders as an analyzer. This was part as a larger fix to improve back-compatibility, so I'm happy to test if this PR works on old data on my end!

jonpedros commented 4 weeks ago

Tested it and now all the extensions I need to compute are outputted as expected. I do still get the warnings, though, but I guess that was not changed?

For template_metrics:

[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\spikeinterface\core\sortinganalyzer.py:2043](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/spikeinterface/core/sortinganalyzer.py:2043): UserWarning: Found no run_info file for template_metrics, extension should be re-computed.
  warnings.warn(f"Found no run_info file for {self.extension_name}, extension should be re-computed.")
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\spikeinterface\core\sortinganalyzer.py:2050](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/spikeinterface/core/sortinganalyzer.py:2050): UserWarning: Found no run_info file for template_metrics, extension should be re-computed.
  warnings.warn(f"Found no run_info file for {self.extension_name}, extension should be re-computed.")
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\spikeinterface\core\sortinganalyzer.py:2125](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/spikeinterface/core/sortinganalyzer.py:2125): UserWarning: Found no data for template_metrics, extension should be re-computed.
  warnings.warn(f"Found no data for {self.extension_name}, extension should be re-computed.")

For spike_amplitudes:

[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\numpy\core\_methods.py:206](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/numpy/core/_methods.py:206): RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\numpy\core\_methods.py:163](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/numpy/core/_methods.py:163): RuntimeWarning: invalid value encountered in divide
  arrmean = um.true_divide(arrmean, div, out=arrmean,
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\numpy\core\_methods.py:198](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/numpy/core/_methods.py:198): RuntimeWarning: invalid value encountered in divide
  ret = ret.dtype.type(ret / rcount)
chrishalcrow commented 1 week ago

Tested it and now all the extensions I need to compute are outputted as expected. I do still get the warnings, though, but I guess that was not changed?

Hey ~@jonahpearl~ @jonpedros- sorry, I completely forgot about this PR. The warnings were updated. I think that, based on the fact that your warnings about template_metrics happen at lines 2043 and 2050, you're using the "main" branch rather than this PRs branch. If you have the github CLI, you can check out this branch by running gh pr checkout 3478 in your spikeinterface folder. If you had time to do that, re-run and confirm that you don't get two identical "Found no run_info file for template_metrics, extension should be re-computed" warning; that'd be great.

jonahpearl commented 1 week ago

Hey @jonahpearl

@jonpedros I think you meant :)

jonpedros commented 1 day ago

If you had time to do that, re-run and confirm that you don't get two identical "Found no run_info file for template_metrics, extension should be re-computed" warning; that'd be great.

Tested and it's gone! Only the one from spike_amplitudes remains. Thanks!