BarbourLab / lussac

Package for automatic post-processing and merging of multiple spike-sorting analyses.
GNU Affero General Public License v3.0
25 stars 3 forks source link

waveform extraction has unexpected behaviour #7

Closed PierreLeCabec closed 1 year ago

PierreLeCabec commented 1 year ago

Hi,

I am attempting to run Lussac on multiple sorter outputs that I have previously computed using SpikeInterface. To do this, I am using the following script:


# -*- coding: utf-8 -*-
"""
Created on Tue Oct  3 15:31:21 2023

@author: Pierre.LE-CABEC
"""
import spikeinterface.sorters as ss
import spikeinterface as si  # import core only
from lussac.core import LussacData, LussacPipeline

mouse = 173
delay = 'Fixed_delay'

params = { "lussac": {
                    "logs_folder": fr"C:\local_data\Paper\Data\spike\{mouse}_{delay}\Lussac/logs",
                    "tmp_folder": fr"C:\local_data\Paper\Data\spike\{mouse}_{delay}\Lussac/tmp",
                    "si_global_job_kwargs": {
                                            "n_jobs": 6, 
                                            "chunk_duration": "2s",
                                            "progress_bar": False,
                                            "verbose": False
                                            },
                    "overwrite_logs": False,
                    "pipeline": {
                                "units_categorization": {
                                                      "all": {  # Categorize all units.
                                                              "spikes": {
                                                                      "firing_rate": {
                                                                              "min": 0.4,
                                                                              "max": 200.0
                                                                      },
                                                                      "contamination": { 
                                                                              "refractory_period": [0.3, 1.0],
                                                                              "max": 0.3
                                                                      },
                                                                      "SNR": {
                                                                              "peak_sign": "neg", 
                                                                              "min": 2.5
                                                                      },
                                                                      'wvf_extraction': {
                                                                        'ms_before': 2.0,
                                                                        'ms_after': 2.0,
                                                                        'max_spikes_per_unit': 1_000
                                                                    },
                                                              }
                                                      }
                                              },
                                    "align_units": {
                                                  "all": {
                                                    'wvf_extraction': {
                                                        # 'ms_before': 2.0,
                                                        # 'ms_after': 2.0,
                                                        # 'max_spikes_per_unit': 1_000,
                                                        # 'folder': 
                                                    },
                                                    'filter': [200, 5000],
                                                    'threshold': 0.5,
                                                    'check_next': 10
                                                }
                                          },
                                    "remove_bad_units": {

                                                      "all": {  # Remove units with firing rate < 1.0 Hz or amplitude std > 80 µV
                                                              "firing_rate": {
                                                                      "min": 1.0
                                                              },
                                                              "amplitude_std": {
                                                                      "max": 80.0
                                                              }
                                                      },
                                                  },
                                  }
                    },
    }

recording = si.BinaryFolderRecording(fr'C:\local_data\Paper\Data\concaneted_recording\{mouse}_{delay}\concatenated_recording_int16')
recording.annotate(is_filtered=True)

sortings = {}    
for sorter_name in [
                'kilosort2_5',
                'mountainsort4',
                'tridesclous',
                'spykingcircus',
                ]:
    print(sorter_name)
    current_sorter = ss.NpzSortingExtractor.load_from_folder(fr'C:\local_data\Paper\Data\spike\{mouse}_{delay}\{sorter_name}\sorter\in_container_sorting')
    current_sorter._annotations['name'] = sorter_name
    sortings[sorter_name] = current_sorter

lussac_data = LussacData(recording, sortings, params)
lussac_pipeline = LussacPipeline(lussac_data)
lussac_pipeline.launch()

The problem I am having varies depending on which module I try to run (units_categorization, align_units, or remove_bad_units). I either receive error messages such as:

  File ~\AppData\Local\anaconda3\envs\lussac\Lib\site-packages\spikeinterface\core\waveform_extractor.py:382 in nbefore
    nbefore = int(self._params["ms_before"] * self.sampling_frequency / 1000.0)

KeyError: 'ms_before' 

or

  File ~\AppData\Local\anaconda3\envs\lussac\Lib\site-packages\spikeinterface\core\waveform_extractor.py:396 in return_scaled
    return self._params["return_scaled"]

KeyError: 'return_scaled'

or unending loading time were the kernel seem to be frozen.

I track all those issue to unexpected result from the si.extract_waveforms function (the frozen behaviour of the kernel happen inside this function and all the KeyError come from the _params dict of the waveform_extractor). This is surprising because when I manual load the waveform_extractor using the provided recording and sorter I don't have those issue: the following script:

recording = si.BinaryFolderRecording(fr'C:\local_data\Paper\Data\concaneted_recording\{mouse}_{delay}\concatenated_recording_int16')
recording.annotate(is_filtered=True)

sortings = {}    
for sorter_name in [
                'kilosort2_5',
                'mountainsort4',
                'tridesclous',
                'spykingcircus',
                ]:
    print(sorter_name)
    current_sorter = ss.NpzSortingExtractor.load_from_folder(fr'C:\local_data\Paper\Data\spike\{mouse}_{delay}\{sorter_name}\sorter\in_container_sorting')
    current_sorter._annotations['name'] = sorter_name
    sortings[sorter_name] = current_sorter
    we = si.extract_waveforms(recording, current_sorter, mode='memory')
    print(we.return_scaled, we._params['ms_before'], '\n')

retrun: kilosort2_5 Setting 'return_scaled' to False extract waveforms shared_memory: 100%|##########| 2691/2691 [00:00<00:00, 7081.59it/s] False 3.0

mountainsort4 Setting 'return_scaled' to False extract waveforms shared_memory: 100%|##########| 2691/2691 [00:00<00:00, 13126.83it/s] False 3.0

tridesclous Setting 'return_scaled' to False extract waveforms shared_memory: 100%|##########| 2691/2691 [00:00<00:00, 13659.91it/s] False 3.0

spykingcircus Setting 'return_scaled' to False extract waveforms shared_memory: 100%|##########| 2691/2691 [00:00<00:00, 10371.85it/s] False 3.0

DradeAW commented 1 year ago

This is interesting, the code seems correct and I have never seen these errors before (and they do not show up in the Linux or Windows unit tests).

What version of SpikeInterface are you using? Do you have a more complete error message? (to see what line in Lussac triggers this)

SpikeInterface recently changed the default behaviour or extract_waveforms, but it shouldn't have this kind of effects ...

PierreLeCabec commented 1 year ago

I am using SpikeInterface 0.98.2. here a full example: runfile('C:/Pierre.LE-CABEC/Code Pierre/paperfede/core/data_extraction/spikes/test_postprocessing.py', wdir='C:/Pierre.LE-CABEC/Code Pierre/paperfede/core/data_extraction/spikes') C:\Users\pierre.LE-CABEC\AppData\Local\anaconda3\envs\lussac\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm kilosort2_5 mountainsort4 tridesclous spykingcircus

Running Lussac!


** units_categorization **


Running category 'all':

KeyError: 'return_scaled'

DradeAW commented 1 year ago

Ah yes I see where the problem comes from!

In some cases (such as computing the SNR), the parameters for the wvf_extraction had to be given inside the dictionary (meaning that your wvf_extraction needs to be a child of SNR). But I think there should also be default values so that the user doesn't have to specify a mountain of parameters for something as simple as the SNR (but can if he wants to).

I'll make a PR soon to fix that default behaviour (on the dev branch), and also the documentation as right now, it is misleading and wrong!! In the mean time, you can change the parameters and it should work :)

Thanks for finding this!

PierreLeCabec commented 1 year ago

Thanks! :) The key error issue seem to be fixed. I think I located the problem I had of unending computing/ frozen behaviour when trying to extract waveform through Lussac. It seem to be caused by parallel processing, has I don't have any issue when running without those parameters: "si_global_job_kwargs": { "n_jobs": 32,
"chunk_duration": "2s", "progress_bar": True, "verbose": True },

I modified the code so that an error is raise right before running the line that froze my script. Here is the complete error message: kilosort2_5 mountainsort4 tridesclous spykingcircus

Running Lussac!


C:\Users\pierre.LE-CABEC\spikeinterface\src\spikeinterface\core\base.py:1011: UserWarning:

Versions are not the same. This might lead to compatibility errors. Using spikeinterface==0.98.2 is recommended

** units_categorization **


Running category 'all':

ValueError: Frozen behaviour happen when running the next line: "processor.run()"

DradeAW commented 1 year ago

Glad to hear one problem is down!

The other problem looks like a SpikeInterface error rather than Lussac. What is the CPU you are using on your machine, and how much RAM do you have? Also, where is your data located? (meaning on an HDD or SSD?)

In ~\lussac\src\lussac\core\module.py:186, you should have return si.extract_waveforms(recording, sorting, folder_path, allow_unfiltered=True, **params) To make sure it comes from SpikeInterface, you can print params and try to run the waveform extraction (to a folder) with the same parameters and n_jobs=32. The idea being seeing if you observe the same freeze.

PierreLeCabec commented 1 year ago

My CPU: 13th Gen Intel(R) Core(TM) i9-13900K 3.00 GHz Ram: 64.0 Go (63.7 Go utilisable) Data stored locally on a 1T SSD

This is params given to extract_waveforms inside of lussac: {'ms_before': 1.0, 'ms_after': 1.0, 'max_spikes_per_unit': 500, 'n_jobs': 32, 'chunk_duration': '2s', 'progress_bar': False, 'mp_context': None, 'max_threads_per_process': 1, 'verbose': False}

When I run extract_waveforms from Spikeinterface outside of Lussac I also freeze so it does seem to come from Spikeinterface.

DradeAW commented 1 year ago

I suggest setting a number of jobs that is a bit lower than the total (leave at least 2-4 for the system), but I don't think this is the root of the problem. Your CPU, RAM and disk should allow for this to be ran quite easily in theory.

I would suggest opening an issue on the SpikeInterface GitHub.

In the meantime, you can try to set sparse=False to see if that changes anything? Or setting the number of jobs to 1.

PierreLeCabec commented 1 year ago

It is not really a issue for me as it run happily without putting "si_global_job_kwargs" in the params dict of Lussac.