SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

SpykingCircus2 consuming lots of memory #2202

Closed rory-bedford closed 12 months ago

rory-bedford commented 1 year ago

Hi there,

I'm trying to get SpykingCircus2 to run on a 30 minute tetrode recording. My problem is that it seems to use an awful lot of memory, so it keeps crashing out.

I'm using spikeinterface 0.99.1, hdbscan 0.8.33, and numba 0.58.1, with python 3.8.18.

Here's my recording information:

BinaryFolderRecording: 4 channels - 48.0kHz - 1 segments - 87,078,300 samples 1,814.13s (30.24 minutes) - int16 dtype - 664.35 MiB

I'm running on a CPU cluster with 4 cores, and am updating params['job_kwargs']['n_jobs'] to reflect this. Otherwise I'm using the default parameters.

Here's my resource usage:

State: OUT_OF_MEMORY (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:05:13
CPU Efficiency: 32.20% of 00:16:12 core-walltime
Job Wall-clock time: 00:04:03
Memory Utilized: 79.42 GB
Memory Efficiency: 99.27% of 80.00 GB

Compare this with MountainSort5, run on exactly the same recording with default parameters:

State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:00:43
CPU Efficiency: 41.35% of 00:01:44 core-walltime
Job Wall-clock time: 00:00:26
Memory Utilized: 1.19 MB
Memory Efficiency: 0.00% of 30.00 GB

As you can see, the memory requirements are completely different.

Does anyone know why this is and what I can do to fix this? We'd really like to get SpykingCircus2 to work.

On spikeinterface 0.98.2 the problem was even worse - it kept requesting memory to allocate an array which was 26 PiB! (I had to look up those units!).

Any help is much appreciated!

yger commented 1 year ago

Sounds like a clear issue here. I can run it on my laptop for a Neuropixels recording without any problems. I'll try to understand, but maybe you can let me know how you monitored the memory usage, and whether you can share the recording with me. What is the OS?

rory-bedford commented 1 year ago

Thanks for the quick response! I'm running jobs on our compute cluster with SLURM, so I'm using seff to monitor resource usage. The OS is Alma Linux 8.8.

I'll have to send you the recording tomorrow though I'm afraid!

yger commented 1 year ago

This is really strange because, as I said, the memory consumption is completely normal here (and should be). Since this is a small recording, there must be a massive leak somewhere. I would also be happy to have your complete script, to understand how the recording is loaded.

rory-bedford commented 1 year ago

Here's the recording:

https://cloud.mrc-lmb.cam.ac.uk/s/eLnSL7AHtmcBNT8

Here's my script, which I think is fairly standard but let me know if I'm doing anything silly!

  import spikeinterface as si
  import spikeinterface.sorters as ss

  # load default parameters
  params = ss.get_default_sorter_params('spykingcircus2')

  # set global si kwargs
  si.set_global_job_kwargs(n_jobs=4)

  # load recording
  rec = si.load_extractor('/cephfs/rbedford/test_spykingcircus2/0')

  # set spykingcircus2 job parameters
  params['job_kwargs'] = {'n_jobs':4}

  # run spikesorting
  sorting = ss.run_sorter('spykingcircus2',
                    rec,
                    output_folder='/cephfs/rbedford/test_spykingcircus2/output',
                    **params,
                    verbose=True
  )

  # save sorting
  sorting.save(folder='/cephfs/rbedford/test_spykingcircus2/output_cached')

I'm not applying any preprocessing myself, as the 'apply_preprocessing' parameter is True by default for SpykingCircus2, but let me know if this is a mistake.
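For reference, here's a minimal sketch of how to check that default before running, reusing the get_default_sorter_params call from the script above (nothing here is specific to this bug):

  import spikeinterface.sorters as ss

  # Print the full default parameter dict for SpykingCircus2;
  # 'apply_preprocessing' should appear as True in this version.
  params = ss.get_default_sorter_params('spykingcircus2')
  for name, value in params.items():
      print(name, '=', value)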

yger commented 1 year ago

Can you double-check which version of spikeinterface is effectively used by your Python? I've tested on my machine, and everything worked smoothly. I remember that such a large consumption of RAM could happen in older versions of SI, but that was fixed a while ago, so I am surprised. When exactly does the memory usage get crazy?
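For instance, something like this, run in the same environment as the sorter, would confirm both the version and which install is actually being imported:

  import spikeinterface
  print(spikeinterface.__version__)   # version string actually in use
  print(spikeinterface.__file__)      # import path, to catch a stale install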

rory-bedford commented 1 year ago

I double-checked and reinstalled my environment (definitely on 0.99.1). I've increased the allocated memory and run it for an hour. The steps seem to work fine up until 'find spikes (circus-omp-svd)', when it starts consuming about 35 GB of RAM. It's also pretty slow: it only got 25% of the way through the find-spikes step before timing out after an hour. Can I ask how long it took you to run on this half-hour tetrode recording? At that rate it would take about 4 hours in total, which seems wrong to me. Also, it's only running at about 1% CPU efficiency.

Looking at the logs, despite me manually passing in the n_jobs parameter, it's still trying to use 112 CPUs, as you can see here:

Output:

detect peaks using locally_exclusive with n_jobs = 112 and chunk_size = 48000
We found 284678 peaks in total
We kept 20000 peaks for clustering
extracting features with n_jobs = 112 and chunk_size = 48000
We found 12 raw clusters, starting to clean with matching...
extract waveforms shared_memory multi buffer with n_jobs = 112 and chunk_size = 48000
extract waveforms memmap multi buffer with n_jobs = 112 and chunk_size = 48000
We kept 12 non-duplicated clusters...
extract waveforms shared_memory multi buffer with n_jobs = 112 and chunk_size = 48000
extract waveforms memmap multi buffer with n_jobs = 112 and chunk_size = 48000

Error:

detect peaks using locally_exclusive: 100%|██████████| 1815/1815 [00:04<00:00, 407.86it/s]
extracting features: 100%|██████████| 1815/1815 [00:02<00:00, 767.74it/s] 
extract waveforms shared_memory multi buffer: 100%|██████████| 1815/1815 [00:00<00:00, 2257.49it/s]
extract waveforms memmap multi buffer: 100%|██████████| 1815/1815 [00:00<00:00, 2781.21it/s]
extract waveforms shared_memory multi buffer: 100%|██████████| 1815/1815 [00:00<00:00, 2917.20it/s]
extract waveforms memmap multi buffer: 100%|██████████| 1815/1815 [00:00<00:00, 2861.53it/s]
find spikes (circus-omp-svd):  25%|██▌       | 4543/18142 [57:24<1:52:27,  srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 4374888 ON fmb369 CANCELLED AT 2023-11-15T15:31:08 DUE TO TIME LIMIT ***
find spikes (circus-omp-svd):  25%|██▌       | 4546/18142 [59:00<2:12:54,  1.70it/s]slurmstepd: error: *** STEP 4374888.0 ON fmb369 CANCELLED AT 2023-11-15T15:31:08 DUE TO TIME LIMIT ***

So the n_jobs parameter somehow isn't getting passed to SpykingCircus2. I think this could be the source of all the issues: with 112 workers on 4 allocated cores it's going to be context switching an awful lot, which could explain the low CPU efficiency, and 35 GB spread across 112 workers only amounts to about 0.3 GB of RAM per worker, which is more in line with what I would expect.

Based on the script above am I somehow passing the n_jobs parameter incorrectly, or could this be a bug in my SI version? If I can remedy it I will test again with the correct number of jobs and get back to you.

Thanks again for the help!

Edit: Just to clarify: 112 is the number of cores on the cluster node I'm running on. With n_jobs=-1 it will see all 112 cores, but will be forced by SLURM to only use the 4 I've been allocated, hence the need to manually specify the number of cores.
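A sketch of a possible workaround, assuming the job runs under a standard SLURM allocation with --cpus-per-task set (in which case SLURM exports the SLURM_CPUS_PER_TASK environment variable):

  import os
  import spikeinterface as si

  # Derive n_jobs from the SLURM allocation instead of using n_jobs=-1, which
  # counts all 112 physical cores on the node rather than the 4 allocated.
  n_jobs = int(os.environ.get('SLURM_CPUS_PER_TASK', 1))
  si.set_global_job_kwargs(n_jobs=n_jobs)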

yger commented 1 year ago

Actually, I was able to reproduce the error: the procedure to fit spikes is indeed taking ages here as well, in another env, so I'll try to understand why. Regarding the way to pass jobs to SC2, you should set the argument job_kwargs = {"n_jobs": XX} in the run_sorter command line. I never checked whether this works with the global definition of n_jobs; I will ASAP.
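Adapting the script above accordingly, the call would look something like the sketch below (whether this also overrides the global n_jobs depends on the SI version):

  import spikeinterface as si
  import spikeinterface.sorters as ss

  rec = si.load_extractor('/cephfs/rbedford/test_spykingcircus2/0')

  # Pass job_kwargs directly to run_sorter instead of relying on the
  # global job kwargs, as suggested above.
  sorting = ss.run_sorter('spykingcircus2',
                          rec,
                          output_folder='/cephfs/rbedford/test_spykingcircus2/output',
                          job_kwargs={'n_jobs': 4},
                          verbose=True)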

yger commented 1 year ago

This has been fixed in #2205, which will hopefully be merged soon. Note, however, that your data look strange, with many silences and/or weird shapes.

rory-bedford commented 1 year ago

Great, thanks for the fix! Will test once it's merged. I didn't collect the data myself but will pass that on, thanks.

yger commented 1 year ago

This should be fixed in main. Let me know, so that we can close this issue.