Open tabedzki opened 6 months ago
From what I've seen of the KS documentation, the Kilosort4 algorithm itself supports running on the CPU if a GPU isn't available, though one might have to actively choose the CPU.
Are your datasets small enough that you could run it on the CPU? For full-scale sorting the CPU is not a great option, so, as was mentioned in the original issue, for any real data you should use a computer/server with an NVIDIA card.
Based on Sam's comment in that chat, it definitely looks like you could try to run this locally. The cuda check only occurs in a container. But if you tried to run this locally instead (docker_image=False), we can see if this works.
I had a kernel crash yesterday when I didn't use the docker image from what I remember. I can test this tomorrow to see if it is still an issue.
If the kernel is crashing locally then it might be an issue in 1) Kilosort or 2) pytorch for macOS. Docker is definitely a check in our codebase so we would need to change that check for docker + pytorch to work for macOS.
@zm711
I tried it again without the docker image and it crashed. I'm happy to share with you the notebook. Here's the log dump
14:47:55.767 [info] Cell 101 completed in 0.012s (start: 1714070875754, end: 1714070875766)
14:48:00.815 [info] Handle Execution of Cells 101 for ~/code/spiketutorials/Official_Tutorial_SI_0.99_Nov23/SpikeInterface_Tutorial.ipynb
14:48:00.956 [error] Disposing session as kernel process died ExitCode: undefined, Reason: OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
14:48:00.969 [info] Cell 101 completed in -1714070880.831s (start: 1714070880831, end: undefined)
We might need to see the script/notebook. This should use cpu in this case. Which version of spikeinterface are you using?
did you explicitly set this just in case:
I did not explicitly set that. Where should I set that/pass it in?
sorting = ss.run_sorter('kilosort4', recording, torch_device='cpu')
That way we ensure that the device is set to cpu for everything.
When I pass that argument in, I get Bad parameters: ['torch_device']
Here is the gist for the notebook that I ran (can't upload in this thread directly) https://gist.github.com/tabedzki/4352df1e71d12c2a4cc5df3c94888023
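The `Bad parameters` message suggests the installed release does not know the `torch_device` key. A minimal sketch of checking requested keys against the sorter defaults before calling `run_sorter`; `split_supported` is a hypothetical helper, and the small `defaults_old` dict is only an illustrative stand-in for what `si.get_default_sorter_params('kilosort4')` returns on an older release:

```python
def split_supported(defaults, wanted):
    """Split `wanted` into keys present in `defaults` and keys that are not."""
    supported = {k: v for k, v in wanted.items() if k in defaults}
    rejected = sorted(set(wanted) - set(defaults))
    return supported, rejected


# Illustrative subset of the defaults on a release without `torch_device`
# (assumption: in real use this would come from get_default_sorter_params).
defaults_old = {"batch_size": 60000, "nblocks": 1, "do_correction": True}

supported, rejected = split_supported(
    defaults_old, {"torch_device": "cpu", "nblocks": 1}
)
print("pass to run_sorter:", supported)
print("not supported here:", rejected)
```

Only the `supported` dict would then be unpacked into the `run_sorter` call, which avoids the error on releases that predate the parameter.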
That feature was added to main but hadn't been backported to the 100 series yet. So that's why that parameter didn't work. Running locally should automatically trigger cpu. If you do
import torch
torch.cuda.is_available()
It prints False right?
Correct, it prints False
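The same check can be wrapped so that a script picks the device once and reports it. This is only a sketch: `pick_torch_device` is a hypothetical helper, and it assumes that falling back to `'cpu'` is acceptable whenever PyTorch is missing or sees no CUDA GPU.

```python
import importlib.util


def pick_torch_device():
    """Return 'cuda' when PyTorch sees a CUDA GPU, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is nothing to probe
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_torch_device()
print(f"Using {device} for PyTorch computations")
```

On the Mac discussed here this would report `cpu`, matching the `False` result above.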
Last question: does Kilosort4 work by itself, without the spikeinterface wrapper? If it also fails there, then the issue would be on their side.
Kilosort4 works without the spikeinterface. I ran the tutorial on the kilosort website and it correctly reports Using CPU for PyTorch computations.
from kilosort import run_kilosort
# NOTE: 'n_chan_bin' is a required setting, and should reflect the total number
# of channels in the binary file. For information on other available
# settings, see `kilosort.run_kilosort.default_settings`.
settings = {'data_dir': SAVE_PATH.parent, 'n_chan_bin': 385}
ops, st, clu, tF, Wall, similar_templates, is_ref, est_contam_rate = \
run_kilosort(settings=settings, probe_name='neuropixPhase3B1_kilosortChanMap.mat')
Cell output:
Interpreting binary file as default dtype='int16'. If data was saved in a different format, specify `data_dtype`.
Using CPU for PyTorch computations. Specify `device` to change this.
sorting /Users/tabedzki/ZFM-02370_mini.imec0.ap.bin
using probe neuropixPhase3B1_kilosortChanMap.mat
/opt/anaconda3/envs/dj3.9/lib/python3.9/site-packages/kilosort/io.py:497: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_numpy.cpp:212.)
X[:, self.nt : self.nt+nsamp] = torch.from_numpy(data).to(self.device).float()
Preprocessing filters computed in 1.71s; total 1.71s
computing drift
Re-computing universal templates from data.
11%|█ | 5/45 [00:53<07:02, 10.55s/it]
I use a Mac as my personal laptop, so I can try to troubleshoot this over the weekend (for the SI wrapper, I mean).
Sounds good, thanks!
As a related but different issue,
I have copied the spiketutorial notebooks over to one of the linux clusters and ran the same tutorial Nov23 and tried using the kilosort4 locally without the docker image. I received a new error, different from the previous one. I went back and tried the kilosort 4 tutorial that is available on their website and that ran without any trouble. I can upload a gist of the notebook. The only modifications I made (aside from installing the packages required, including mountainsort and kilosort) was to the folder path of the data and the type of sorter being used locally. Any help on this would be greatly appreciated. Please let me know what other information I should provide that would help.
# sorter_params = {'do_correction': False}
import json
# run spike sorting on entire recording
sorter_params = si.get_default_sorter_params('kilosort4')
print(sorter_params)
# Convert the data to a JSON formatted string with 4 spaces of indentation
json_str = json.dumps(sorter_params, indent=4)
# Print the pretty-printed JSON string
print(json_str)
sorting_KS4_no_docker = si.run_sorter(sorter_name='kilosort4', recording=recording_saved, remove_existing_folder=True,
output_folder=base_folder / 'results_KS4_no_docker',
verbose=True, **sorter_params, docker_image=False)
{'batch_size': 60000, 'nblocks': 1, 'Th_universal': 9, 'Th_learned': 8, 'do_CAR': True, 'invert_sign': False, 'nt': 61, 'artifact_threshold': None, 'nskip': 25, 'whitening_range': 32, 'binning_depth': 5, 'sig_interp': 20, 'nt0min': None, 'dmin': None, 'dminx': None, 'min_template_size': 10, 'template_sizes': 5, 'nearest_chans': 10, 'nearest_templates': 100, 'templates_from_data': True, 'n_templates': 6, 'n_pcs': 6, 'Th_single_ch': 6, 'acg_threshold': 0.2, 'ccg_threshold': 0.25, 'cluster_downsampling': 20, 'cluster_pcs': 64, 'duplicate_spike_bins': 15, 'do_correction': True, 'keep_good_only': False, 'save_extra_kwargs': False, 'skip_kilosort_preprocessing': False, 'scaleproc': None}
{
"batch_size": 60000,
"nblocks": 1,
"Th_universal": 9,
"Th_learned": 8,
"do_CAR": true,
"invert_sign": false,
"nt": 61,
"artifact_threshold": null,
"nskip": 25,
"whitening_range": 32,
"binning_depth": 5,
"sig_interp": 20,
"nt0min": null,
"dmin": null,
"dminx": null,
"min_template_size": 10,
"template_sizes": 5,
"nearest_chans": 10,
"nearest_templates": 100,
"templates_from_data": true,
"n_templates": 6,
"n_pcs": 6,
"Th_single_ch": 6,
"acg_threshold": 0.2,
"ccg_threshold": 0.25,
"cluster_downsampling": 20,
"cluster_pcs": 64,
"duplicate_spike_bins": 15,
"do_correction": true,
"keep_good_only": false,
"save_extra_kwargs": false,
"skip_kilosort_preprocessing": false,
"scaleproc": null
}
========================================
Loading recording with SpikeInterface...
number of samples: 9000000
number of channels: 49
numbef of segments: 1
sampling rate: 30000.0
dtype: int16
========================================
Preprocessing filters computed in 1.86s; total 1.86s
computing drift
Re-computing universal templates from data.
Error running kilosort4
The Error
{
"name": "SpikeSortingError",
"message": "Spike sorting error trace:
Traceback (most recent call last):
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/spikeinterface/sorters/basesorter.py\", line 258, in run_from_folder
SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/spikeinterface/sorters/external/kilosort4.py\", line 227, in _run_from_folder
ops, bfile, st0 = compute_drift_correction(
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/kilosort/run_kilosort.py\", line 350, in compute_drift_correction
ops, st = datashift.run(ops, bfile, device=device, progress_bar=progress_bar)
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/kilosort/datashift.py\", line 192, in run
st, _, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/kilosort/spikedetect.py\", line 198, in run
ops = template_centers(ops)
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/kilosort/spikedetect.py\", line 98, in template_centers
nx = np.round((xmax - xmin) / (dminx/2)) + 1
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
Spike sorting failed. You can inspect the runtime trace in /home/tabedzki/code/spiketutorials/Official_Tutorial_SI_0.99_Nov23/results_KS4_no_docker/spikeinterface_log.json.",
"stack": "---------------------------------------------------------------------------
SpikeSortingError Traceback (most recent call last)
Cell In[50], line 15
11 # Print the pretty-printed JSON string
12 print(json_str)
---> 15 sorting_KS4_no_docker = si.run_sorter(sorter_name='kilosort4', recording=recording_saved, remove_existing_folder=True,
16 output_folder=base_folder / 'results_KS4_no_docker',
17 verbose=True, **sorter_params, docker_image=False)
File ~/miniconda3/envs/si_env/lib/python3.9/site-packages/spikeinterface/sorters/runsorter.py:175, in run_sorter(sorter_name, recording, output_folder, remove_existing_folder, delete_output_folder, verbose, raise_error, docker_image, singularity_image, delete_container_files, with_output, **sorter_params)
168 container_image = singularity_image
169 return run_sorter_container(
170 container_image=container_image,
171 mode=mode,
172 **common_kwargs,
173 )
--> 175 return run_sorter_local(**common_kwargs)
File ~/miniconda3/envs/si_env/lib/python3.9/site-packages/spikeinterface/sorters/runsorter.py:225, in run_sorter_local(sorter_name, recording, output_folder, remove_existing_folder, delete_output_folder, verbose, raise_error, with_output, **sorter_params)
223 SorterClass.set_params_to_folder(recording, output_folder, sorter_params, verbose)
224 SorterClass.setup_recording(recording, output_folder, verbose=verbose)
--> 225 SorterClass.run_from_folder(output_folder, raise_error, verbose)
226 if with_output:
227 sorting = SorterClass.get_result_from_folder(output_folder, register_recording=True, sorting_info=True)
File ~/miniconda3/envs/si_env/lib/python3.9/site-packages/spikeinterface/sorters/basesorter.py:293, in BaseSorter.run_from_folder(cls, output_folder, raise_error, verbose)
290 print(f\"{sorter_name} run time {run_time:0.2f}s\")
292 if has_error and raise_error:
--> 293 raise SpikeSortingError(
294 f\"Spike sorting error trace:\
{log['error_trace']}\
\"
295 f\"Spike sorting failed. You can inspect the runtime trace in {output_folder}/spikeinterface_log.json.\"
296 )
298 return run_time
SpikeSortingError: Spike sorting error trace:
Traceback (most recent call last):
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/spikeinterface/sorters/basesorter.py\", line 258, in run_from_folder
SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/spikeinterface/sorters/external/kilosort4.py\", line 227, in _run_from_folder
ops, bfile, st0 = compute_drift_correction(
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/kilosort/run_kilosort.py\", line 350, in compute_drift_correction
ops, st = datashift.run(ops, bfile, device=device, progress_bar=progress_bar)
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/kilosort/datashift.py\", line 192, in run
st, _, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/kilosort/spikedetect.py\", line 198, in run
ops = template_centers(ops)
File \"/home/tabedzki/miniconda3/envs/si_env/lib/python3.9/site-packages/kilosort/spikedetect.py\", line 98, in template_centers
nx = np.round((xmax - xmin) / (dminx/2)) + 1
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
Spike sorting failed. You can inspect the runtime trace in /home/tabedzki/code/spiketutorials/Official_Tutorial_SI_0.99_Nov23/results_KS4_no_docker/spikeinterface_log.json."
}
That's a Kilosort4 error caused by a change to their arguments. We have a fix but haven't released it yet; it will be in 0.100.6. You can get past it by explicitly setting dminx (their new default is 32).
We don't have a patch for Kilosort 4.0.5 at all yet, so that version won't work (but it seems like you are on 4.0.4). Explicitly setting dminx should fix that error.
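The traceback fails at `dminx/2` because the defaults carry `'dminx': None`. A minimal sketch of the suggested workaround, filling in the value before the parameters are unpacked into `run_sorter`; `fill_missing_dminx` is a hypothetical helper, and 32 is simply the value quoted in the reply above:

```python
def fill_missing_dminx(sorter_params, dminx=32):
    """Return a copy of the params with `dminx` set when it is missing or None."""
    patched = dict(sorter_params)
    if patched.get("dminx") is None:
        patched["dminx"] = dminx
    return patched


# Subset of the defaults printed earlier in the thread.
defaults = {"nblocks": 1, "dmin": None, "dminx": None}
patched = fill_missing_dminx(defaults)
print(patched["dminx"])

# Usage with the calls already shown in this thread (commented out because
# they need spikeinterface, kilosort and a recording):
# sorter_params = fill_missing_dminx(si.get_default_sorter_params('kilosort4'))
# sorting = si.run_sorter(sorter_name='kilosort4', recording=recording_saved,
#                         **sorter_params)
```

The helper copies the dict rather than mutating it, so the original defaults can still be printed or compared afterwards.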
@tabedzki
I was unable to reproduce the error on macOS.
To test I used spikeinterface to generate a simulated recording:
>>> rec, sorting = si.generate_ground_truth_recording(num_channels=64, sampling_frequency=30_000.0)
>>> rec
InjectTemplatesRecording: 64 channels - 30.0kHz - 1 segments - 300,000 samples - 10.00s
float32 dtype - 73.24 MiB
>>> sorting_ks = si.run_sorter('kilosort4', rec, './test', dminx=32)
========================================
Loading recording with SpikeInterface...
number of samples: 300000
number of channels: 64
numbef of segments: 1
sampling rate: 30000.0
dtype: float32
========================================
Preprocessing filters computed in 0.36s; total 0.36s
computing drift
Re-computing universal templates from data.
/Users/zacharymckenzie/opt/anaconda3/envs/kilosort_test/lib/python3.10/site-packages/threadpoolctl.py:1223: RuntimeWarning:
Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
the same time. Both libraries are known to be incompatible and this
can cause random crashes or deadlocks on Linux when loaded in the
same Python program.
Using threadpoolctl may cause crashes or deadlocks. For more
information and possible workarounds, please see
https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
warnings.warn(msg, RuntimeWarning)
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:06<00:00, 1.34s/it]
drift computed in 8.19s; total 8.55s
Extracting spikes using templates
Re-computing universal templates from data.
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:06<00:00, 1.31s/it]
1078 spikes extracted in 7.91s; total 16.45s
First clustering
100%|████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 1321.77it/s]
10 clusters found, in 0.03s; total 16.48s
Extracting spikes using cluster waveforms
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00, 2.48it/s]
1232 spikes extracted in 2.06s; total 18.54s
Final clustering
100%|████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 3467.08it/s]
6 clusters found, in 0.01s; total 18.55s
Merging clusters
6 units found, in 1.46s; total 20.01s
Saving to phy and computing refractory periods
4 units found with good refractory periods
Total runtime: 20.03s = 00:00:20 h:m:s
>>> import platform
>>> platform.system()
'Darwin'
>>> si.__version__
'0.101.0'
The Kilosort version I used was 4.0.4 (since we don't have a patch for 4.0.5 yet). I'm using an M1 chip on this computer with the latest macOS. I'm wondering if the problem for you might be the Jupyter to macOS communication. I ran mine straight through a python repl for this test.
My other hypothesis is that you exhausted your RAM. I made a tiny simulated dataset (~73 MiB), but for a real dataset of GBs maybe you don't have enough memory on your computer. Could you try running KS4 through the SI wrapper and use Activity Monitor to see your memory usage?
@zm711 thank you for getting back and offering suggestions. However, I am unable to get even your simple version working. Please see below for more information. Thank you in advance for any help.
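For a rough sense of scale, the raw trace size follows from samples x channels x bytes per sample. A small sketch using the two recordings reported in this thread; `recording_size_mib` is a hypothetical helper, and the result is only the size of the raw traces, not the peak memory Kilosort4 needs while sorting:

```python
# Bytes per sample for the two dtypes that appear in the logs above.
ITEMSIZE = {"int16": 2, "float32": 4}


def recording_size_mib(num_samples, num_channels, dtype):
    """Raw trace size in MiB: samples x channels x bytes per sample."""
    return num_samples * num_channels * ITEMSIZE[dtype] / 2**20


# Simulated test recording: 300,000 samples, 64 channels, float32
print(f"{recording_size_mib(300_000, 64, 'float32'):.2f} MiB")  # 73.24 MiB

# Tutorial recording from the cluster run: 9,000,000 samples, 49 channels, int16
print(f"{recording_size_mib(9_000_000, 49, 'int16'):.2f} MiB")  # 841.14 MiB
```

The first figure matches the `73.24 MiB` that SpikeInterface prints for the simulated recording.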
This was tested on Python 3.9, 3.10, 3.11.
Package versions:
conda list
spikeinterface 0.100.6 pypi_0 pypi
kilosort 4.0.4 pypi_0 pypi
Trying your commands directly from the terminal I get the following error:
(si_env) ➜ testing-spikeinterface ipython
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.22.2 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import spikeinterface.full as si
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
In [2]: rec, sorting = si.generate_ground_truth_recording(num_channels=64, sampling_frequency=30_000.0)
In [3]: rec
Out[3]:
InjectTemplatesRecording: 64 channels - 30.0kHz - 1 segments - 300,000 samples - 10.00s
float32 dtype - 73.24 MiB
In [4]: sorting_ks = si.run_sorter('kilosort4', rec, './test', dminx=32)
========================================
Loading recording with SpikeInterface...
number of samples: 300000
number of channels: 64
numbef of segments: 1
sampling rate: 30000.0
dtype: float32
========================================
[1] 64156 segmentation fault python3 -c "import IPython, sys; sys.exit(IPython.start_ipython())"
/opt/anaconda3/envs/si_env/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
This is the system right before I run command [4] and there is
Could you try just with python?
(si_env) python
>>>
You are testing ipython. I actually tested mine with just a python repl. I'm wondering if this is an ipython+mac issue. I can test that when I'm home later today. (ie I can rerun my test on ipython rather than python).
Same error unfortunately.
(si_env) ➜ testing-spikeinterface python --version
Python 3.9.19
(si_env) ➜ testing-spikeinterface cat sample_instructions.py
import spikeinterface.full as si
rec, sorting = si.generate_ground_truth_recording(num_channels=64, sampling_frequency=30_000.0)
rec
sorting_ks = si.run_sorter('kilosort4', rec, './test', dminx=32)
(si_env) ➜ testing-spikeinterface python sample_instructions.py
========================================
Loading recording with SpikeInterface...
number of samples: 300000
number of channels: 64
numbef of segments: 1
sampling rate: 30000.0
dtype: float32
========================================
[1] 61624 segmentation fault python sample_instructions.py
/opt/anaconda3/envs/si_env/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Interactively:
(si_env) ➜ testing-spikeinterface python
Python 3.9.19 | packaged by conda-forge | (main, Mar 20 2024, 12:55:20)
[Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import spikeinterface.full as si
>>> rec, sorting = si.generate_ground_truth_recording(num_channels=64, sampling_frequency=30_000.0)
>>> rec
InjectTemplatesRecording: 64 channels - 30.0kHz - 1 segments - 300,000 samples - 10.00s
float32 dtype - 73.24 MiB
>>> sorting_ks = si.run_sorter('kilosort4', rec, './test', dminx=32)
========================================
Loading recording with SpikeInterface...
number of samples: 300000
number of channels: 64
numbef of segments: 1
sampling rate: 30000.0
dtype: float32
========================================
[1] 62580 segmentation fault python
/opt/anaconda3/envs/si_env/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
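When the interpreter dies with a segmentation fault and no Python traceback, the standard library's `faulthandler` module can record which Python frames were active at the time of the crash. A minimal sketch, assuming a log file in the temp directory is acceptable; the file name is arbitrary:

```python
import faulthandler
import os
import tempfile

# Keep the file object open for the lifetime of the process: faulthandler
# writes to its file descriptor when a fatal signal such as SIGSEGV arrives.
trace_path = os.path.join(tempfile.gettempdir(), "segfault_trace.log")
trace_file = open(trace_path, "w")
faulthandler.enable(file=trace_file, all_threads=True)

print("faulthandler enabled:", faulthandler.is_enabled())
print("crash traces will be written to", trace_path)

# The sorter call from sample_instructions.py would follow here, e.g.
# sorting_ks = si.run_sorter('kilosort4', rec, './test', dminx=32)
```

Running the script as `python -X faulthandler sample_instructions.py` enables the same handler without code changes and writes the trace to stderr instead.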
Since you said Kilosort worked natively, would you mind trying to install spikeinterface from source (i.e. 0.101.0) with kilosort? I would do
conda create -n kilotest python=3.10
conda activate kilotest
pip install kilosort==4.0.4
cd spikeinterface
pip install -e ".[full,widgets]"
If you don't know how to download source code just let me know. I always install from source, so maybe we have a bug in the wrapper for Macs in 0.100.x that we will need to track down. If we test on main and it works, then we will know it's a bug in 0.100.x; if it fails on main, then I'll have to play around a bit more. Which chip do you have?
0.101.0 works with both python and ipython while 0.100.x did not.
(kilotest) ➜ spikeinterface git:(main) ✗ conda list | rg -e kilosort -e spikeinterface
kilosort 4.0.4 pypi_0 pypi
spikeinterface 0.101.0 pypi_0 pypi
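A script that depends on this finding could guard on the installed version. A sketch only: `version_tuple` and `at_least` are hypothetical helpers, and the `(0, 101, 0)` threshold reflects nothing more than the result reported in this thread.

```python
import re


def version_tuple(version):
    """Turn a version string such as '0.100.6' into a tuple like (0, 100, 6)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def at_least(version, minimum=(0, 101, 0)):
    """True when `version` is at or above `minimum`."""
    return version_tuple(version) >= minimum


for v in ("0.100.6", "0.101.0"):
    print(v, "->", at_least(v))

# In a real script the string would come from si.__version__.
```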
The chip is M3
@tabedzki
Are you okay with using 0.101.0? I'm a bit busy to carefully track down the bug in 0.100.x on my Mac. I still might get to it but we have a SpikeSorting Conference at the end of May so to be honest this type of seg fault debugging likely wouldn't happen until June. And at that point we may have moved onto a release of 0.101.0 (not sure of the release date of that yet).
Yes, I don't see it being a problem for now! I am glad we were able to get to a working state. Thanks for your help. I'll leave this as an open ticket.
Sounds good. If I get the time I'll try to track it down and if we end up moving fully to 101 before I have time I'll close it after the release.
Thanks Zach!
@zm711 just in case you come back to this later: the docker image requires cuda, which is why I had to do the local installation rather than docker. What we have here works; I just wanted to reiterate that.
Is there a way to modify the image (after the conference) to allow for non-cuda systems?
Totally forgot that was your original problem. Oops. Let me ping @alejoe91 back in since I don't work with the docker stuff at all. He would be better to comment on whether modifying the docker wrapper code is feasible.
I can push a fix so that if cuda-python fails, the GPU is disabled, so you should be able to run KS4 in CPU mode for testing.
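The general shape of such a fallback could look like the sketch below. This is not SpikeInterface's container code: `gpu_usable` and `failing_probe` are hypothetical names, and the probe stands in for whatever CUDA check the wrapper actually performs.

```python
def gpu_usable(probe):
    """Run a GPU probe; treat any failure as 'no GPU' instead of raising."""
    try:
        return bool(probe())
    except Exception as err:
        print(f"GPU check failed ({err}); continuing without GPU")
        return False


def failing_probe():
    # Hypothetical stand-in for a CUDA check on a machine without NVIDIA hardware.
    raise ImportError("cuda-python is not available")


use_gpu = gpu_usable(failing_probe)
print("use_gpu =", use_gpu)
```

Catching the failure and continuing on the CPU keeps the container usable for small tests on machines like the Mac in this thread.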
That would be greatly appreciated. No immediate need as I won't get around to testing the code until Monday.
Did we ever push this? @alejoe91
Feature you'd like to see:
I would like support for macOS. I am using an M-series chip to develop for our lab and test new tools locally before deploying to our server. The instructions call for the CUDA Python package; however, Macs do not support NVIDIA hardware, making it difficult or unintuitive to test the package locally.
Additional Context
I reached out to the Kilosort4 maintainer and he responded that this is a SI issue: https://github.com/MouseLand/Kilosort/issues/674.
When I use the Kilosort4 package through the SpikeInterface software, I end up with the following error:
When going through spikeinterface: