Closed: sunjie454 closed this issue 10 months ago
Have you tried saving to binary before running the peak detection? You could do something like this:

```python
from spikeinterface.extractors import read_intan
from spikeinterface.core import load_extractor
from spikeinterface.sortingcomponents.peak_detection import detect_peaks

rec_rhd = read_intan('/my file')        # read the raw Intan RHD recording
rec_rhd.save(folder='/saved_folder')    # write it once to a fast binary format
rec = load_extractor('/saved_folder')   # reload the binary copy
peaks = detect_peaks(rec, ...)          # run peak detection on the binary copy
```
The RHD format is pretty bad and CPU/memory expensive. I wrote `detect_peaks()` to be as efficient as I could. I guess you are using the numba `locally_exclusive` method (this is the fastest and most accurate). Are you running this on Windows or Linux?
Clearly, multiprocessing is easier on Linux, so for enormous and challenging datasets, installing Linux with a dual boot could be a solution.
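For reference, a minimal sketch of how the parallelization options can be passed (these `job_kwargs` names come from the snippet later in this thread; `mp_context="fork"` is only available on POSIX systems, while Windows supports only "spawn"):

```python
# job_kwargs are forwarded to detect_peaks() and other chunked jobs
job_kwargs = dict(
    n_jobs=-1,             # use all available cores
    chunk_duration="1s",   # process the recording in 1 s chunks
    progress_bar=True,
    mp_context="fork",     # POSIX-only start method; omit or use "spawn" on Windows
)
```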
With `detect_peaks`, it is much faster to use the option `method='locally_exclusive'`, together with `radius_um=0` if you want to count the number of peaks per channel.
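To illustrate, a minimal sketch of counting peaks per channel with these options, reusing the `rec` loaded in the snippet above (in recent SpikeInterface versions the returned structured array has a `'channel_index'` field; older versions named it `'channel_ind'`):

```python
import numpy as np
from spikeinterface.sortingcomponents.peak_detection import detect_peaks

peaks = detect_peaks(
    rec,
    method='locally_exclusive',  # numba-based, fastest and most accurate
    radius_um=0,                 # no spatial exclusion, so counts stay per channel
)

# peaks is a structured numpy array; tally how many peaks landed on each channel
counts = np.bincount(peaks['channel_index'], minlength=rec.get_num_channels())
```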
Thank you for your response. I tried the method you suggested on the Linux system, and it indeed worked when the threshold was set to 4.3. However, at lower thresholds such as 3.4, the processing time remains excessively long. For instance, with the same dataset, it takes 2 minutes at a threshold of 4.3, while at a threshold of 3.4, it requires 16 hours or even longer.
Could you share 30 s of your file so we could run some tests? How big is your file?
Thanks for your reply. Our data is big: each 64-channel recording is about 10-40 GB.
The settings are:
```python
job_kwargs = dict(n_jobs=-1, chunk_duration="1s", progress_bar=True)
peaks = detect_peaks(
    recording_preprocessed,
    method='locally_exclusive',
    peak_sign='both',
    gather_mode="memory",
    # mp_context="fork",
    detect_threshold=sorting_threshold,  # detect_threshold is in median absolute deviations (MAD)
    exclude_sweep_ms=0.1,
    radius_um=0,
    noise_levels=noise_levels_int16,
    **job_kwargs)
```
In seconds, it finished and shows:

```
detect peaks using locally_exclusive:   0%|          | 0/630 [00:00<?, ?it/s]
detect peaks using locally_exclusive:   0%|          | 1/630 [00:01<15:39,  1.49s/it]
detect peaks using locally_exclusive:   0%|          | 2/630 [00:03<16:17,  1.56s/it]
detect peaks using locally_exclusive:   1%|          | 7/630 [00:03<04:36,  2.25it/s]
detect peaks using locally_exclusive:   1%|▏         | 9/630 [00:04<03:36,  2.86it/s]
detect peaks using locally_exclusive:   2%|▏         | 11/630 [00:05<03:56,  2.61it/s]
detect peaks using locally_exclusive:   7%|▋         | 45/630 [00:05<00:32, 18.02it/s]
detect peaks using locally_exclusive:   8%|▊         | 51/630 [00:05<00:32, 17.61it/s]
detect peaks using locally_exclusive:  68%|██████▊   | 429/630 [00:06<00:00, 280.50it/s]
detect peaks using locally_exclusive:  84%|████████▍ | 531/630 [00:06<00:00, 229.30it/s]
detect peaks using locally_exclusive:  96%|█████████▌| 606/630 [00:06<00:00, 247.87it/s]
detect peaks using locally_exclusive: 100%|██████████| 630/630 [00:07<00:00, 89.86it/s]
```
The problem is that after this fast phase, it stalls here for hours with only one CPU core running, so I don't know what happens after the progress bar reaches 100%.
What are you doing after detecting peaks? Maybe it gets stuck later!
Yes, you are right, it gets stuck later. I print the runtime right after `peaks = detect_peaks(...)`, but it only prints out after the later processes have finished.
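For anyone debugging a similar stall, a small sketch of timing the call directly, using the variable names from the snippet above; `flush=True` forces the message out immediately, so buffered output does not make a stall in later code look like a stall inside `detect_peaks`:

```python
import time

t0 = time.perf_counter()
peaks = detect_peaks(recording_preprocessed, method='locally_exclusive', **job_kwargs)
# flushing ensures this line appears as soon as detect_peaks returns,
# even if stdout is buffered or later code hangs
print(f"detect_peaks finished in {time.perf_counter() - t0:.1f} s", flush=True)
```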
Now this issue is solved. 100 times faster than before. Thanks.
Our data is collected by Intan devices in RHD format, which is then converted into raw files and stored using spikeinterface. We then read the raw files again and obtain MUA (multi-unit activity) through the `detect_peaks()` method. However, our data size is extremely large, and the processing speed is very slow and unstable. The processing frequently gets stuck and doesn't complete; in these situations, sometimes the results are output and sometimes they are not. Additionally, the computer does not show any error messages, and I have observed that there is no memory overflow. This situation is quite perplexing. Therefore, I am wondering if there are methods in spikeinterface to enhance the efficiency and stability of MUA processing. For example:

1. How to resolve the issue of MUA processing getting stuck and improve the overall stability of the program;
2. How to read raw files, process them into MUA data, and store them simultaneously, instead of reading all raw files first, processing, and then writing the output;
3. How to utilize `global_job_kwargs` to enhance processing speed;
4. How to improve the overall processing speed of the program.

Looking forward to everyone's responses and answers.
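Regarding points 2 and 3 above, a hedged sketch of two SpikeInterface features that may help (parameter availability depends on the installed version, and the threshold and paths here are placeholders): `set_global_job_kwargs` sets default parallelization options once, and `detect_peaks` can stream results to disk with `gather_mode='npy'` instead of accumulating everything in memory.

```python
from spikeinterface.core import set_global_job_kwargs
from spikeinterface.sortingcomponents.peak_detection import detect_peaks

# Point 3: set parallelization defaults once; chunked jobs pick them up globally
set_global_job_kwargs(n_jobs=-1, chunk_duration="1s", progress_bar=True)

# Point 2: write peaks to .npy files chunk by chunk instead of holding them in RAM
peaks = detect_peaks(
    recording_preprocessed,
    method='locally_exclusive',
    detect_threshold=5,       # placeholder threshold, in MAD units
    gather_mode='npy',        # stream results to disk as they are computed
    folder='/peaks_output',   # placeholder output folder for the .npy files
    names=['peaks'],          # base name of the saved file
)
```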