SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Memory error when saving SpikeGLX recordings in parallel #3424

Closed: jiumao2 closed this issue 2 months ago

jiumao2 commented 2 months ago

I am trying to concatenate SpikeGLX recordings from 23 days and save the preprocessed data with n_jobs = 40 on a computer with 64 GB of memory. The progress bar never appeared, and after a few minutes I got the memory error below. However, it works well with n_jobs = 1 or n_jobs = 2. Based on the error message, the issue might be related to header parsing.

Thanks for any help!


The code:

import os
from pathlib import Path

import spikeinterface.full as si

# folder_raw_data, folder_data (23 session folders), folder_root, and scale
# are defined earlier in the script
rec_all = []
for folder_this in folder_data:  # length = 23
    base_folder = Path(os.path.join(folder_raw_data, folder_this))
    spikeglx_folder = base_folder / 'Exp_g0'

    raw_rec = si.read_spikeglx(spikeglx_folder, stream_name='imec0.ap', load_sync_channel=False)

    # lazy preprocessing chain; nothing is computed until save()
    bad_channel_ids = ['imec0.ap#AP191']
    rec_bad_channels_removed = raw_rec.remove_channels(bad_channel_ids)
    rec_phase_shifted = si.phase_shift(rec_bad_channels_removed)
    rec_centered = si.center(rec_phase_shifted)
    rec_cmr = si.common_reference(rec_centered, operator="median", reference="global")
    rec_scaled = si.scale(rec_cmr, scale)
    rec_int16 = si.astype(rec_scaled, 'int16')

    rec_all.append(rec_int16)

rec = si.concatenate_recordings(rec_all)

job_kwargs = dict(n_jobs=40, chunk_duration='1s', progress_bar=True)
rec.save(folder=folder_root / 'Preprocess', format='binary', overwrite=True, **job_kwargs)

The error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\spikeinterface\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\ProgramData\Anaconda3\envs\spikeinterface\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "C:\Work\HY\spikeinterface\src\spikeinterface\core\base.py", line 529, in from_dict
    extractor = _load_extractor_from_dict(dictionary)
  File "C:\Work\HY\spikeinterface\src\spikeinterface\core\base.py", line 1135, in _load_extractor_from_dict
    extractor = extractor_class(**new_kwargs)
  File "C:\Work\HY\spikeinterface\src\spikeinterface\extractors\neoextractors\spikeglx.py", line 65, in __init__
    NeoBaseRecordingExtractor.__init__(
  File "C:\Work\HY\spikeinterface\src\spikeinterface\extractors\neoextractors\neobaseextractor.py", line 188, in __init__
    _NeoBaseExtractor.__init__(self, block_index, **neo_kwargs)
  File "C:\Work\HY\spikeinterface\src\spikeinterface\extractors\neoextractors\neobaseextractor.py", line 27, in __init__
    self.neo_reader = self.get_neo_io_reader(self.NeoRawIOClass, **neo_kwargs)
  File "C:\Work\HY\spikeinterface\src\spikeinterface\extractors\neoextractors\neobaseextractor.py", line 66, in get_neo_io_reader
    neo_reader.parse_header()
  File "C:\ProgramData\Anaconda3\envs\spikeinterface\lib\site-packages\neo\rawio\baserawio.py", line 189, in parse_header
    self._parse_header()
  File "C:\ProgramData\Anaconda3\envs\spikeinterface\lib\site-packages\neo\rawio\spikeglxrawio.py", line 196, in _parse_header
    self._events_memmap = np.unpackbits(extracted_word.astype(np.uint8)[:, None], axis=1)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 299. MiB for an array with shape (39175233, 8) and data type uint8

zm711 commented 2 months ago

That error is saying that you've run out of memory. Are you able to memory profile while you're trying to do this? I could imagine that with too many jobs you exhaust your memory, whereas with fewer jobs you work more serially, freeing memory as you go. With data that large, I think you either need more than 64 GB of RAM or you have to run more slowly so that memory is freed as you go.
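
One way to follow this suggestion is to test the save on a short frame_slice of the concatenated recording while stepping up n_jobs and watching system memory. This is only a sketch: it assumes psutil is installed (the thread does not mention it), the test output folder names are made up, and a short slice should still reproduce the header-parsing cost since every worker still rebuilds all 23 underlying extractors.

import psutil

def report_memory(tag):
    vm = psutil.virtual_memory()
    print(f"{tag}: {vm.used / 1e9:.1f} / {vm.total / 1e9:.1f} GB used ({vm.percent:.0f}%)")

fs = rec.get_sampling_frequency()
rec_test = rec.frame_slice(start_frame=0, end_frame=int(60 * fs))  # first 60 s only

for n_jobs in (2, 8, 20, 40):
    report_memory(f"before n_jobs={n_jobs}")
    rec_test.save(folder=folder_root / f'Preprocess_test_{n_jobs}', format='binary',
                  overwrite=True, n_jobs=n_jobs, chunk_duration='1s', progress_bar=True)
    report_memory(f"after n_jobs={n_jobs}")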

jiumao2 commented 2 months ago

That error is saying that you've run out of memory. Are you able to memory profile while you're trying to do this? I could imagine that with too many jobs you exhaust your memory, whereas with fewer jobs you work more serially, freeing memory as you go. With data that large, I think you either need more than 64 GB of RAM or you have to run more slowly so that memory is freed as you go.

I just don't know why parse_header is called in the write_binary_file operation. Does it indicate that the rec is copied multiple times? I think get_traces should not need that much memory.

zm711 commented 2 months ago

Each raw_rec in there will need to parse its header. And SpikeGLX has multiple memmaps which, although they limit the memory footprint, still take up memory. My lab uses Intan, and the memmap for Intan is usually ~2 GB, so with 20 files I would be up to 40 GB just to maintain the memmaps. One issue we are actively thinking about is how to close memmaps between steps. np.memmap doesn't guarantee that the memory for a memmap is freed, so when a bunch of memmaps are created (during header parsing) you could exhaust your memory that way.

The flow would be: spikeglx -> make memmap -> get_traces.

So yes, get_traces should limit the memory because it only pulls what it needs from the memmap, but a lot of memmaps could be causing the problem.
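
A rough way to see this accumulation is an illustrative sketch like the one below. It assumes psutil is installed and simply reopens the same 23 folders from the script above, watching this process's resident memory grow as each recording is opened and its header parsed.

import os
from pathlib import Path

import psutil
import spikeinterface.full as si

proc = psutil.Process(os.getpid())
recs = []  # keep references alive, as the real pipeline does
for i, folder_this in enumerate(folder_data):
    spikeglx_folder = Path(os.path.join(folder_raw_data, folder_this)) / 'Exp_g0'
    recs.append(si.read_spikeglx(spikeglx_folder, stream_name='imec0.ap',
                                 load_sync_channel=False))
    print(f"after recording {i + 1}: RSS = {proc.memory_info().rss / 1e9:.2f} GB")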

h-mayorquin commented 2 months ago

I just don't know why parse_header is called in write_binary_file operation. Does it indicate that the rec is copied multiple times? I think get_traces does not need that much memory.

On Windows, for parallel operations the full recording is serialized and then re-initialized in each of the workers. At that point, neo calls parse_header. Maybe there is a memory spike there during reading, and because you are using A LOT of workers they just crash into each other.
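
To put rough numbers on that, here is a back-of-the-envelope estimate. It assumes the ~299 MiB allocation from the traceback is typical of each of the 23 recordings and that all workers re-parse their headers at roughly the same time; neither is verified in the thread.

per_recording = 299 * 2**20   # ~299 MiB events array allocated in parse_header (from the traceback)
n_recordings = 23             # one SpikeGLX folder per day
n_workers = 40                # n_jobs

per_worker = per_recording * n_recordings   # each worker re-parses every header
total = per_worker * n_workers              # if all 40 workers start up at once

print(f"~{per_worker / 1e9:.1f} GB per worker, ~{total / 1e9:.0f} GB across 40 workers, vs 64 GB of RAM")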

My own take is that you don't really need that many jobs. If you have that much RAM, just read larger chunks and divide them among fewer workers. I think that should work fine. That said, you should experiment.
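
Following that suggestion, a more conservative configuration might look like the lines below; the specific values are only a starting point for experimentation, not something recommended in the thread.

job_kwargs = dict(n_jobs=8, chunk_duration='10s', progress_bar=True)
rec.save(folder=folder_root / 'Preprocess', format='binary', overwrite=True, **job_kwargs)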

jiumao2 commented 2 months ago

That's so clear! Thanks a lot @zm711 @h-mayorquin! Maybe patience is all I need 😄