Closed jiumao2 closed 2 months ago
That error is saying that you've run out of memory. Are you able to memory profile while you're trying to do this? I could imagine that with too many jobs you exhaust your memory, whereas with fewer jobs you work more serially, freeing memory as you go. I think if you're using data that large you likely need more than 64 GB of RAM, or you need to go really slowly in order to free memory as you go.
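As a starting point for that profiling, here is a minimal sketch using Python's built-in `tracemalloc`. It is illustrative only and not specific to SpikeInterface: `allocate_something` is a hypothetical stand-in for whatever step (e.g. the save/write call) actually exhausts memory.

```python
import tracemalloc

def allocate_something():
    # Hypothetical stand-in for the memory-hungry step you want to profile.
    return [bytearray(1024 * 1024) for _ in range(10)]  # ~10 MiB total

tracemalloc.start()
data = allocate_something()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```

Running this around the failing call (instead of the toy allocation) would at least show whether the peak grows with the number of jobs.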
I just don't know why `parse_header` is called in the `write_binary_file` operation. Does it indicate that the rec is copied multiple times? I think `get_traces` does not need that much memory.
Each `raw_rec` in there will need to parse its header. And SpikeGLX has multiple memmaps, which, although they limit the memory footprint, do take up memory. My lab uses Intan, and the memmap for Intan is usually ~2 GB, so if I did 20 files I would be up to 40 GB just to maintain the memmaps. One issue we are actively thinking about is how to close memmaps between steps. `np.memmap` doesn't guarantee that the memory for a memmap is freed, so in the case of running a bunch of memmaps (during the header parsing) you could exhaust your memory that way.
The flow would be: spikeglx -> make memmap -> get_traces.
So yes get_traces should limit the memory because it just pulls what it needs from the memmap, but a lot of memmaps could be causing the problem.
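To illustrate the memmap point, here is a minimal sketch (not SpikeInterface code): a `np.memmap` keeps the file mapped for as long as a reference to it exists, so copying the slice you need and then dropping the memmap reference is the usual way to let the OS release the mapping. The file name and shape below are made up for the example.

```python
import os
import tempfile
import numpy as np

# Create a small binary file to stand in for one recording segment.
path = os.path.join(tempfile.mkdtemp(), "traces.bin")
np.zeros((1000, 4), dtype=np.int16).tofile(path)

# Each np.memmap like this holds the file mapped until it is garbage
# collected; many of them at once can add up, as described above.
mm = np.memmap(path, dtype=np.int16, mode="r", shape=(1000, 4))

# Copy the chunk out so it survives after the memmap is released.
chunk = np.array(mm[:100])

del mm  # drop the reference so the mapping can actually be freed
```

This is why `get_traces` itself stays cheap (it reads only a slice) while keeping many memmaps open simultaneously does not.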
I just don't know why `parse_header` is called in the `write_binary_file` operation. Does it indicate that the rec is copied multiple times? I think `get_traces` does not need that much memory.
On Windows, for parallel operations the full recording is serialized and then re-initialized in each of the workers. At that point, Neo calls `parse_header`. Maybe there is a memory spike there during reading, and because you are using A LOT of workers they just crash into each other.
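The serialize/re-initialize cycle can be sketched with a toy class (these are not the real Neo classes, just an illustration of the mechanism): unpickling skips `__init__` but runs `__setstate__`, so if header parsing happens there, every spawned worker pays that cost again.

```python
import pickle

class ToyReader:
    """Toy stand-in for a reader that parses its header on setup."""

    def __init__(self):
        self.parse_count = 0
        self._parse_header()

    def _parse_header(self):
        # Stand-in for the expensive header parse / memmap creation.
        self.parse_count += 1

    def __setstate__(self, state):
        # Called on unpickling, i.e. once per spawned worker on Windows.
        self.__dict__.update(state)
        self._parse_header()

reader = ToyReader()
n_workers = 4
# Simulate what spawn-based multiprocessing does: pickle in the parent,
# unpickle (and hence re-parse) once in each worker.
workers = [pickle.loads(pickle.dumps(reader)) for _ in range(n_workers)]
total_parses = reader.parse_count + sum(w.parse_count for w in workers)
```

With 40 workers the header parse (and any transient memory it needs) happens 40 times, roughly at once.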
My own take is that you don't really need that many jobs. If you have that much RAM, just read larger chunks and divide them among fewer workers. I think that should work fine. That said, you should experiment.
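A back-of-envelope way to think about the trade-off (all numbers here are assumptions for illustration, not measurements): if each worker carries some fixed overhead (memmaps, header) plus one chunk, a fixed RAM budget bounds how many workers can run at once.

```python
def max_workers(ram_gb, per_worker_overhead_gb, chunk_gb):
    """Rough count of workers that fit if each holds one chunk plus
    a fixed per-worker overhead. Purely illustrative arithmetic."""
    per_worker = per_worker_overhead_gb + chunk_gb
    return int(ram_gb // per_worker)

# Assumed numbers: 64 GB of RAM, ~2 GB of memmap/header overhead per worker.
small_chunks = max_workers(64, 2.0, 0.5)  # many workers, small chunks
large_chunks = max_workers(64, 2.0, 6.0)  # fewer workers, large chunks
```

In SpikeInterface terms, this corresponds to tuning the `n_jobs` and `chunk_duration` job keyword arguments together rather than maximizing `n_jobs` alone.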
That's so clear! Thanks a lot @zm711 @h-mayorquin ! Maybe patience is all I need 😄
I am trying to combine 23-day SpikeGLX recordings and save the preprocessed data with `n_jobs = 40` on a computer with 64 GB of memory. The progress bar did not appear, and after a few minutes I received the memory error. However, it works well when `n_jobs = 1` or `n_jobs = 2`. Based on the error information, the issue might be related to the header. Thanks for any help!
The code:
The error: