NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats.
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Consumes a huge amount of memory during processing #1527

Open mousphere opened 3 months ago

mousphere commented 3 months ago

I want to read data from a .dat file using RawBinarySignalIO and plot it. When I ran the program below, it consumed more than 32 GB of memory for a 1 GB .dat file. I want to run this process on AWS Lambda, so I need it to execute with less than 10 GB of memory. Is there a way to achieve this?

import neo
import matplotlib.pyplot as plt
import numpy as np

data_file = 'example.dat'

nb_channel = 40
analog_digital_input_channels = 8
sampling_rate = 30000

reader = neo.io.RawBinarySignalIO(filename=data_file, nb_channel=nb_channel, sampling_rate=sampling_rate)
block = reader.read_block()

analog_signals = block.segments[0].analogsignals[0]

# keep only the neural channels, dropping the 8 analog/digital input channels
analog_signals = analog_signals[:, :nb_channel - analog_digital_input_channels]

# plot each channel with a vertical offset so the traces do not overlap
plt.figure(figsize=(10, 6))
for i in range(analog_signals.shape[1]):
    plt.plot(analog_signals[:, i] + i * 100, label=f'Channel {i+1}')
plt.xlabel('Time (samples)')
plt.ylabel('Amplitude')
plt.title('Analog Signals')
plt.legend()
plt.savefig('analog_signals_plot.png')
plt.close()
zm711 commented 3 months ago

@mousphere,

What's the dtype of the binary file? Could you provide a bit more info about what the .dat file is? You're sure it's headerless (i.e., despite the huge memory spike, does the reader seem to work)?

One thing you could try would be to do the same at the rawio level. Have you used that before? I'm wondering if the reshape is causing the huge memory spike.

If you test at the rawio level and it still has the memory spike, then I think I know how to fix it. We would have to slow the RawIO level down to protect the memory.
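
Roughly, a rawio-level chunked read could look like this. This is only a sketch assuming a headerless int16 file; keyword names such as stream_index may differ slightly between neo versions, so check the docs for your installed version.

import neo.rawio

reader = neo.rawio.RawBinarySignalRawIO(
    filename='example.dat',
    dtype='int16',          # assumption: adjust to the actual dtype of the file
    nb_channel=40,
    sampling_rate=30000,
)
reader.parse_header()

n_samples = reader.get_signal_size(block_index=0, seg_index=0, stream_index=0)
chunk_size = 30000 * 10  # ten seconds of samples per chunk

for i_start in range(0, n_samples, chunk_size):
    i_stop = min(i_start + chunk_size, n_samples)
    # only this chunk is read from disk and held in memory
    raw_chunk = reader.get_analogsignal_chunk(
        block_index=0, seg_index=0,
        i_start=i_start, i_stop=i_stop,
        stream_index=0,
    )
    float_chunk = reader.rescale_signal_raw_to_float(raw_chunk, dtype='float32',
                                                     stream_index=0)
    # ... process or plot float_chunk here ...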

mousphere commented 3 months ago

@zm711 The dtype is int. Even with the high memory usage, the reader was working. When I set lazy=True in read_block, memory usage grew more slowly, but it still reached about 16 GB (I stopped the process midway because the figure had still not been generated after more than an hour).

Additionally, trying to plot in stages using chunks or using plt.subplots() instead of plt.plot() did not solve the issue. It might be that matplotlib is using a lot of memory during processing.
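
For reference, chunk-wise loading with lazy=True looks roughly like the sketch below. This assumes neo's AnalogSignalProxy.load(time_slice=...) API and is not the exact code that was run; keyword names may differ slightly between neo versions.

import quantities as pq
import neo

reader = neo.io.RawBinarySignalIO(filename='example.dat',
                                  nb_channel=40, sampling_rate=30000)
block = reader.read_block(lazy=True)
proxy = block.segments[0].analogsignals[0]  # AnalogSignalProxy: no data loaded yet

step = 10 * pq.s  # load ten seconds at a time
t = proxy.t_start
while t < proxy.t_stop:
    t_next = t + step
    if t_next > proxy.t_stop:
        t_next = proxy.t_stop
    # only this time slice of the file is read into memory
    chunk = proxy.load(time_slice=(t, t_next),
                       channel_indexes=list(range(40 - 8)))
    # ... process or plot chunk.magnitude here, then let the chunk go out of scope ...
    t = t_next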

h-mayorquin commented 3 months ago

How do you know that it is using that much memory? How are you measuring RSS?
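
For a number that is easier to interpret than Activity Monitor, something like the sketch below (assuming psutil is installed) prints the process RSS directly:

import os
import psutil
import neo

proc = psutil.Process(os.getpid())

def rss_gb():
    # resident set size of the current process, in GB
    return proc.memory_info().rss / 1024**3

print(f"RSS at start:   {rss_gb():.2f} GB")
reader = neo.io.RawBinarySignalIO(filename='example.dat',
                                  nb_channel=40, sampling_rate=30000)
block = reader.read_block()
print(f"RSS after read: {rss_gb():.2f} GB")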

mousphere commented 3 months ago

@h-mayorquin I checked it in the Mac Activity Monitor.

samuelgarcia commented 2 months ago

My guess is that matplotlib is consuming a lot of the memory.

What is the memory consumption when running only this?


import neo
import numpy as np

data_file = 'example.dat'

nb_channel = 40
analog_digital_input_channels = 8
sampling_rate = 30000

reader = neo.io.RawBinarySignalIO(filename=data_file, nb_channel=nb_channel, sampling_rate=sampling_rate)
block = reader.read_block()

analog_signals = block.segments[0].analogsignals[0]
# .magnitude gives the underlying numpy array (units dropped); slice off the 8 analog/digital input channels
numpy_signal = analog_signals.magnitude[:, :nb_channel - analog_digital_input_channels]
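
If that read on its own stays well below the limit, the plotting is the likely culprit. One possible workaround (a sketch, not code from this thread) is to decimate the signal before plotting, since matplotlib holds every plotted point in memory and a saved PNG cannot resolve millions of samples per channel anyway:

import matplotlib.pyplot as plt

# continues from the snippet above: numpy_signal has shape (n_samples, 32)
decimate = 100  # keep every 100th sample; adjust to the detail you need
plot_signal = numpy_signal[::decimate, :]

plt.figure(figsize=(10, 6))
for i in range(plot_signal.shape[1]):
    plt.plot(plot_signal[:, i] + i * 100, label=f'Channel {i+1}')
plt.xlabel('Time (decimated samples)')
plt.ylabel('Amplitude')
plt.savefig('analog_signals_plot.png')
plt.close()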