bioAI-Oslo / Spikeometric

Spiking Neural Network Simulator based on Generalized Linear Models
GNU General Public License v3.0

Large memory allocation due to sparse tensors #30

Open · lepmik opened this issue 9 months ago

lepmik commented 9 months ago

https://github.com/bioAI-Oslo/Spikeometric/blob/31fe94ea9192170c7855ce5afa7267caab3566cf/spikeometric/models/base_model.py#L223

Any reason why we shouldn't work on sparse tensors?

JakobSonstebo commented 9 months ago

With the operations we're doing throughout the simulation, I couldn't find a sparse tensor format that didn't need to be converted to dense for some of the operations, so everything became much slower. Since the spikes need very low precision, they occupy a small fraction of total memory usage (compared to the weights), so I decided the dense format was worth it.

After the simulation completes, the idea has been to leave it to the user to save the results as sparse tensors. If you want to do some post-processing of the spikes, it is handy to have them in dense form before saving. However, if the main usage is to save the results immediately, returning them as a sparse tensor is probably better. What do you think?
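For concreteness, a minimal sketch (not part of the package) of what leaving the sparsification to the user could look like in plain PyTorch; the `spikes` tensor here is a random stand-in for the simulation output:

```python
import torch

# Toy spike matrix standing in for the simulation output: mostly zeros.
spikes = torch.zeros(100, 10_000, dtype=torch.uint8)
spikes[torch.randint(0, 100, (500,)), torch.randint(0, 10_000, (500,))] = 1

# COO format stores only the (neuron, timestep) pairs of the nonzeros,
# so the file is much smaller at realistic firing rates.
torch.save(spikes.to_sparse(), "spikes.pt")

# Densify again later if post-processing needs the full matrix.
spikes_dense = torch.load("spikes.pt").to_dense()
```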

JakobSonstebo commented 9 months ago

I can try benchmarking the sparse branch now to see how performance is affected. At the beginning of the project I compared this way of storing spikes to just writing them into a dense tensor and found the latter significantly faster, but if memory is a problem (even when using torch.uint8), it might be worth it.
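As an illustration of the two strategies being compared, here is a toy benchmark (not the original one; the 1% firing rate and the sizes are assumptions):

```python
import time
import torch

n_neurons, n_steps, rate = 1_000, 10_000, 0.01

# Strategy A: preallocate a dense uint8 buffer and write each step's spikes.
dense = torch.zeros(n_neurons, n_steps, dtype=torch.uint8)
t0 = time.perf_counter()
for t in range(n_steps):
    dense[:, t] = (torch.rand(n_neurons) < rate).to(torch.uint8)
print("dense write:", time.perf_counter() - t0)

# Strategy B: collect the indices of active neurons per step and build one
# COO tensor at the end (sparsifying at every step would be slower still).
t0 = time.perf_counter()
rows, cols = [], []
for t in range(n_steps):
    idx = (torch.rand(n_neurons) < rate).nonzero(as_tuple=True)[0]
    rows.append(idx)
    cols.append(torch.full_like(idx, t))
indices = torch.stack([torch.cat(rows), torch.cat(cols)])
values = torch.ones(indices.shape[1], dtype=torch.uint8)
sparse = torch.sparse_coo_tensor(indices, values, size=(n_neurons, n_steps))
print("sparse build:", time.perf_counter() - t0)
```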

JakobSonstebo commented 9 months ago

[Attached plot: simulation performance benchmark]

Here is a plot showing the performance. I suspect the rolling of x might be the thing slowing it down, so maybe there is a faster way of "forgetting" the first column?

lepmik commented 9 months ago

Thank you for running the benchmarks!

I think we have to at least have the option of sparse iteration; for example, running 100 neurons for 1e8 timesteps breaks on an NVIDIA GeForce RTX 3090, which, with a small timestep, is not that long a simulation. (100 neurons × 1e8 steps at one byte per spike is already ~10 GB of the card's 24 GB, before weights and intermediates.) We could introduce a sparse parameter that is True by default?

I think you are right that the roll is slow, but I can't immediately think of a faster way of doing it. We could look into alternatives to rolling, e.g. keeping a write pointer into a fixed buffer instead of shifting the whole thing every step; see the sketch below.
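A minimal sketch of the pointer idea, assuming `x` is an `(n_neurons, T)` spike-history window; the names and sizes are placeholders:

```python
import torch

# Instead of torch.roll, keep a write pointer into a fixed-size buffer and
# overwrite the oldest column in place. One column write is O(n_neurons);
# rolling copies the entire (n_neurons, T) buffer every step.
n_neurons, T = 100, 50
x = torch.zeros(n_neurons, T, dtype=torch.uint8)
ptr = 0

for step in range(1_000):
    new_spikes = (torch.rand(n_neurons) < 0.01).to(torch.uint8)
    x[:, ptr] = new_spikes      # overwrite the oldest column in place
    ptr = (ptr + 1) % T         # advance the pointer instead of shifting data

# If a computation needs the window in temporal order (oldest to newest),
# reconstruct it with a single indexed gather.
order = (torch.arange(T) + ptr) % T
window = x[:, order]
```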

JakobSonstebo commented 9 months ago

Maybe we could also consider saving to a file during the simulation. That is, every N steps we write the progress to a file and resume from that point. This way we could limit memory usage, and it would be faster than sparsifying at every step.
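A rough sketch of that idea; `step_fn` is a hypothetical stand-in for the model's per-step update, and the chunk size and file naming are arbitrary:

```python
import torch

def run_chunked(step_fn, n_neurons, n_steps, chunk_size=100_000, prefix="spikes"):
    """Keep only chunk_size dense steps in memory; flush each chunk as sparse."""
    buffer = torch.zeros(n_neurons, chunk_size, dtype=torch.uint8)
    chunk_idx = 0
    for t in range(n_steps):
        buffer[:, t % chunk_size] = step_fn(t)      # one simulation step
        if (t + 1) % chunk_size == 0:               # chunk full: write to disk
            torch.save(buffer.to_sparse(), f"{prefix}_{chunk_idx}.pt")
            buffer.zero_()
            chunk_idx += 1
    tail = n_steps % chunk_size
    if tail:                                        # flush the final partial chunk
        torch.save(buffer[:, :tail].to_sparse(), f"{prefix}_{chunk_idx}.pt")

# Toy usage with a random stand-in for the model's step function.
run_chunked(lambda t: (torch.rand(100) < 0.01).to(torch.uint8),
            n_neurons=100, n_steps=250_000)
```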