genn-team / ml_genn

A library for deep learning with Spiking Neural Networks (SNN).
https://ml-genn.readthedocs.io

Performance investigation #112

Open neworderofjamie opened 2 months ago

neworderofjamie commented 2 months ago

Because EventProp is really fast, Amdahl's law once again strikes and CPU-side overheads start to become problematic, especially when training on large datasets like SSC. Training one batch takes approximately 25 ms, but there are ~2 ms 'gaps' between batches. With 2359 batches in the training set, this corresponds to about 1 minute of actual training computation and about 5 s spent between batches per epoch. Examining this period in Nsight Systems shows the following (the memcpy coming in from the left is the readout; it only appears massive because it was added to the command queue long before it ran, and the actual copy time is the tiny purple bar):

[screenshot: Nsight Systems timeline of the gap between training batches]
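For reference, a quick back-of-the-envelope check (plain Python, numbers taken from the measurements above) of how much of each epoch the gaps account for:

```python
# Per-epoch cost of the inter-batch gaps, using the figures quoted
# above: 25 ms per batch, ~2 ms gap, 2359 batches in the SSC training set.
BATCHES_PER_EPOCH = 2359
BATCH_TIME_S = 25e-3   # GPU compute time per batch
GAP_TIME_S = 2e-3      # CPU-side gap between batches

compute_s = BATCHES_PER_EPOCH * BATCH_TIME_S   # ~59 s of actual training
overhead_s = BATCHES_PER_EPOCH * GAP_TIME_S    # ~4.7 s of idle gaps
print(f"compute: {compute_s:.1f} s, gaps: {overhead_s:.1f} s "
      f"({100 * overhead_s / (compute_s + overhead_s):.1f}% of epoch)")
```

So the gaps waste roughly 7% of every epoch, which only gets worse as the per-batch compute time shrinks further.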

Biggest blocks of GPU time are:

Biggest blocks of CPU time (i.e. GPU idle time) are:

Possible ways to improve these overheads include:

I think that, when balancing performance against maintaining backward compatibility, adding support for copying multiple batches to the GPU at a time, while keeping the current data structure, would probably be the best option; a sketch of the idea follows.
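To illustrate, here is a minimal sketch of that option. Everything in it (the `upload` stub, the `COPY_BLOCK` size, the buffer layout) is hypothetical and not ml_genn's actual API; it just shows how several batches could be gathered contiguously and copied in one transfer while the host-side per-batch data structure stays unchanged, which is what preserves backward compatibility.

```python
import numpy as np

COPY_BLOCK = 8  # hypothetical number of batches staged per transfer

def upload(host_array):
    """Stand-in for a single host->device memcpy (e.g. cudaMemcpy)."""
    pass  # a real implementation would enqueue one transfer here

def train_epoch(batches, batch_size, num_features):
    # Staging buffer large enough for COPY_BLOCK batches; the existing
    # host-side structure (a list of per-batch arrays) is untouched.
    staging = np.empty((COPY_BLOCK, batch_size, num_features),
                       dtype=np.float32)
    for start in range(0, len(batches), COPY_BLOCK):
        block = batches[start:start + COPY_BLOCK]
        for i, b in enumerate(block):
            staging[i] = b            # gather batches contiguously
        # One large copy instead of len(block) small ones, amortising
        # the fixed per-transfer launch latency across the block
        upload(staging[:len(block)])
        for i in range(len(block)):
            pass  # launch training kernels on the i-th staged batch
```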