brian-team / brian2cuda

A brian2 extension to simulate spiking neural networks on GPUs
https://brian2cuda.readthedocs.io/
GNU General Public License v3.0
61 stars 12 forks source link

Sort EventMonitor data by index within each time step #46

Open denisalevi opened 7 years ago

denisalevi commented 7 years ago

Currently our EventMonitor (~ SpikeMonitor) returns times and indices sorted by time but for each time step the indices are not sorted. So we get e.g. this

times =   [0,0,0, 1,1,1,1, 2,2,2,2,2]
indices = [5,1,2, 1,3,2,4, 4,2,5,1,3]

A few tests form the brian2 test suite fail because of this, since brian2 returns the indices sorted as well, like this

times =   [0,0,0, 1,1,1,1, 2,2,2,2,2]
indices = [1,2,5, 1,2,3,4, 1,2,3,4,5]

This is low priority right now, but I guess we should return the data in the same format as brian2.

We could save the number of spikes at each time step and then use that to sort the indices in the end of the simulation using thrust::sort. Or if we want to save one copy from shared to global memory per time step, we could write our own kernel for sorting.

denisalevi commented 3 years ago

I can't reproduce this bug anymore and there are no more monitor tests failing in the test suite. Closing.

denisalevi commented 2 years ago

Looks like this bug is still around, but only becomes apparent for large enough recordings. Because of warp-level SIMT, atomicAdd during thresholding will lead to neuron IDs being sorted if their indices are withing a single warp (group of 32 neurons). Since most tests use small neurongroups (typically less than 32 neurons), the data ends up being sorted. But the order of warps in the eventspace is random, hence for recordings of > 32 neurons, the eventspace is not sorted and the spikemonitor just copies from the eventspace is not sorted either.

Solution: Perform on copy operation per time step on the recorded data (either via thrust::sort or via custom sorting code in the spikemonitor kernel). Or find a way to sort the data in the end of the simulation only (maybe even in parallel on the GPU by storing the start indices for each time step during recording and then perform parallel sorting per time step in one kernel call at the end?).

For now this is a known issue and remains as such.