denisalevi opened this issue 3 years ago
In PR #186 I mentioned benchmarking three different implementations (mostly `cudaMemset` vs `thrust::fill` for variable resets). Not sure if that is worth it, but if the current implementation seems to suffer from the `cudaMemset` calls, one could profile those implementations against each other.
PR #186 implements summed variables by parallelizing over all synapses (one thread per synapse in a `Synapses` object) and computes the summed variables with global `atomicAdd` calls on postsynaptic variables. This creates conflicts for synapses that connect to the same postsynaptic neuron, and the global memory reads/writes are not coalesced. Alternatively, we could do this:
For which we should consider this:
For the full discussion, see #49.