brian-team / brian2

Brian is a free, open source simulator for spiking neural networks.
http://briansimulator.org

Running multiple instances of a `Network` by automatically merging groups and duplicating synapses #1369

Open · denisalevi opened this issue 2 years ago

denisalevi commented 2 years ago

Hi there,

Have you ever thought about implementing something like a `Network.multiple_runs(N_runs, ..., **run_kwargs)` method that defines a new network by merging multiple repetitions of the `NeuronGroup` objects into a single new `NeuronGroup` and by repeating the connectivity patterns between the resulting subgroups, so that the new network implements multiple repetitions of the old one? Or does something along those lines already exist?
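For illustration, here is a rough sketch of the kind of manual merging such a method could automate: one big group holding all copies back to back, with the single-instance connectivity pattern duplicated at an index offset per copy. The sizes, model equations and random pattern are all invented for this example; none of this is an existing API.

```python
import numpy as np
from brian2 import NeuronGroup, Synapses, SpikeMonitor, Network, ms

N_copies = 10      # number of network instances to merge
N_per_copy = 100   # neurons per instance

eqs = 'dv/dt = (1.1 - v) / (10*ms) : 1'
# One big group holding all copies back to back
G = NeuronGroup(N_copies * N_per_copy, eqs, threshold='v > 1',
                reset='v = 0', method='exact')
G.v = 'rand()'  # independent initial conditions in every copy

# Connectivity pattern of a single instance (here: a random 10% pattern)
pre, post = np.nonzero(np.random.rand(N_per_copy, N_per_copy) < 0.1)

# Duplicate that pattern with an index offset for every copy
offsets = np.repeat(np.arange(N_copies) * N_per_copy, len(pre))
S = Synapses(G, G, on_pre='v += 0.2')
S.connect(i=np.tile(pre, N_copies) + offsets,
          j=np.tile(post, N_copies) + offsets)

spikes = SpikeMonitor(G)
net = Network(G, S, spikes)
net.run(100*ms)
```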

This would probably make little sense for the C++ standalone or runtime devices, but it could be great for Brian2CUDA and Brian2GeNN, where users often have access to only a few GPUs and where simulating multiple networks in parallel on a single GPU is often not possible (I know GPUs can be set up to support this, but I haven't tried it yet). Doing this merge automatically before code generation would instead allow easy simulation of multiple network instances on a single GPU.

For simple networks, this seems pretty straightforward to do. It might need some thinking about how to distribute parameters across the subnetworks and how to split the results after the simulation so the user can access them meaningfully. But maybe there are some tricky parts I'm not thinking of right now?
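Continuing the toy sketch above, splitting merged results back into per-instance results could come down to simple index arithmetic on the monitor output (again just a sketch, reusing the invented `spikes` and `N_per_copy` from the example):

```python
# Which instance each recorded spike belongs to, and the neuron's
# index within that instance
copy_of_spike = spikes.i[:] // N_per_copy
local_index = spikes.i[:] % N_per_copy

mask = copy_of_spike == 3                    # e.g. results of instance 3
t_copy3, i_copy3 = spikes.t[mask], local_index[mask]

# Distributing a parameter per instance could use the same arithmetic,
# e.g. a (hypothetical) per-neuron parameter depending on the copy index:
# G.I_ext = '0.1 * int(i / N_per_copy)'
```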

This could also include the option not just to concatenate the network multiple times but to implement some kind of stride scheme, so that the structured connectivity resulting from plain concatenation is distributed more evenly across the neurons of the merged `NeuronGroup`s. Depending on the use case, that could benefit the parallelization of e.g. spike propagation on the GPU, where synaptic effects are typically applied with atomic operations and where spreading spiking synapses across CUDA warps/blocks can reduce atomic conflicts. Okay, I'm dreaming a bit here; this would of course not be a priority.
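Purely as illustration of the index math behind the two layouts (the function names are made up for this sketch):

```python
def block_index(copy, local, N_per_copy, N_copies):
    """Copies stored back to back: [copy 0 | copy 1 | copy 2 | ...]"""
    return copy * N_per_copy + local

def strided_index(copy, local, N_per_copy, N_copies):
    """Copies interleaved: neighbouring global indices belong to different
    copies, spreading each instance's synapses across CUDA warps/blocks."""
    return local * N_copies + copy

print(block_index(copy=1, local=3, N_per_copy=100, N_copies=10))    # 103
print(strided_index(copy=1, local=3, N_per_copy=100, N_copies=10))  # 31
```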

#921 popped up in a quick issue search. It sounds like they are doing something similar in their CxSystem, though I haven't looked into it in detail.

This could of course also be implemented in the generated code. But it feels to me like doing this on the Python side might be easier and would then be usable for all devices.

denisalevi commented 2 years ago

Since opening this issue, we discussed this with @mstimberg elsewhere. Just a little update here.

@Edinburgher will work on a first implementation of this for his Bachelor thesis. We discussed that it makes sense to keep this separate from Brian2 for now and integrate it only when it has progressed enough.

Mostly because the syntax for this feature needs some additional thinking. The proposal in this issue is in principle similar to the encapsulation proposal in #251 and also to #1239. While the implementations of these issues would differ, the syntax could be the same. In the best case, for GPU backends, the encapsulation mode would automatically determine how many networks fit on a single GPU, generate a merged network of that size, and then run multiple merged networks in a loop on the GPU (the loop over networks is what the encapsulation mode is supposed to achieve, from what I understand after skimming the issue).

But for now, we will try to get the merging idea working on its own at denisalevi/brian2-network-multiplier, and once it works, we can think about the right syntax and how to integrate it with Brian.

@mstimberg @thesamovar Pinging you two here since the encapsulation mode development is in your hands. If you have any thoughts on this, feel free to share, and if you want to follow the development, the repository is at denisalevi/brian2-network-multiplier.