denisalevi opened this issue 7 years ago
Just as a clarification (but maybe this is clear already): for a NeuronGroup, the statement you quote is equivalent to

    const int _N = N;

(but with the current implementation of cpp_standalone, the N would be directly replaced by its value in the source code), whereas for a Synapses object it will be:

    const int _N = {{N}}[0];

and finally something like:

    const int _N = _array_synapses_N[0];
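To illustrate the idea behind such a template-level macro, here is a simplified, self-contained sketch. This is not Brian's actual implementation — the Constant/ArrayVariable classes and the array name below are invented for the example — it only shows the decision the macro makes: emit a literal when the value is known at code-generation time, and an element read from a length-1 runtime array otherwise.

```python
# Simplified sketch of a constant_or_scalar-style macro.
# NOT Brian's actual implementation: the Constant / ArrayVariable
# classes and the array naming are assumptions for this example.

class Constant:
    """A value known before runtime (e.g. N of a NeuronGroup)."""
    def __init__(self, value):
        self.value = value

class ArrayVariable:
    """A value only known at runtime, stored in a length-1 array
    (e.g. N of a Synapses object)."""
    def __init__(self, array_name):
        self.array_name = array_name

def constant_or_scalar(varname, variable):
    """Return the C++ expression that reads the variable's value:
    either the literal constant or element 0 of the backing array."""
    if isinstance(variable, Constant):
        return str(variable.value)
    return '%s[0]' % variable.array_name

# NeuronGroup: N is fixed when the code is generated.
print(constant_or_scalar('N', Constant(4000)))
# -> 4000
# Synapses: N is only known after synapse creation has run.
print(constant_or_scalar('N', ArrayVariable('_array_synapses_N')))
# -> _array_synapses_N[0]
```

With a macro like this, a single template (e.g. a state updater) can always emit `const int _N = ...;` without knowing whether it will be instantiated for a NeuronGroup or a Synapses object.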
The reason for the constant_or_scalar macro is that some templates are used for both kinds of objects, e.g. stateupdate can apply to a NeuronGroup or a Synapses object. Also, for some corner-case simulations (we used it to model astrocytes), you can connect Synapses to/from other Synapses, so in this case the number of pre-/post-synaptic elements is not fixed.
Hmm, yeah, I figured that the constant_or_scalar macro just gives me the right thing without having to know whether N is known before runtime. Thanks @mstimberg for making clear how it gets translated and pointing out the special cases where it's actually needed. But is there any reason for me not to use constant_or_scalar everywhere for now (other than less pretty code)?

And for Synapses <-> Synapses connections, can the number of elements change during runtime, or is it fixed once the synapses are generated?
Ok, so this issue is less about whether to use the constant_or_scalar macro and more about where to use it. Currently it's used on the host side and then _N is passed as a kernel argument. But in cases where N is only known after e.g. the synapses_create_... template is run, we copy N to global device memory anyway, so we could just use the global device variable and save a copy from host memory to constant global memory at each kernel invocation.
But I'm realizing this is not the time for this kind of optimization. I think we will have to look in more detail into __constant__ memory and the GPU caching system at some point. A variable like N - just like all other variables that are constant throughout the simulation, including device pointers - should probably be allocated from the host with the __constant__ qualifier, so it ends up in cached constant global memory. But for that we need to dig a little into brian and get rid of all the par_ <-> ptr_ pointer copying that is happening inside our kernels right now (I think that makes sense in cpp_standalone for restrict qualifiers etc., but might just waste kernel resources, e.g. registers, in GPU code). But I don't really know that either, since the compiler might just optimize the redundant pointer copying away anyway. So, another time, when propagation modes are done...
But is there any reason for me not to use constant_or_scalar everywhere for now (other than less pretty code)?
No, it's certainly fine like this. It's a macro on the jinja template level anyway, so none of this makes a difference for the code that will be compiled.
And for Synapses <-> Synapses connections, can the number of elements change during runtime or is it fixed once the synapses are generated?
They are fixed; for now we do not allow changing the number of synapses during a run.
I think at some point I started defining the N variable in templates everywhere before the kernel call and then passing it as a kernel argument, for consistency. But this is only necessary for templates where N is not known before runtime. So for NeuronGroup templates, e.g., we don't need that and could just avoid an extra kernel argument. This needs some cleanup. I guess going back to using the {# USES_VARIABLES { N } #} mechanism where possible would be best to save kernel resources.