brian-team / brian2cuda

A brian2 extension to simulate spiking neural networks on GPUs
https://brian2cuda.readthedocs.io/
GNU General Public License v3.0

N variables in templates are inconsistent and sometimes unnecessary #64

Open denisalevi opened 7 years ago

denisalevi commented 7 years ago

I think at some point I started defining the N variable in templates everywhere before the kernel call with

const int _N = {{constant_or_scalar('N', variables['N'])}};

and then passing it as a kernel argument, for consistency. But this is only necessary for templates where N is not known before runtime. For NeuronGroup templates, e.g., we don't need that and could avoid an extra kernel argument. This needs some cleanup. I guess going back to using the {# USES_VARIABLES { N } #} mechanism where possible would be best to save kernel resources.
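To make the distinction concrete, here is a minimal CUDA sketch (kernel and array names are made up, not the actual generated code) of the two situations: a NeuronGroup kernel where N is a compile-time constant baked in by code generation, and a Synapses kernel where _N has to be read on the host and passed as an extra kernel argument.

// Hypothetical sketch, not brian2cuda's generated code.
// NeuronGroup case: N is known at code-generation time, so it can be baked in
// as a compile-time constant and no extra kernel argument is needed.

#define N_NEURONS 1000

__global__ void neurongroup_stateupdate_kernel(double* v)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N_NEURONS)
        return;
    v[i] += 0.1;  // placeholder state update
}

// Synapses case: the number of synapses is only known after connections have
// been generated at runtime, so _N is read on the host and passed as an argument.

__global__ void synapses_stateupdate_kernel(const int _N, double* w)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= _N)
        return;
    w[i] *= 0.99;  // placeholder state update
}

void launch_both(double* d_v, double* d_w, int* _array_synapses_N)
{
    neurongroup_stateupdate_kernel<<<(N_NEURONS + 255) / 256, 256>>>(d_v);

    const int _N = _array_synapses_N[0];  // host-side copy of the synapse count
    synapses_stateupdate_kernel<<<(_N + 255) / 256, 256>>>(_N, d_w);
}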

mstimberg commented 7 years ago

Just as a clarification (but maybe this is clear already): for a NeuronGroup, the statement you quote is equivalent to

const int _N = N;

(but with the current implementation of cpp_standalone, the N would be directly replaced by its value in the source code), whereas for a Synapses object it will be:

const int _N = {{N}}[0];

which, once the {{N}} placeholder is expanded, ends up as something like:

const int _N = _array_synapses_N[0];

The reason for the constant_or_scalar macro is that some templates are used for both objects, e.g. stateupdate can apply to a NeuronGroup or a Synapses object. Also for some corner-case simulations (we used it to model astrocytes), you can connect Synapses to/from other Synapses, so in this case the number of pre-/post-synaptic elements is not fixed.

denisalevi commented 7 years ago

Hmm, yeah, I figured that the constant_or_scalar macro just gives me the right thing without having to know whether N is known before runtime. Thanks @mstimberg for making clear how it gets translated and for pointing out the special cases where it's actually needed. But is there any reason for me not to use constant_or_scalar everywhere for now (other than less pretty code)?

And for Synapses <-> Synapses connections, can the number of elements change during runtime or is it fixed once the synapses are generated?

Ok, so this issue is less about whether to use the constant_or_scalar macro and more about where to use it. Currently it's used on the host side and _N is then passed as a kernel argument. But in cases where N is only known after e.g. the synapses_create_... template has run, we copy N to global device memory anyway, so we could just use the global device variable and save a copy from host memory to constant/global memory at each kernel invocation.
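A minimal sketch of that alternative, assuming a global __device__ variable that the synapse-creation step has already filled (all names here are hypothetical):

__device__ int d_num_synapses;  // assumed to have been written by the synapse-creation kernel

__global__ void synapses_stateupdate_kernel(double* w)
{
    const int _N = d_num_synapses;  // read from global device memory instead of a kernel argument
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= _N)
        return;
    w[i] *= 0.99;  // placeholder state update
}

Note that the host still needs its own copy of N to choose the grid size at launch, so this would only remove the per-launch kernel argument, not the host-side bookkeeping.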

But I'm realizing this is not the time for this kind of optimization. I think we will have to look in more detail into __constant__ memory and the GPU caching system at some point. A variable like N - just like all other variables that are constant throughout the simulation, including device pointers - should probably be allocated from the host with the __constant__ flag, so that it ends up in cached constant memory. But for that we need to dig a little into brian and get rid of all the par_ <-> ptr_ pointer copying that currently happens inside our kernels (I think that makes sense for cpp_standalone because of restrict qualifiers etc., but it might just waste kernel resources, e.g. registers, in GPU code). Then again, I don't really know, since the compiler might just optimize the redundant pointer copying away anyway. So, another time, when propagation modes are done...
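For reference, a hedged sketch of what the __constant__ idea could look like: simulation-constant values (N, device pointers) are copied to constant memory once at initialization with cudaMemcpyToSymbol, and kernels then read them through the constant cache without extra arguments. All names are illustrative, not brian2cuda's actual code.

#include <cuda_runtime.h>

__constant__ int c_num_synapses;   // simulation-constant scalar
__constant__ double* c_w;          // device pointer kept in constant memory

__global__ void synapses_stateupdate_kernel()
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= c_num_synapses)       // read through the constant cache
        return;
    c_w[i] *= 0.99;                // placeholder state update
}

void initialize_constants(int num_synapses, double* d_w)
{
    // One-time host-to-device copies at initialization; after this, kernel
    // launches need neither an N argument nor the pointer argument.
    cudaMemcpyToSymbol(c_num_synapses, &num_synapses, sizeof(int));
    cudaMemcpyToSymbol(c_w, &d_w, sizeof(double*));
}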

mstimberg commented 7 years ago

But is there any reason for me not to use constant_or_scalar everywhere for now (other than less pretty code)?

No, it's certainly fine like this. It's a macro on the jinja template level anyway, so none of this makes a difference for the code that will be compiled.

And for Synapses <-> Synapses connections, can the number of elements change during runtime or is it fixed once the synapses are generated?

They are fixed; for now we do not allow changing the number of synapses during a run.