brian-team / brian2genn

Brian 2 frontend to the GeNN simulator
http://brian2genn.readthedocs.io/
GNU General Public License v2.0

[MRG] Add code slots for the `device.insert_code` mechanism #71

Closed · mstimberg closed this 6 years ago

mstimberg commented 6 years ago

This adds code slots that can be used with `device.insert_code`. It is basically the same as the change introduced for Brian 2 standalone in PR brian-team/brian2#1006, but with additional `before_run` and `after_run` slots. These are needed because in Brian2GeNN, the actual simulation run is not part of the "main lines".
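For context, a minimal usage sketch of the mechanism (the model and the inserted C++ strings are placeholders for illustration, not code from this PR; the slot names other than `before_run`/`after_run` follow the Brian 2 standalone slots from brian-team/brian2#1006):

```python
# Minimal sketch of device.insert_code with the new Brian2GeNN slots.
# The model and the inserted C++ strings are placeholders.
from brian2 import *
import brian2genn

set_device('genn')

G = NeuronGroup(10, 'dv/dt = -v/(10*ms) : 1', threshold='v > 0.8',
                reset='v = 0', method='exact')

# Slot shared with Brian 2 standalone: code placed in the "main lines"
device.insert_code('main', '// C++ inserted into the main block')

# New Brian2GeNN-specific slots: the actual simulation run is not part of
# the main lines, so these wrap the run itself
device.insert_code('before_run', '// C++ inserted just before the GeNN run')
device.insert_code('after_run', '// C++ inserted just after the GeNN run')

run(100*ms)
```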

mstimberg commented 6 years ago

@thesamovar, @tnowotny Apart from all the noise around build issues -- this should be fine, right?

thesamovar commented 6 years ago

Assuming it's the same as the Brian ones, it should be good.

mstimberg commented 6 years ago

They're the same slots, apart from `before_run`/`after_run`, which are needed because of the different way the run is handled.

tnowotny commented 6 years ago

As far as I can tell this is a good way of doing the benchmarking, and if it matches the Brian slots it should all be good.

thesamovar commented 6 years ago

Do we need additional GeNN-specific slots to make a like-for-like comparison with Brian, e.g. to avoid including the time for setting up data structures on the GPU?

mstimberg commented 6 years ago

> Do we need additional GeNN-specific slots to make a like-for-like comparison with Brian, e.g. to avoid including the time for setting up data structures on the GPU?

I don't think so. All the Brian2GeNN-specific conversion/copying etc. before the run happens after the main block but before `before_run`, and similarly after `after_run` but before `before_end`, so we can measure it. That said, I'd include this in the "overhead" measurements, since otherwise they'd be identical to Brian 2 (given that the initialization/synapse generation code runs on the CPU).

thesamovar commented 6 years ago

I didn't quite follow, but I'm happy to go with what you say. The key thing is that we would like to measure:

(a) overheads from when the user hits run to when the final executable is about to be called;
(b) overheads from the moment the executable starts running to the moment the first neuron gets integrated;
(c) runtime in the main run loop;
(d) post-run overheads similar to (a) and (b).

As long as we can do that with this scheme, we're fine.
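A hedged sketch of how the slots could bracket phases (b)-(d): the `stamp` helper is hypothetical, and it assumes `<chrono>`/`<iostream>` are usable in the generated C++ (not verified against the generated includes) and the slot ordering described above (main block, then GeNN setup/copy, then `before_run`, the run, `after_run`, copy-back, `before_end`). Phase (a) would instead be timed on the Python side around the build/run call.

```python
# Hedged benchmarking sketch: print labelled timestamps at the slot
# boundaries so that GeNN-specific setup/copy overhead can be separated
# from the simulated run itself.
from brian2 import *
import brian2genn

set_device('genn')

def stamp(label):
    # Returns a C++ snippet printing a labelled microsecond timestamp to stderr.
    return ('std::cerr << "TIMESTAMP ' + label + ' " << '
            'std::chrono::duration_cast<std::chrono::microseconds>('
            'std::chrono::steady_clock::now().time_since_epoch()).count() '
            '<< std::endl;')

device.insert_code('main', stamp('main_block'))        # inside the main lines
device.insert_code('before_run', stamp('before_run'))  # GeNN setup/copy done, (c) starts
device.insert_code('after_run', stamp('after_run'))    # (c) ends, copy-back/(d) starts
device.insert_code('before_end', stamp('before_end'))  # copy-back done

# ... model definition and run(...) as usual ...
```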

mstimberg commented 6 years ago

Yes, I think we should be able to do all of this (plus some more fine-grained benchmarking, e.g. separating out synapse creation). I'll document exactly what we are measuring in the new repository, stay tuned!

tnowotny commented 6 years ago

Looking forward to it. By the way, we may be able to run our benchmarks on a V100 as well! Jamie's runs of native GeNN suggest it's a real game changer.

thesamovar commented 6 years ago

That would be fantastic!