Closed kernfel closed 2 years ago
So, the issue I was trying to address is the following. I have a moderately complex model where, by the inscrutable magic of brian2's naming schemes, in the second run, the NeuronGroup
would be named as requested (e.g. neurons
), but its subordinate objects (Thresholder, StateUpdater etc.) would have a numeric suffix (e.g. neurons_thresholder_1
). Since process_neuron_groups
is looking specifically for {NeuronGroup.name}_thresholder
, it would not pick these suffixed objects up, and therefore not process them, nor add them to the code passed to GeNN.
So far, so good: I changed all remaining hardcoded names that I could find, reflected in the two commits above. Tests pass, job done, right?
Not so. Now that the models in the first and second run are almost completely identical, it seems that GeNN is not picking up the subtle differences in naming. To wit:
from brian2 import *
import brian2genn
import os
set_device('genn')
for i in range(2):
# os.system('rm -rf GeNNworkspace')
if i > 0:
device.reinit()
device.activate()
G = NeuronGroup(1, model='dv/dt = -v/(1*ms) : 1',
threshold='rand()>.5', method='exact', name='neurons')
S = Synapses(G, G, '', on_pre='v += 1')
S.connect()
spikemon = SpikeMonitor(G)
statemon = StateMonitor(G, variables='v', record=[0])
net = Network(G, S, spikemon, statemon)
net.run(10*ms)
print(spikemon.count, statemon.v[0, -1])
Running this code as-is, the first run (usually) passes just fine, but the second one raises during the final model compilation step:
~ ...snip: make output... ~
g++ -std=c++11 -Wno-write-strings -I. -Ibrianlib/randomkit -w -O3 -ffast-math -fno-finite-math-only -march=native -std=c++11 main.cpp code_objects/statemonitor_1_codeobject.cpp objects.cpp code_objects/synapses_1_synapses_create_generator_codeobject.cpp code_objects/synapses_1_pre_push_spikes.cpp code_objects/spikemonitor_1_codeobject.cpp brianlib/randomkit/randomkit.cc -o main -Lmagicnetwork_model_CODE -lrunner -Wl,-rpath magicnetwork_model_CODE
~ ...snip: warnings... ~
In file included from main.cpp:19:
engine.cpp: In member function ‘void engine::run(double)’:
engine.cpp:45:19: error: ‘vneurongroup_1’ was not declared in this scope; did you mean ‘vneurongroup’?
45 | std::copy_n(vneurongroup_1, 1, brian::_array_neurongroup_1_v);
| ^~~~~~~~~~~~~~
| vneurongroup
engine.cpp:51:7: error: ‘pullneurongroup_1CurrentSpikesFromDevice’ was not declared in this scope; did you mean ‘pullneurongroupCurrentSpikesFromDevice’?
51 | pullneurongroup_1CurrentSpikesFromDevice();
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| pullneurongroupCurrentSpikesFromDevice
engine.cpp:52:7: error: ‘pullneurongroup_1StateFromDevice’ was not declared in this scope; did you mean ‘pullneurongroupStateFromDevice’?
52 | pullneurongroup_1StateFromDevice();
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| pullneurongroupStateFromDevice
main.cpp: In function ‘int main(int, char**)’:
main.cpp:73:30: error: ‘rowLengthsynapses_1’ was not declared in this scope; did you mean ‘rowLengthsynapses’?
73 | rowLengthsynapses_1, indsynapses_1, maxRowLengthsynapses_1,
| ^~~~~~~~~~~~~~~~~~~
| rowLengthsynapses
main.cpp:73:51: error: ‘indsynapses_1’ was not declared in this scope; did you mean ‘indsynapses’?
73 | rowLengthsynapses_1, indsynapses_1, maxRowLengthsynapses_1,
| ^~~~~~~~~~~~~
| indsynapses
main.cpp:73:66: error: ‘maxRowLengthsynapses_1’ was not declared in this scope; did you mean ‘maxRowLengthsynapses’?
73 | rowLengthsynapses_1, indsynapses_1, maxRowLengthsynapses_1,
| ^~~~~~~~~~~~~~~~~~~~~~
| maxRowLengthsynapses
main.cpp:79:49: error: ‘ineurongroup_1’ was not declared in this scope; did you mean ‘ineurongroup’?
79 | std::copy_n(brian::_array_neurongroup_1_i, 1, ineurongroup_1);
| ^~~~~~~~~~~~~~
| ineurongroup
main.cpp:80:49: error: ‘vneurongroup_1’ was not declared in this scope; did you mean ‘vneurongroup’?
80 | std::copy_n(brian::_array_neurongroup_1_v, 1, vneurongroup_1);
| ^~~~~~~~~~~~~~
| vneurongroup
main.cpp:87:7: error: ‘_seedneurongroup_1’ was not declared in this scope; did you mean ‘_seedneurongroup’?
87 | _seedneurongroup_1[i]= (uint64_t) (rand()*0x0000FFFFFFFFFFFFLL);
| ^~~~~~~~~~~~~~~~~~
| _seedneurongroup
code_objects/spikemonitor_1_codeobject.cpp: In function ‘void _run_spikemonitor_1_codeobject()’:
code_objects/spikemonitor_1_codeobject.cpp:108:24: error: ‘spikeCount_neurongroup_1’ was not declared in this scope; did you mean ‘spikeCount_neurongroup’?
108 | int32_t _num_events = spikeCount_neurongroup_1;
| ^~~~~~~~~~~~~~~~~~~~~~~~
| spikeCount_neurongroup
code_objects/spikemonitor_1_codeobject.cpp:117:23: error: ‘spike_neurongroup_1’ was not declared in this scope; did you mean ‘spike_neurongroup’?
117 | const int _idx = spike_neurongroup_1[_j];
| ^~~~~~~~~~~~~~~~~~~
| spike_neurongroup
make: *** [Makefile:10: main] Error 1
ERROR Brian 2 encountered an unexpected error. If you think this is a bug in Brian 2, please report this issue either to the discourse forum at <http://brian.discourse.group/>, or to the issue tracker at <https://github.com/brian-team/brian2/issues>. Please include this file with debug information in your report: /tmp/brian_debug_e2sgnrl0.log Additionally, you can also include a copy of the script that was run, available at: /tmp/brian_script_rfh3a_al.py You can also include a copy of the redirected std stream outputs, available at '/tmp/brian_stdout_l9pglhv0.log' and '/tmp/brian_stderr_yv5s018d.log'. Thanks! [brian2]
Traceback (most recent call last):
File "/home/felix/projects/lib/brian2genn/brian2genn/device.py", line 877, in build
self.compile_source(debug, directory, use_GPU)
File "/home/felix/projects/lib/brian2genn/brian2genn/device.py", line 1166, in compile_source
check_call(["make"], cwd=directory, env=env)
File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['make']' returned non-zero exit status 2.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/felix/projects/ikeda/scratch.py", line 24, in <module>
net.run(10*ms)
File "/home/felix/projects/lib/brian2/brian2/core/base.py", line 291, in device_override_decorated_function
return getattr(curdev, name)(*args, **kwds)
File "/home/felix/projects/lib/brian2genn/brian2genn/device.py", line 1844, in network_run
super(GeNNDevice, self).network_run(net=net, duration=duration,
File "/home/felix/projects/lib/brian2/brian2/devices/cpp_standalone/device.py", line 1502, in network_run
self.build(direct_call=False, **self.build_options)
File "/home/felix/projects/lib/brian2genn/brian2genn/device.py", line 879, in build
raise RuntimeError(('Project compilation failed (Command {cmd} '
RuntimeError: Project compilation failed (Command ['make'] failed with error code 2).
See the output above (if any) for more details.
There seems to be a disconnect between what's provided to GeNN (code objects, magicnetwork_model.cpp etc. with suffixed neurongroup_1
etc.) and what comes out the other end (magicnetwork_model_CODE/* with suffix-less neurongroup
). I've found that only the Makefile is actually touched, whereas all other files in the _CODE directory are left untouched in the second iteration.
Of course, removing the GeNNworkspace directory fixes this, but that's hardly an ideal workaround.
Since this may well be a GeNN issue, I'm tagging @neworderofjamie here.
Unrelatedly, and without wanting to add to everyone's workload: Is it just me, or is brian2 master + brian2genn master + GeNN 4.6.0 completely broken?
Nice work @kernfel and definitely an interesting potential bug 😄 Can you try two things:
magicnetwork_model.cpp
change between runs?-r
onto the genn-buildmodel.sh command line, does the problem go away? (this bypasses GeNN's new caching system and forces a model rebuild)
- Does
magicnetwork_model.cpp
change between runs?
Yes: All instances of synapses
are replaced with synapses_1
in the second iteration (as they should, following the code object naming scheme).
- If you hack a
-r
onto the genn-buildmodel.sh command line, does the problem go away? (this bypasses GeNN's new caching system and forces a model rebuild)
Assuming you meant -f
: Yes, that rescues it.
I did indeed mean -f
but that is a smoking gun - will fix ASAP
Ah, the perils of checking further things while writing a bug report. I notice the errors reported in the second post are a little more extensive than what you get with the code I provided. Removing the explicit NeuronGroup
name in the code restores the posted error.
This does not impinge on the bug, per se. (... although it suggests another solution: Just name all your objects?)
... which actually works and I wish I'd realised that sooner. A nice catch-22:
Can you try https://github.com/genn-team/genn/pull/495?
Yup, that fixes it. Thank you so much!
Wonderful!
stupid question: Does this mean that if you name your populations no recompile is necessary/done and if you don't name them explicitly Brian 2 will rename and force a GeNN recompile? If this is the case one should recommend explicit naming ...? (apologies if I am having the completely wrong end of the stick)
You're absolutely right, Thomas. Brian2 respects explicit names, but will number any unnamed items it gets. I don't think this makes a difference on the Brian2 end of the pipeline, but I'd agree that for b2g purposes, it would make sense to recommend explicit names throughout.
Incidentally, the subordinate objects whose automatic naming and numbering gave rise to this PR in the first place are only Brian-side, of course.
Having used this successfully with Jamie's fix for a while now, and said fix being merged into GeNN master, I believe this is ready for prime time.
Hi everyone. Apologies, I did not really follow this issue until now, but I will have a look later today. I'll merge it right away if I have nothing to complain about :blush:
No apologies necessary! I'm happy to contribute at a moderate pace, no complaints here.
Thanks for the PR, this all makes a lot of sense and not relying on the names should be much more robust.
Just to give some background on the whole name machinery: as you saw, Brian will respect names that you set manually. It will only complain when you call build
/run
and you have given identical name to several objects. If you don't give names, it will automatically choose names, appending _1
etc. to make the names unique. The problem is that our current system requires names to be set when we create the object, so the mechanism for the unique naming keeps a list of all existing objects (using "weak references", so this list will not keep the objects alive) and picks suffixes accordingly. Unfortunately, this system has its limits when you run multiple simulations in a loop. E.g. with:
for _ in range(3):
G = NeuronGroup(...)
run(...)
The group will be named neurongroup
in the first run, but when it creates a new group for the second run, the first one still exists so it has to name the new group neurongroup_1
. Right after that line, the old group does not exist anymore, so the name neurongroup
becomes available again. As a result, the third run will again name the object neurongroup
, and so on. You can get around this with name='neurongroup'
, but there's no mechanism to manually set the name of sub-objects like Thresholder
and so on, so these will still cycle through the names with and without a suffix. This is indeed problematic because it interferes with the caching mechanism, see also brian-team/brian2#680 . In the long run, I think the best solution would be to not require names at object creation time, and only generate automatic names when objects are added to a Network
. As a short term solution, manually giving names is indeed the best approach (not only for GeNN's caching, but also for Brian's other modes). We should be able to fix the problem with the sub-objects quite easily: some sub-objects exists only once per object (e.g. a StateUpdater
), so we can directly give it a fixed name based on the parent name – if the parent name is unique, then the sub-object name is unique as well. Other sub-object types can have multiple instances (e.g. Thresholder
for custom events), but for them, we can make the name unique ourselves, e.g. by including the name of the event in the case of a Thresholder
.
Thanks for the additional changes, going ahead with the merge :+1:
This pull request attempts to fix an issue where the names of StateUpdaters, Thresholders etc. are suffixed with numeric indices, and are therefore not picked up by the hardcoded name checks in process_*_groups.
However, since this fix raises other hidden issues (see below), I can't recommend merging it just yet.