Two different processes can try to start the same virtual firewall at the same time, which might cause both of them to fail. This was observed during this test case: https://github.com/0-complexity/openvcloud/blob/master/tests/ovc_master_hosted/OVC/c_advanced/maintenance_tests.py#L293. The test puts a node in maintenance mode and then tries to start a virtual firewall that was on that node. This can lead to the described scenario, where both of these actions try to start the virtual firewall.
Locks should be introduced to prevent this behavior when starting or stopping a virtual firewall.
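One possible shape for such a lock, as a minimal sketch and not the actual OpenvCloud implementation: an exclusive file lock per network id, taken with `fcntl.flock` so that it also works across processes on the same node. The names `vfw_lock`, `start_vfw`, `LOCK_DIR`, and the `is_running` / `do_start` callbacks are hypothetical and only stand in for the real start logic.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock directory; a real deployment would pick its own location.
LOCK_DIR = os.path.join(tempfile.gettempdir(), "vfw-locks")


@contextmanager
def vfw_lock(networkid, lock_dir=LOCK_DIR):
    """Hold an exclusive lock for one virtual firewall, keyed by network id."""
    if not os.path.isdir(lock_dir):
        try:
            os.makedirs(lock_dir)
        except OSError:
            # Another process may have created the directory in the meantime.
            if not os.path.isdir(lock_dir):
                raise
    # Same hex form as in the trace, e.g. 424 -> 01a8.
    path = os.path.join(lock_dir, "vfw_%04x.lock" % networkid)
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        yield path
    finally:
        fcntl.flock(handle, fcntl.LOCK_UN)
        handle.close()


def start_vfw(networkid, is_running, do_start):
    """Start the firewall unless it is already running.

    is_running and do_start are placeholder callbacks (assumptions) for the
    real status check and start action. Returns True if this call started it.
    """
    with vfw_lock(networkid):
        # Re-check the state inside the lock: the other caller may have
        # started the firewall while this one was waiting.
        if is_running(networkid):
            return False
        do_start(networkid)
        return True
```

With this pattern the second caller waits for the first to finish and then finds the firewall already running, instead of racing it on the same disk image. A file lock only covers callers on the same node; if the two actions run on different nodes, a shared lock service would be needed instead.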
Relevant stacktraces
This is an example of what might happen:
Traceback (most recent call last):
  File "/opt/jumpscale7/lib/JumpScale/grid/jumpscripts/JumpscriptFactory.py", line 176, in executeInProcess
    return True, self.module.action(*args, **kwargs)
  File "/tmp/jumpscripts/jumpscale_vfs_create_routeros.py", line 138, in action
    % (networkid, networkidHex, e)
RuntimeError: Could not create VFW vm from template, network id:424:01a8
Could not execute job, error:
Traceback (most recent call last):
  File "/opt/jumpscale7/lib/JumpScale/grid/jumpscripts/JumpscriptFactory.py", line 176, in executeInProcess
    return True, self.module.action(*args, **kwargs)
  File "/tmp/jumpscripts/unknown_createVM.py", line 18, in action
    return createVM(xml)
  File "/tmp/jumpscripts/unknown_createVM.py", line 13, in createVM
    dom.create()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1035, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Cannot access storage file '/var/lib/libvirt/images/routeros/01a8/routeros.qcow2' (as uid:113, gid:116): No such file or directory
type/level: OPERATIONS/1
Could not execute jscript:1000003 unknown_createVM on agent:107_9
Error: Exec error procmgr jumpscr:unknown_createVM on node:107_9 <class 'libvirt.libvirtError'>: Cannot access storage file '/var/lib/libvirt/images/routeros/01a8/routeros.qcow2' (as uid:113, gid:116): No such file or directory