Closed ecklm closed 4 years ago
Although, it does not resolve the problem, but decreasing the number of API calls to the least possible could prevent this slowing down. #26 might offer a solution to this.
It seems that it's not requests in general but only queue setting requests. And it also doesn't seem to correlate with the number of HTTP requests. The following API call reducing many separate calls to only one also latsts for 20+ seconds, which means that the problem must be on the controller side. Supposedly the actual OFP communication and the OVS' reaction is the bottleneck.
def set_queues(self):
"""Set queues on switches so that limits can be set on them."""
# self.__logger.debug("qosmanager: Switchports to be configured: {}".format(ports))
queue_limits = [QoSManager.DEFAULT_MAX_RATE] + [self.flows_limits[k][0] for k in self.flows_limits]
r = requests.post("%s/qos/queue/all" % QoSManager.CONTROLLER_BASEURL,
headers={'Content-Type': 'application/json'},
data=json.dumps({
# From doc: port_name is optional argument. If does not pass the port_name argument, all
# ports are target for configuration.
"type": "linux-htb", "max_rate": str(QoSManager.DEFAULT_MAX_RATE),
"queues":
[{"max_rate": str(limit)} for limit in queue_limits]
}))
self.log_http_response(r)
In the log:
127.0.0.1 - - [28/Apr/2020 08:40:43] "POST /qos/rules/all HTTP/1.1" 200 703 0.001785
Flow limit for flow 'FlowId(ipv4_dst='10.0.0.11', udp_dst=5001)' updated to 1250000.0bps
Flow limit for flow 'FlowId(ipv4_dst='10.0.0.12', udp_dst=5002)' updated to 6250000.0bps
Flow limit for flow 'FlowId(ipv4_dst='10.0.0.13', udp_dst=5003)' updated to 3750000.0bps
(69870) accepted ('127.0.0.1', 60166)
127.0.0.1 - - [28/Apr/2020 08:41:08] "POST /qos/queue/all HTTP/1.1" 200 1394 23.042202
This slowness seems to have nothing to do with the rest_qos
application, it quite early hands the request to the controller. This problem goes down below the VSCTL level so it's likely to be a limitation in the specific OVS implementation used in the mininet VM.
The non-blocking behavior would be extremely useful as the measurement data will also come from logs.
POST /qos/rules/all
) spins up the one used core by the Ryu manager for whatever reason. I'd expect it's idle most of the time while waiting for the switches to respond to the queue setting request. Maybe the problem is not even in the old virtual switches but some bottleneck in the controller.When a set_queue
function is called, the underlying function(s) called by the REST API handler have a timer down somewhere to prevent running an operation for a long time. I suspect, this is somewhere in the level of OpenFlow already. If I'm right, it is supposed to time out if an OpenFlow operation lasts two long. In this case, if I, say, watch a movie while experimenting, most of the cores are 90%+ used, therefore the operations last too long in the controller and on the VM => the timer expires. If I do not watch a movie or not use most of my cores for virtually anything, the operations are carried our fast enough => the timer does not expire.
This is very unfortunate and unstable, but it lies deep down in the controller and working it around would probably exceed the scope of the thesis. :(
127.0.0.1 - - [16/May/2020 22:22:08] "POST /qos/queue/all HTTP/1.1" 200 0 22.303261
Traceback (most recent call last):
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 461, in fire_timers
timer()
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/hubs/timer.py", line 59, in __call__
cb(*args, **kw)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/greenthread.py", line 221, in main
result = function(*args, **kwargs)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/wsgi.py", line 818, in process_request
proto.__init__(conn_state, self)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/wsgi.py", line 357, in __init__
self.handle()
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/wsgi.py", line 390, in handle
self.handle_one_request()
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/wsgi.py", line 466, in handle_one_request
self.handle_one_response()
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/wsgi.py", line 566, in handle_one_response
result = self.application(self.environ, start_response)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/app/wsgi.py", line 236, in __call__
return super(wsgify_hack, self).__call__(environ, start_response)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/webob/dec.py", line 129, in __call__
resp = self.call_func(req, *args, **kw)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/webob/dec.py", line 193, in call_func
return self.func(req, *args, **kwargs)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/app/wsgi.py", line 290, in __call__
return controller(req)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/app/wsgi.py", line 160, in __call__
return getattr(self, action)(req, **kwargs)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/app/rest_qos.py", line 440, in set_queue
return self._access_switch(req, switchid, VLANID_NONE,
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/app/rest_qos.py", line 529, in _access_switch
msg = function(rest, vid)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/app/rest_qos.py", line 650, in _rest_command
key, value = func(*args, **kwargs)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/app/rest_qos.py", line 713, in set_queue
self.ovs_bridge.set_qos(port_name, type=queue_type,
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/lib/ovs/bridge.py", line 536, in set_qos
self.run_command([command_qos, command_queue])
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/lib/ovs/bridge.py", line 137, in run_command
self.vsctl.run_command(commands, self.timeout, self.exception)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/lib/ovs/vsctl.py", line 1295, in run_command
self._run_command(commands)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/lib/ovs/vsctl.py", line 1275, in _run_command
self._do_main(commands)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/lib/ovs/vsctl.py", line 1181, in _do_main
if self._do_vsctl(idl_, commands):
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ryu/lib/ovs/vsctl.py", line 1124, in _do_vsctl
status = self.txn.commit_block()
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ovs/db/idl.py", line 1658, in commit_block
poller.block()
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ovs/poller.py", line 231, in block
events = self.poll.poll(self.timeout)
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/ovs/poller.py", line 137, in poll
rlist, wlist, xlist = select.select(self.rlist,
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/green/select.py", line 80, in select
return hub.switch()
File "/home/ecklm/Projects/Diploma/implementation/controller/.venv/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 298, in switch
return self.greenlet.switch()
eventlet.timeout.Timeout: 5 seconds
qos_manager: Queue setting has failed. ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Calls from QoSManager to the actual QoS implementing app (the name to be looked up) get slow after generally 3 subsequent calls per endpoint as presented below.
This makes the setup process and even later the adaptations destructively slow. This time-span may often result in trying to use deprecated data -- eg. setting queues in a switch that has just disconnected from the controller. Therefore, exceptions may also happen.
The solution may be in close connection with #3.