crossbario / crossbar

Crossbar.io - WAMP application router
https://crossbar.io/

Routercluster worker placement improvements #2012

Open oberstet opened 2 years ago

oberstet commented 2 years ago

The dynamic placement and replacement logic of the master node should be reworked and improved:

https://github.com/crossbario/crossbar/blob/95b30b1a03e9596191887af2738f04b9624ff11b/crossbar/master/arealm/arealm.py#L147
https://github.com/crossbario/crossbar/blob/95b30b1a03e9596191887af2738f04b9624ff11b/crossbar/master/cluster/routercluster.py#L1150

Test: It should be possible to add a router cluster worker group without any nodes present in the router cluster.

crossbar shell --realm default create routercluster cluster2

# Don't add nodes yet here !
# crossbar shell --realm default add routercluster-node cluster2 all \
#     --config="${RE_ROUTERCLUSTER_NODE}"

crossbar shell --realm default add routercluster-workergroup cluster2 group1 \
    --config="${RE_ROUTERCLUSTER_GROUP}"

When trying the above, the master node fails with:

2022-05-08T12:31:35+0000 [Container      52] <crossbar.master.cluster.routercluster.RouterClusterManager.get_routercluster_by_name>(routercluster_name="cluster2", details=CallDetails(registration=<autobahn.wamp.request.Registration object at 0x7ffb89bb55e0>, progress=None, caller=232698914436104, caller_authid=superuser, caller_authrole=owner, procedure=<crossbarfabriccenter.mrealm.routercluster.get_routercluster_by_name>, transaction_hash=None, enc_algo=None, forward_for=None))
master  | 2022-05-08T12:31:35+0000 [Container      52] <crossbar.master.cluster.routercluster.RouterClusterManager.add_routercluster_workergroup>(routercluster_oid=8329cb87-7ec7-4bf7-991c-916293d8c30a, workergroup={'cluster_oid': '8329cb87-7ec7-4bf7-991c-916293d8c30a',
master  |  'name': 'group1',
master  |  'scale': 1}, details=CallDetails(registration=<autobahn.wamp.request.Registration object at 0x7ffb89bb54a0>, progress=None, caller=232698914436104, caller_authid=superuser, caller_authrole=owner, procedure=<crossbarfabriccenter.mrealm.routercluster.add_routercluster_workergroup>, transaction_hash=None, enc_algo=None, forward_for=None))
master  | 2022-05-08T12:31:35+0000 [Container      52] New router worker group object stored in database:
master  | {'changed': 1652013095252210500,
master  |  'cluster_oid': '8329cb87-7ec7-4bf7-991c-916293d8c30a',
master  |  'description': None,
master  |  'label': None,
master  |  'name': 'group1',
master  |  'oid': 'f5749628-f587-4da8-9415-ac85f1b35aca',
master  |  'scale': 1,
master  |  'status': 'STOPPED',
master  |  'tags': None}
master  | 2022-05-08T12:31:35+0000 [Container      52] MrealmController.onUserError(): "IndexError: list index out of range"
master  | Traceback (most recent call last):
master  |   File "/usr/local/lib/python3.9/site-packages/txaio/tx.py", line 366, in as_future
master  |     return ensureDeferred(fun(*args, **kwargs))
master  |   File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1129, in ensureDeferred
master  |     return Deferred.fromCoroutine(coro)
master  |   File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1105, in fromCoroutine
master  |     return _cancellableInlineCallbacks(coro)
master  |   File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1815, in _cancellableInlineCallbacks
master  |     _inlineCallbacks(None, gen, status)
master  | --- <exception caught here> ---
master  |   File "/usr/local/lib/python3.9/site-packages/autobahn/wamp/protocol.py", line 555, in _type_check
master  |     return await txaio.as_future(func, *args, **kwargs)
master  |   File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
master  |     result = current_context.run(gen.send, result)
master  |   File "/usr/local/lib/python3.9/site-packages/crossbar/master/cluster/routercluster.py", line 1176, in add_routercluster_workergroup
master  |     placement_node_oid, placement_node_cnt = nodes.peekitem(0)
master  |   File "/usr/local/lib/python3.9/site-packages/sortedcontainers/sorteddict.py", line 510, in peekitem
master  |     key = self._list[index]
master  |   File "/usr/local/lib/python3.9/site-packages/sortedcontainers/sortedlist.py", line 891, in __getitem__
master  |     raise IndexError('list index out of range')
master  | builtins.IndexError: list index out of range
master  | 
master  | ApplicationError(error=<wamp.error.runtime_error>, args=['list index out of range'], kwargs={}, enc_algo=None, callee=None, callee_authid=None, callee_authrole=None, forward_for=None)
master  | 2022-05-08T12:31:35+0000 [Router         30] Router detached session from realm "default" (session=232698914436104, detached_session_ids=1, authid="superuser", authrole="owner", authmethod="cryptosign", authprovider="dynamic") <crossbar.router.router.Router.detach>
master  | 2022-05-08T12:31:35+0000 [Router         30] <autobahn.twisted.websocket.WebSocketAdapterProtocol.connectionLost> connection lost for peer="tcp4:10.108.0.2:41454", closed cleanly
master exited with code 1
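
The traceback shows `nodes.peekitem(0)` raising `IndexError` because the collection of candidate nodes is empty. One possible shape for a fix is to treat "no nodes yet" as a valid state and defer placement instead of indexing blindly. The following is only a minimal sketch, not the actual crossbar code: `pick_placement_node` and the plain `dict` of worker counts per node are hypothetical stand-ins, and it assumes the intent is to place a new worker on the node that currently runs the fewest workers.

```python
from typing import Dict, Optional, Tuple


def pick_placement_node(worker_counts: Dict[str, int]) -> Optional[Tuple[str, int]]:
    """Return (node_oid, worker_count) for the least-loaded node.

    Hypothetical helper: returns None when the cluster has no nodes, so the
    caller can store the worker group and defer placement until a node joins.
    """
    if not worker_counts:
        # Empty cluster: nothing to place on yet, and this is not an error.
        return None
    # Fewest workers first; ties are broken by node oid for determinism.
    node_oid = min(worker_counts, key=lambda oid: (worker_counts[oid], oid))
    return node_oid, worker_counts[node_oid]


if __name__ == "__main__":
    print(pick_placement_node({}))                          # None
    print(pick_placement_node({"node-a": 2, "node-b": 1}))  # ('node-b', 1)
```

With a guard like this, adding a worker group to an empty cluster would leave the group in status `STOPPED` with zero placements, rather than surfacing a `wamp.error.runtime_error` to the caller.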

oberstet commented 2 years ago

Also, when placements are detected as "ok", the log noise should be reduced:

2022-05-09T09:46:49+0000 [Container     685] <crossbar.master.arealm.arealm.ApplicationRealmMonitor._apply_routercluster_placements> Applying router cluster worker group placement:
crossbar_master              | {'changed': 1652088614185236600,
crossbar_master              |  'cluster_oid': '8fd53798-3550-4f10-9689-966ef29ea4d9',
crossbar_master              |  'node_oid': '2458dfb0-ff92-42da-a698-9859e829e723',
crossbar_master              |  'oid': '2b49dc44-b3b4-46bf-9e84-b9c5a4da0a5c',
crossbar_master              |  'status': 'RUNNING',
crossbar_master              |  'tcp_listening_port': 10000,
crossbar_master              |  'worker_group_oid': '62459e38-8fd9-432e-bf1c-962b638e8717',
crossbar_master              |  'worker_name': 'group1_1'}
crossbar_master              | 2022-05-09T09:46:49+0000 [Container     685] <crossbar.master.arealm.arealm.ApplicationRealmMonitor._check_and_apply> check & apply run started for application realm 928fba9a-41cb-4539-86de-735a5d903114 ..
crossbar_master              | 2022-05-09T09:46:49+0000 [Container     685] <crossbar.master.arealm.arealm.ApplicationRealmMonitor._check_and_apply> Applying 1 worker placements for router cluster worker group 62459e38-8fd9-432e-bf1c-962b638e8717, arealm 928fba9a-41cb-4539-86de-735a5d903114
crossbar_master              | 2022-05-09T09:46:49+0000 [Container     685] <crossbar.master.arealm.arealm.ApplicationRealmMonitor._apply_routercluster_placements> Applying router cluster worker group placement:
crossbar_master              | {'changed': 1652088614185236600,
crossbar_master              |  'cluster_oid': '8fd53798-3550-4f10-9689-966ef29ea4d9',
crossbar_master              |  'node_oid': '2458dfb0-ff92-42da-a698-9859e829e723',
crossbar_master              |  'oid': '2b49dc44-b3b4-46bf-9e84-b9c5a4da0a5c',
crossbar_master              |  'status': 'RUNNING',
crossbar_master              |  'tcp_listening_port': 10000,
crossbar_master              |  'worker_group_oid': '62459e38-8fd9-432e-bf1c-962b638e8717',
crossbar_master              |  'worker_name': 'group1_1'}
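
One way to cut this noise would be to log a placement at info level only when it differs from what was logged on the previous check & apply run, and to drop unchanged placements to debug level. This is a rough sketch using only the standard library `logging` module, not the logger crossbar actually uses; `PlacementLogFilter` and `log_placement` are hypothetical names introduced for illustration.

```python
import logging
from typing import Any, Dict

log = logging.getLogger("placement_sketch")


class PlacementLogFilter:
    """Remembers the last logged state per placement oid (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, Dict[str, Any]] = {}

    def has_changed(self, placement: Dict[str, Any]) -> bool:
        """Return True if this placement is new or differs from the last one seen."""
        oid = placement["oid"]
        snapshot = dict(placement)  # copy, so later mutation cannot alias the cache
        changed = self._last_seen.get(oid) != snapshot
        self._last_seen[oid] = snapshot
        return changed


def log_placement(tracker: PlacementLogFilter, placement: Dict[str, Any]) -> bool:
    """Log at info level only on change; return whether the full record was logged."""
    if tracker.has_changed(placement):
        log.info("Applying router cluster worker group placement: %s", placement)
        return True
    log.debug("Placement %s unchanged (status=%s)",
              placement["oid"], placement.get("status"))
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    tracker = PlacementLogFilter()
    placement = {"oid": "2b49dc44-b3b4-46bf-9e84-b9c5a4da0a5c",
                 "status": "RUNNING", "worker_name": "group1_1"}
    log_placement(tracker, placement)  # logged in full at info level
    log_placement(tracker, placement)  # unchanged: debug level only
```

In the log excerpt above, the second identical `RUNNING` record for `group1_1` would then no longer appear at the default log level.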