crossbario / crossbar

Crossbar.io - WAMP application router
https://crossbar.io/

Managed node becomes offline while check/apply is running #1993

Open · oberstet opened 2 years ago

When a managed node is checked for the resources that are supposed to be running on it (because of router cluster, worker cluster and arealm resources), the node might go offline in the middle of the check. The check & apply run then fails with an unhandled error:

2022-04-12T10:25:08+0200 [Container   39583] <crossbar.master.arealm.arealm.ApplicationRealmMonitor._check_and_apply> check & apply run started for application realm 1797641b-4e4e-48cb-9259-4cc5d2bf8ba0 ..
2022-04-12T10:25:08+0200 [Container   39583] <crossbar.master.arealm.arealm.ApplicationRealmMonitor._check_and_apply> Applying 1 worker placements for router cluster worker group 5eb10d98-be18-4eaf-b399-150c247a50b9, arealm 1797641b-4e4e-48cb-9259-4cc5d2bf8ba0
2022-04-12T10:25:08+0200 [Container   39583] <crossbar.master.arealm.arealm.ApplicationRealmMonitor._apply_routercluster_placements> Applying router cluster worker group placement:
{'changed': 1649751767644937278,
 'cluster_oid': '50b56ba8-ebfb-4bc4-bd79-987cff959687',
 'node_oid': '0e12a118-3666-4be6-83a0-7aa06046d4ad',
 'oid': '15b66735-bb88-4739-b1af-0005f4a4398a',
 'status': 'RUNNING',
 'tcp_listening_port': 10000,
 'worker_group_oid': '5eb10d98-be18-4eaf-b399-150c247a50b9',
 'worker_name': 'group1_1'}
2022-04-12T10:25:08+0200 [Router      39562] Router detached session from realm "default" (session=2057528462130026, detached_session_ids=1, authid="core3", authrole="node", authmethod="cryptosign", authprovider="dynamic") <crossbar.router.router.Router.detach>
2022-04-12T10:25:08+0200 [Container   39583] Warning: managed node "core3" became offline [oid=bcf38602-c26a-4b2d-8d6f-6c066ed27903, session=2057528462130026, status=offline] <crossbar.master.mrealm.controller.MrealmController._on_session_shutdown>
2022-04-12T10:25:08+0200 [Container   39583] <crossbar.master.arealm.arealm.ApplicationRealmMonitor._apply_routercluster_placements> Rlink other node worker is on node 0e12a118-3666-4be6-83a0-7aa06046d4ad, worker group1_1, cluster_ip core1:
{'authextra': {'cluster_ip': 'core1',
               'mrealm_oid': '507aa077-30d0-4cef-ac27-c91257f1b6e6',
               'node_oid': '0e12a118-3666-4be6-83a0-7aa06046d4ad'},
 'authid': 'core1',
 'cluster_ip': 'core1',
 'description': None,
 'label': None,
 'mrealm_oid': '507aa077-30d0-4cef-ac27-c91257f1b6e6',
 'oid': '0e12a118-3666-4be6-83a0-7aa06046d4ad',
 'owner_oid': '38f25dbd-3299-4ace-9aba-1df4c5fa64c3',
 'pubkey': 'abd3f0667ebc4e5e640188768ea26f6a08d198654bfc179e516ec518cd671e63',
 'tags': None}
2022-04-12T10:25:08+0200 [Container   39583] Unhandled error in Deferred:
2022-04-12T10:25:08+0200 [Container   39583] 
Traceback (most recent call last):
--- <exception caught here> ---
  File "/home/oberstet/scm/crossbario/crossbar/crossbar/master/arealm/arealm.py", line 233, in _check_and_apply
    success = yield self._apply_webcluster_connections(wc_node_oid, wc_worker_id, workergroup_placements,
  File "/home/oberstet/scm/crossbario/crossbar/crossbar/master/arealm/arealm.py", line 306, in _apply_webcluster_connections
    connection = yield self._manager._session.call(
autobahn.wamp.exception.ApplicationError: ApplicationError(error=<wamp.error.no_such_procedure>, args=['no callee registered for procedure <crossbarfabriccenter.node.core3.worker.cpw-5291e12d-0.get_proxy_connection>'], kwargs={}, enc_algo=None, callee=None, callee_authid=None, callee_authrole=None, forward_for=None)
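The traceback shows that the call to the departed node's `get_proxy_connection` procedure fails with `wamp.error.no_such_procedure`, and nothing catches it. One possible way to keep the run alive is to treat that specific error from a per-node call as "node went offline" and skip the node until the next run. The sketch below is stdlib-only and not the fix actually applied in Crossbar.io. `FakeApplicationError`, `call_node_procedure` and `check_and_apply` are hypothetical stand-ins. The real code uses Twisted `inlineCallbacks` and `autobahn.wamp.exception.ApplicationError`, which exposes this URI as `ApplicationError.NO_SUCH_PROCEDURE`.

```python
import asyncio

NO_SUCH_PROCEDURE = "wamp.error.no_such_procedure"


class FakeApplicationError(Exception):
    """Hypothetical stand-in for autobahn's ApplicationError (carries an error URI)."""

    def __init__(self, error: str, *args: object) -> None:
        super().__init__(error, *args)
        self.error = error


async def call_node_procedure(online_nodes: set, node: str, proc: str) -> dict:
    """Hypothetical RPC: fails the way the router does when no callee is registered."""
    if node not in online_nodes:
        raise FakeApplicationError(
            NO_SUCH_PROCEDURE,
            f"no callee registered for procedure <crossbarfabriccenter.node.{node}.{proc}>",
        )
    return {"node": node, "proc": proc}


async def check_and_apply(online_nodes: set, nodes: list) -> dict:
    """Run one check & apply pass, skipping nodes that vanish mid-run."""
    results = {}
    for node in nodes:
        try:
            results[node] = await call_node_procedure(online_nodes, node, "get_proxy_connection")
        except FakeApplicationError as e:
            if e.error != NO_SUCH_PROCEDURE:
                raise  # unrelated errors still propagate
            # The node dropped offline between planning and calling: record it as
            # "not applied this round" and let the next check & apply run retry.
            results[node] = None
    return results


if __name__ == "__main__":
    print(asyncio.run(check_and_apply({"core1"}, ["core1", "core3"])))
```

In the Twisted code, the equivalent `try/except` would wrap the `yield self._manager._session.call(...)` in `_apply_webcluster_connections`. Matching on the error URI rather than catching every `ApplicationError` keeps genuine failures visible.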