crossbario / crossbar

Crossbar.io - WAMP application router
https://crossbar.io/
Other
2.05k stars 275 forks source link

callee session killed when caller session lost in flight #2103

Open markope opened 2 months ago

markope commented 2 months ago

I have a crossbar session from client A (browser) and a session from client B (server) both with autobahn-js.

  1. client B has registered an async function myfunc which has about 1 sec runtime.

  2. I call myfunc on client B from client A

  3. I don't await the result in client A and immediately kill the session in client A (by navigating the browser window location = 'https...')

Then client B session is killed completely (reason: null) and an internal error is raised (on_internal_error) The destruction of the session is unexpected and unwanted because this affects all other clients as well that want to use registered functions of B.

Instead client B should throw an error which should be caught and handled in the on_internal_error function that is configured in the connection configuration.

oberstet commented 2 months ago

From your description, I'm not sure what exactly is happening, I'm not even convinced the issue is in CB - it might very well be within ABJS, eg:

CB correctly handling the messaging between Caller/Router/Callee in a situation when an in-flight invocation which is still in-progress Callee-side is signaled (correctly by CB) that the original Caller is now dead, and the (still in-progress) invocation is NOT needed anymore (by the router to forward to the now dead original caller).

Well, and ABJS then not correctly dealing with the error returned from CB.


  1. Providing CB and ABJS (caller and callee) side logs would help.
  2. Having complete WAMP transport-level message traces for both the Caller and the Callee session would be perfect
  3. Having the same problem demonstrated not using ABJS but ABPY for both the Caller and the Callee would provide indirect evidence (still not positive/constructively, but indicating the problem must be in CB - since the same problem was demonstrated by both ABJS and ABPY for client side, so CB must be at fault ..)
oberstet commented 2 months ago

Rgd the CB side of things, what is also crucial for tracking down such situations: are you using a single CB node, or are you using Router-to-Router links and multiple CB nodes?

Essentially, the complete WAMP clients (caller and callee) and WAMP routing (node or rlink with nodes) topology.

The latter obviously makes things an order more complex on the WAMP routing side (assuming now that the problem is indeed CB, single or multi-node, and not WAMP client side in ABJS ...)

markope commented 2 months ago

Yes, could also be ABJS related. I'm using crossbar single node attached to a master node. No router-to-router links.

I extracted some log messages from the node that I associate with the observation:

2024-04-28 12:46:13 2024-04-28T10:46:13+0000 [Proxy         212] dropping connection to peer tcp4:127.0.0.1:36302 with abort=False: None
2024-04-28 12:46:23 2024-04-28T10:46:23+0000 [Proxy         212] dropping connection to peer tcp4:127.0.0.1:41174 with abort=False: None
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Proxy         212] <crossbar.worker.proxy.ProxyFrontendSession.onClose> proxy frontend session closed (wasClean=True)
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Router        223] <crossbar.router.router.Router.detach> router session detached from realm "realm1" (session=328729321225176, detached_session_ids=1, authid="5", authrole="acct_manager", authmethod="cryptosign-proxy", authprovider="static")
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Proxy         212] <crossbar.worker.proxy.ProxyBackendSession.onClose> proxy backend session closed (wasClean=True)
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Proxy         212] <crossbar.worker.proxy.ProxyFrontendSession.onClose> proxy frontend session closed (wasClean=True)
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Router        223] <crossbar.router.router.Router.detach> router session detached from realm "realm1" (session=5217118060887004, detached_session_ids=1, authid="system", authrole="system", authmethod="cryptosign-proxy", authprovider="static")
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Proxy         212] <crossbar.worker.proxy.ProxyBackendSession.onClose> proxy backend session closed (wasClean=True)

(probably not very helpful)

will see if I can make a python example...

oberstet commented 2 months ago

2024-04-28 12:46:13 2024-04-28T10:46:13+0000 [Proxy 212] dropping connection to peer tcp4:127.0.0.1:36302 with abort=False: None

thanks! but unfortunately, the logs do not reveal enough.

what is happening immediately before this line? what messages are received/sent by CB on each of the 2 involved sessions, in particular on session tcp4:127.0.0.1:36302?

you could bump up CB log level, you can bump it up to the point where it will log each and every single WAMP message.

the trick is to reproduce the problem with minimal messages/sessions attached .. otherwise the log volume will pretty much swamp you and finding the needle in the haystack and only just "read" the messages and understand what's going on becomes a challenge;)

markope commented 2 months ago

That first line in the log really is the first line. i.e. any log output before is not related to this issue here, because I trigger the client A disconnect at that time. loglevel=debug is insane 😄 . I was not able to find anything there.

Properly Isolating the issue will take some time...

oberstet commented 2 months ago

loglevel=debug is insane 😄 . I was not able to find anything there.

yes, it is;) if you upload the logs somewhere for me to download, I can have a look. first thing to identify within the logs is the line CB logs when it recognizes the Caller session (the one named "A" in your example) is gone .. and then look subsequently for any hints it does recognize that there still was an ongoing Call ... which did not yet return from the Callee ... and then what happens at this point?

further, what log lines can be observed once the Invocation finally comes back from the Callee, and CB recognizes that the session it wants to forward the result back is no longer there?

Properly Isolating the issue will take some time...

yeah, it can be tricky to figure out what exactly happens. it is an async distributed system with 3 participants. luckily, you don't have to deal with rlinks and router-to-router;)

DZabavchik commented 2 months ago

@markope in order to isolate the issue, can you try without proxies (single "type": "router" worker only, no "type": "proxy" workers). But I doubt it is the proxy. With JS clients it is usually easy to debug looking at the WS frames in browser's developer tools/network tab.

It is peculiar that dropping connection are 10 seconds apart

2024-04-28 12:46:13 2024-04-28T10:46:13+0000 [Proxy         212] dropping connection to peer tcp4:127.0.0.1:36302 with abort=False: None
2024-04-28 12:46:23 2024-04-28T10:46:23+0000 [Proxy         212] dropping connection to peer tcp4:127.0.0.1:41174 with abort=False: None
2