crossbario / crossbar

Crossbar.io - WAMP application router
https://crossbar.io/

Missing wamp.session.on_join messages #1348

Open jellisgwn opened 6 years ago

jellisgwn commented 6 years ago

In a crossbar instance, with ~150 connected websocket sessions, which is configured to forward on_join and on_leave messages to a URL with the following component:

    "type": "class",
    "classname": "crossbar.adapter.rest.MessageForwarder",
    "realm": "realm1",
    "extra": {
        "subscriptions": [
            {
                "url": "http://localhost:8080/wamp/session_joined",
                "topic": "wamp.session.on_join",
                "debug": true
            },
            {
                "url": "http://localhost:8080/wamp/session_left",
                "topic": "wamp.session.on_leave",
                "debug": true
            }
        ],
        "expectedCode": 200,
        "method": "POST",
        "debug": true
    }

we are not receiving a call for every instance of

2018-06-07T08:02:16-0400 [Router      17222] session "8819229878476836" joined realm "realm1"
2018-06-07T08:02:18-0400 [Router      17222] session "6398546600367471" joined realm "realm1"

that is seen in the first few seconds after startup. If there are 150 instances of the above message logged, we receive maybe 10 or 15 calls to the URL.
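One way to put a number on the gap is to diff the session IDs in the router log against the IDs the HTTP endpoint actually received. The following is only a sketch: it assumes the log line format shown above, and `missing_notifications` and `notified_ids` are hypothetical names (how the receiver records the IDs it was sent is not described in this thread).

```python
import re

# Matches router log lines of the form shown above, e.g.
#   ... [Router      17222] session "8819229878476836" joined realm "realm1"
JOINED = re.compile(r'session "(\d+)" joined realm "([^"]+)"')


def joined_sessions(log_lines, realm="realm1"):
    """Return the set of session IDs the router logged as joined to `realm`."""
    found = set()
    for line in log_lines:
        match = JOINED.search(line)
        if match and match.group(2) == realm:
            found.add(int(match.group(1)))
    return found


def missing_notifications(log_lines, notified_ids, realm="realm1"):
    """Session IDs that joined per the log but never reached the HTTP endpoint."""
    return sorted(joined_sessions(log_lines, realm) - set(notified_ids))


log = [
    '2018-06-07T08:02:16-0400 [Router      17222] session "8819229878476836" joined realm "realm1"',
    '2018-06-07T08:02:18-0400 [Router      17222] session "6398546600367471" joined realm "realm1"',
]
# Hypothetical: only the first session was POSTed to /wamp/session_joined.
print(missing_notifications(log, notified_ids=[8819229878476836]))
# prints [6398546600367471]
```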

Subsequently, a call to wamp.session.list returns all 150 sessions, and even more mysteriously we continue to receive calls for sessions leaving and joining after startup has finished.

Could this be a timing issue with the registration of components during startup? I'm struggling to prove anything.

Edit: Version: Crossbar.io COMMUNITY 17.9.2

jellisgwn commented 6 years ago

Follow-up on reproducing this issue.

The environment in which this problem is being seen is:

If crossbar is restarted, the clients disconnect and attempt to reconnect. This happens successfully, but we don't receive wamp.session.on_join messages for all clients.

Generic reproduce:

FWIW, i'm attempting to use autobahn java, and move away from relying on the REST bridge. However, that seems less likely to provide a solution as there is no way of knowing if the server will manage to subscribe before any of the clients have reconnected and the on_join message has been published to the topic.
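A possible way around that race, sketched here in Python rather than Java and independent of any WAMP client library: subscribe first, then call wamp.session.list once and treat every listed session that was never announced as a missed join. `SessionTracker` is a hypothetical helper, not part of Autobahn or Crossbar.io, and extracting the session ID from each event is left to the caller.

```python
class SessionTracker:
    """Tracks live session IDs from on_join/on_leave events plus list snapshots."""

    def __init__(self):
        self.known = set()

    def on_join(self, session_id):
        """Record a wamp.session.on_join event; return True if the session is new."""
        is_new = session_id not in self.known
        self.known.add(session_id)
        return is_new

    def on_leave(self, session_id):
        """Record a wamp.session.on_leave event."""
        self.known.discard(session_id)

    def reconcile(self, listed_ids):
        """Merge a wamp.session.list snapshot; return IDs whose on_join was missed."""
        missed = sorted(set(listed_ids) - self.known)
        self.known.update(listed_ids)
        return missed


tracker = SessionTracker()
tracker.on_join(101)                       # one event arrived normally
print(tracker.reconcile([101, 102, 103]))  # prints [102, 103]
```

Taking the snapshot after the subscription is active means a session can be seen twice (once in the list, once as an event) but should not be missed, which is why `on_join` reports whether the ID was already known.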

oddjobz commented 6 years ago

Can I ask a little about the receiving URL .. how many concurrent connections is it set up to handle, and how much processing is it doing, i.e. how long does a single call take to complete?

jellisgwn commented 6 years ago

Sure. 25 concurrent requests, and additional requests are queued. An individual request is processed in <20ms.

If you're asking if the requests are being dropped, the answer is no. crossbar is behind nginx, and the logs show no sign of the requests being received and not serviced.
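As a rough sanity check on those figures, here is a back-of-the-envelope estimate of how long the receiver would need to drain a burst of 150 notifications. It assumes the worst case of all 150 arriving at once and every request taking the full 20 ms; `worst_case_drain_ms` is just an illustrative helper.

```python
import math


def worst_case_drain_ms(requests, concurrency, per_request_ms):
    """Time to serve `requests` in waves of `concurrency`, each wave taking per_request_ms."""
    waves = math.ceil(requests / concurrency)
    return waves * per_request_ms


print(worst_case_drain_ms(150, 25, 20))  # prints 120
```

Under these assumptions the whole burst clears in roughly 120 ms, so receiver capacity alone seems unlikely to explain the missing calls.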

oberstet commented 6 years ago

You could try patching the code in the HTTP bridge to use a dedicated, large thread pool for outgoing requests, as in:

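from twisted.web.client import Agent
from treq.client import HTTPClient
from treq._utils import default_pool, default_reactor


def _client(*args, **kwargs):
    # Reuse a caller-supplied Agent if given; otherwise build one.
    agent = kwargs.get('agent')
    if agent is None:
        reactor = default_reactor(kwargs.get('reactor'))
        # default_pool() returns kwargs['pool'] when the caller passes one,
        # so a dedicated, larger pool can be supplied here.
        pool = default_pool(reactor,
                            kwargs.get('pool'),
                            kwargs.get('persistent'))
        agent = Agent(reactor, pool=pool)
    return HTTPClient(agent)

jellisgwn commented 6 years ago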

@oberstet ok. i can try to work through that... not entirely sure what you're suggesting that i change, but maybe it'll be obvious when looking at the code.

The issue is in production, and it won't be easy (maybe impossible) for me to run experiments.

meejah commented 6 years ago

@jellisgwn I am fairly confident this is confined to the "REST bridge", because WAMP sessions are connected to transports if they're TCP -- that is, if the transport goes away, the session definitely does too.

Obviously, having an example that reproduces the problem reliably would be ideal.

jellisgwn commented 6 years ago

@meejah I agree with that analysis: when querying wamp.session.list, all the sessions are present, including ones for which no notification has been received over the REST bridge.

This seems to suggest two possibilities: