crossbario / autobahn-python

WebSocket and WAMP in Python for Twisted and asyncio
https://crossbar.io/autobahn
MIT License
2.48k stars 766 forks source link

ApplicationRunner doesn't shut down properly if connection fails before handshake completes #559

Open robertobarreda opened 8 years ago

robertobarreda commented 8 years ago

I have a crossbar.io instance running behind an nginx with the current config:

   upstream websocket {
      server 127.0.0.1:8080;
   }
   map $http_upgrade $connection_upgrade {
     default upgrade;
     ''      close;
   }
   server {
   ...
    location = /ws {
        # switch off logging
        access_log off;
        auth_basic off;

        # redirect all HTTP traffic to localhost:8080
        proxy_pass http://websocket;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Scheme $scheme;
        proxy_set_header Host localhost:8080;

        # WebSocket support (nginx 1.4)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }
}

Taking as example https://github.com/crossbario/autobahn-python/blob/master/examples/twisted/wamp/app/hello/hello.py with small modifications (changing the url port)

If crossbar.io is running before the application is started, everything works perfectly.

But if crossbar is down the moment the app begins I got the following response and the application never ends (the reactor is not stopped even if I add the onDisconnect event)

2015-12-01 16:38:16+0000 [-] Log opened.
2015-12-01 16:38:16+0000 [-] Starting factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f490d3f3110>
2015-12-01 16:38:16+0000 [-]
    [('debug', True, 'WampWebSocketClientFactory'),
     ('debugCodePaths', False, 'WampWebSocketClientFactory'),
     ('logOctets', True, 'WampWebSocketClientFactory'),
     ('logFrames', True, 'WampWebSocketClientFactory'),
     ('trackTimings', False, 'WampWebSocketClientFactory'),
     ('allowHixie76', False, 'WampWebSocketClientFactory'),
     ('utf8validateIncoming', True, 'WampWebSocketClientFactory'),
     ('applyMask', True, 'WampWebSocketClientFactory'),
     ('maxFramePayloadSize', 0, 'WampWebSocketClientFactory'),
     ('maxMessagePayloadSize', 0, 'WampWebSocketClientFactory'),
     ('autoFragmentSize', 0, 'WampWebSocketClientFactory'),
     ('failByDrop', True, 'WampWebSocketClientFactory'),
     ('echoCloseCodeReason', False, 'WampWebSocketClientFactory'),
     ('openHandshakeTimeout', 5, 'WampWebSocketClientFactory'),
     ('closeHandshakeTimeout', 1, 'WampWebSocketClientFactory'),
     ('tcpNoDelay', True, 'WampWebSocketClientFactory'),
     ('autoPingInterval', 0, 'WampWebSocketClientFactory'),
     ('autoPingTimeout', 0, 'WampWebSocketClientFactory'),
     ('autoPingSize', 4, 'WampWebSocketClientFactory'),
     ('version', 18, 'WampWebSocketClientFactory'),
     ('acceptMaskedServerFrames', False, 'WampWebSocketClientFactory'),
     ('maskClientFrames', True, 'WampWebSocketClientFactory'),
     ('serverConnectionDropTimeout', 1, 'WampWebSocketClientFactory'),
     ('perMessageCompressionOffers', [], 'WampWebSocketClientFactory'),
     ('perMessageCompressionAccept',
      <function <lambda> at 0x7f490e2e9758>,
      'WampWebSocketClientFactory')]
2015-12-01 16:38:16+0000 [-] connection to tcp4:127.0.0.1:80 established
2015-12-01 16:38:16+0000 [-] TX Octets to tcp4:127.0.0.1:80 : sync = False, octets = 474554202f777320485454502f312e310d0a557365722d4167656e743a204175746f6261686e507974686f6e2f302e31302e390d0a486f73743a203132372e302e302e313a38300d0a557067726164653a20576562536f636b65740d0a436f6e6e656374696f6e3a20557067726164650d0a507261676d613a206e6f2d63616368650d0a43616368652d436f6e74726f6c3a206e6f2d63616368650d0a5365632d576562536f636b65742d4b65793a206d303477794f74706d307466774759564773567874513d3d0d0a5365632d576562536f636b65742d50726f746f636f6c3a2077616d702e322e6a736f6e2e626174636865642c77616d702e322e6a736f6e0d0a5365632d576562536f636b65742d56657273696f6e3a2031330d0a0d0a
2015-12-01 16:38:16+0000 [-] GET /ws HTTP/1.1
    User-Agent: AutobahnPython/0.10.9
    Host: 127.0.0.1:80
    Upgrade: WebSocket
    Connection: Upgrade
    Pragma: no-cache
    Cache-Control: no-cache
    Sec-WebSocket-Key: m04wyOtpm0tfwGYVGsVxtQ==
    Sec-WebSocket-Protocol: wamp.2.json.batched,wamp.2.json
    Sec-WebSocket-Version: 13

2015-12-01 16:38:16+0000 [-] RX Octets from tcp4:127.0.0.1:80 : octets = 485454502f312e31203530322042616420476174657761790d0a5365727665723a206e67696e782f312e362e320d0a446174653a205475652c2030312044656320323031352031363a33383a313620474d540d0a436f6e74656e742d547970653a20746578742f68746d6c0d0a436f6e74656e742d4c656e6774683a203137320d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a0d0a3c68746d6c3e0d0a3c686561643e3c7469746c653e3530322042616420476174657761793c2f7469746c653e3c2f686561643e0d0a3c626f6479206267636f6c6f723d227768697465223e0d0a3c63656e7465723e3c68313e3530322042616420476174657761793c2f68313e3c2f63656e7465723e0d0a3c68723e3c63656e7465723e6e67696e782f312e362e323c2f63656e7465723e0d0a3c2f626f64793e0d0a3c2f68746d6c3e0d0a
2015-12-01 16:38:16+0000 [-] received HTTP response:

    HTTP/1.1 502 Bad Gateway
    Server: nginx/1.6.2
    Date: Tue, 01 Dec 2015 16:38:16 GMT
    Content-Type: text/html
    Content-Length: 172
    Connection: keep-alive

2015-12-01 16:38:16+0000 [-] received HTTP status line in opening handshake : HTTP/1.1 502 Bad Gateway
2015-12-01 16:38:16+0000 [-] Unable to format event {'log_namespace': 'autobahn.twisted.websocket.WampWebSocketClientFactory', 'log_level': <LogLevel=debug>, 'format': '%(log_legacy)s', 'time': 1448987896.668816, 'log_source': <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f490d3f3110>, 'log_format': "received HTTP headers in opening handshake : {u'date': u'Tue, 01 Dec 2015 16:38:16 GMT', u'content-length': u'172', u'content-type': u'text/html', u'connection': u'keep-alive', u'server': u'nginx/1.6.2'}", 'system': '-', 'log_logger': <Logger 'autobahn.twisted.websocket.WampWebSocketClientFactory'>, 'message': (), 'log_time': 1448987896.668816}: u"u'date'"
2015-12-01 16:38:16+0000 [-] failing WebSocket opening handshake ('WebSocket connection upgrade failed (502 - BadGateway)')
2015-12-01 16:38:16+0000 [-] Connection to/from tcp4:127.0.0.1:80 was aborted locally
2015-12-01 16:38:16+0000 [-] Stopping factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f490d3f3110>
meejah commented 8 years ago

Does it behave the same when it's not behind nginx? It could be that nginx is keeping the connection open during (reverse) proxying...?

By "add onDisconnect" I guess you mean you're stopping the event-loop in your onDisconnect handler...?

robertobarreda commented 8 years ago

Exactly @meejah

    def onDisconnect(self):
        print("Transport disconnected.")
        if reactor.running:
            reactor.stop()

If the application connects directly to crossbar a twisted.internet.error.ConnectionRefusedError is raised and the main loop terminates

2015-12-02 10:48:23+0000 [-] Log opened.
2015-12-02 10:48:23+0000 [-] Starting factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7fb5bed1d250>
2015-12-02 10:48:23+0000 [-] Stopping factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7fb5bed1d250>
2015-12-02 10:48:23+0000 [-] Main loop terminated.
2015-12-02 10:48:23+0000 [-] Traceback (most recent call last):
2015-12-02 10:48:23+0000 [-]   File "app.py", line 39, in <module>
2015-12-02 10:48:23+0000 [-]     runner.run(Component)
2015-12-02 10:48:23+0000 [-]   File "/usr/local/lib/python2.7/dist-packages/autobahn/twisted/wamp.py", line 261, in run
2015-12-02 10:48:23+0000 [-]     raise connect_error.exception
2015-12-02 10:48:23+0000 [-] twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 111: Connection refused.

If the application connects to nginx the TCP connection IS successful. The problem is the response that brakes the handshake but doesn't rise the exception to end the reactor loop

    HTTP/1.1 502 Bad Gateway
    Server: nginx/1.6.2
    Date: Wed, 02 Dec 2015 10:48:58 GMT
    Content-Type: text/html
    Content-Length: 172
    Connection: keep-alive
meejah commented 8 years ago

Thanks for the update. Can you use netstat to see if the connection to nginx is still alive? That is, I'm curious if in the "behind nginx" case you're not getting the onDisconnect because the connection is in fact still alive -- even though crossbar is not.

I should have some time to try myself later on today.

robertobarreda commented 8 years ago

Yes, that's my problem. The connection with nginx is alive (you can see the nginx responds with an 502 Bad Gateway) but Crossbar.io is not

oberstet commented 8 years ago

Are both nginx and crossbar on the same LAN? background: detecting a broken TCP connection has usually large timeouts - unless there is a ping/pong mechanism and timeouts.

robertobarreda commented 8 years ago

@oberstet both in the same machine localhost:80 -> NGINX localhost:8080 -> crossbar.io

meejah commented 8 years ago

@robertobarreda I meant: after you get the 502, does netstat show you a live TCP connection? If that's the case (and maybe that's what you meant ;) then I'm not sure what Autobahn can do. Perhaps there is an nginx option to not keep connections open (persumably its doing this for potential pipelining), at least for the websocket relaying...?

@oberstet I'm not familiar enough with the websocket standard to know: can we just look for the 502 and nuke the connection?

robertobarreda commented 8 years ago

@meejah @oberstet

Just running another simple example where nginx has the default config, so doesn't know about the /ws path:

$ DEBUG=1 ROUTER="ws://localhost/ws" app.py
2015-12-03 17:33:36+0000 [-] Log opened.
2015-12-03 17:33:36+0000 [-] Starting factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f9c2cd47450>
2015-12-03 17:33:36+0000 [-] failing WebSocket opening handshake ('WebSocket connection upgrade failed (404 - NotFound)')
2015-12-03 17:33:36+0000 [-] Connection to/from tcp4:127.0.0.1:80 was aborted locally
2015-12-03 17:33:36+0000 [-] Stopping factory <autobahn.twisted.websocket.WampWebSocketClientFactory object at 0x7f9c2cd47450>
$ sudo netstat -putan | grep 80
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      28717/nginx -g daem
tcp6       0      0 :::80                   :::*                    LISTEN      28717/nginx -g daem

The TCP connection is actually closed, but the app doesn't know how to handle a bad response when was waiting for a handshake message.

Maybe raise an exception to be able to raise the onDisconnect and stop the reactor if the handshake message is wrong?

meejah commented 8 years ago

Okay, this is actually a problem with ApplicationRunner. Essentially what's happening is that this isn't counted as connecting -- and so you'd never see an onDisconnect called in the client because when autobahn/wamp/websocket.py:69 onClose is called, _session is still None.

To fix, ApplicationRunner needs to "listen" for onClose on the protocol it gets back after calling connect() and shut down the reactor then.

@robertobarreda can you work around this for now by having crossbar start your process for you, by adding it as a container to the config etc.?

robertobarreda commented 8 years ago

@meejah is there a way I can have a custom ApplicationRunner that subscribes to this protocol onClose event? I'm using autobahn-python v0.10.9

meejah commented 8 years ago

This is actually a wider issue than just ApplicationRunner as there is really no callback to hang onto for any use-case where many parts of the handshake fail. Conceptually, part of the statemachine looks like this:

tcp connection (successful) -> attempt handshake -> handshake success (create session, call onConnect())

So if anything goes wrong on the TCP connection itself, that bubbles out as an errback/exception to the original .connect() call (looking at the Twisted case). So, that's fine. But, if the tcp connection works (e.g. you're connecting to a "normal" webserver) but the handshake fails (because it doesn't speak websocket at all, for example, or the TLS fails) then there's no errback (in fact, the Deferred from the connect() has allready callback()'d anyway) and there isn't even an ApplicationSession object created yet, so we can't even call onDisconnect on it ...

I'm adding LTS API as we should ensure these use-cases work properly with that.I believe the Connection API we'd discussed/prototyped in the new-api discussions would be able to handle this cleanly, as any failure at all before getting to 'handshake success' would/could just be an errback.

jvdm commented 8 years ago

For the sake of sharing here's a monkey patch I've been using as a workaround (as I wait for a proper fix):

# Usual ApplicationRunner setup blah blah blah.
runner_d = ApplicationRunner(...)

def connect_success(proto):
    orig_on_close = proto.onClose

    def fake_on_close(*args, **kwds):
        if proto._session is None:
            # Errback, log, stop the reactor etc.
            print('Ouch, connection lost before wamp handshake.')
            reactor.stop()
        else:
            orig_on_close(proto, *args, **kwds)

    proto.onClose = fake_on_close

runner_d.addCallback(connect_success)
jdemaeyer commented 8 years ago

@jvdm thanks for the monkey patch!

runner_d = ApplicationRunner(...)

should probably be

runner_d = ApplicationRunner(...).run()

right?

Edit: And

orig_on_close(proto, *args, **kwds)

should be

orig_on_close(*args, **kwds)

right? (I get builtins.TypeError: onClose() takes 4 positional arguments but 5 were given on normal closes [i.e. after a successful connect] otherwise)

lenzenmi commented 8 years ago

Same problem with asyncio.

If I re-write autobahn.asyncio.wamp.ApplicationRunner and add just a few lines as follows, at least the process exits

def run(self, make):
    ...
    (transport, protocol) = loop.run_until_complete(coro)
        def connection_lost(exc):
            print('Connection Lost!')
            loop.stop()
            loop.close()
            exit(1)

    protocol.connection_lost = connection_lost
    ...

Seems the problem is in WebSocketProtocol._connectionLost. The loop isn't being stopped.