crossbario / crossbar

Crossbar.io - WAMP application router
https://crossbar.io/
Other
2.05k stars 274 forks source link

crossbar twisted-python component breaks if a subscription handler raises an exception #833

Open jordal opened 8 years ago

jordal commented 8 years ago

A crossbar twisted-python component breaks if it subscribes to a topic and that subscription handler raises an exception; the component stops serving RPCs and unsubscribes (at least) to the topic for which it raises an exception.

This seems like unwanted/unexpected behavior. If the component provides critical services in a system, even a tiny unforeseen bug could make them all unavailable. I would expect crossbar to simply log a traceback for unhandled exceptions and move on.

Here's my environment:

$ crossbar version
     __  __  __  __  __  __      __     __
    /  `|__)/  \/__`/__`|__) /\ |__)  |/  \
    \__,|  \\__/.__/.__/|__)/~~\|  \. |\__/

 Crossbar.io        : 0.14.0
   Autobahn         : 0.14.1 (with JSON, MessagePack, CBOR, UBJSON)
   Twisted          : 15.4.0-EPollReactor
   LMDB             : 0.89/lmdb-0.9.18
   Python           : 2.7.9/CPython
 OS                 : Linux-3.13.0-83-generic-x86_64-with-debian-8.5
 Machine            : x86_64
$ pip freeze
Jinja2==2.8
MarkupSafe==0.23
PyNaCl==1.0.1
PyTrie==0.2
PyYAML==3.11
Pygments==2.1.3
Twisted==15.4.0
argh==0.26.2
argparse==1.2.1
attrs==16.0.0
autobahn==0.14.1
cbor==1.0.0
cffi==1.7.0
chardet==2.3.0
click==6.6
colorama==0.3.2
crossbar==0.14.0
cryptography==1.4
enum34==1.1.6
html5lib==0.999
idna==2.1
ipaddress==1.0.16
lmdb==0.89
mistune==0.7.3
ndg-httpsclient==0.3.2
netaddr==0.7.18
pathtools==0.1.2
ply==3.4
psutil==4.3.0
py-ubjson==0.8.5
pyOpenSSL==16.0.0
pyasn1==0.1.9
pyasn1-modules==0.0.8
pycparser==2.14
requests==2.4.3
sdnotify==0.3.1
service-identity==16.0.0
setproctitle==1.1.10
shutilwhich==1.1.0
six==1.10.0
treq==15.1.0
txaio==2.5.1
u-msgpack-python==2.1
urllib3==1.9.1
watchdog==0.8.3
wheel==0.24.0
wsgiref==0.1.2
zope.interface==4.2.0

I've attached code to reproduce it. The only interesting files are backend.py, frontend.py, and .crossbar/config.json.

crossbar-unhandled-exception.zip

If you have docker-compose then you can easily run it from the directory with:

$ docker-compose build && docker-compose up
--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/36008164-crossbar-twisted-python-component-breaks-if-a-subscription-handler-raises-an-exception?utm_campaign=plugin&utm_content=tracker%2F462544&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F462544&utm_medium=issues&utm_source=github).
meejah commented 8 years ago

As per https://github.com/crossbario/crossbar/blob/master/crossbar/worker/container.py#L213 it does look like this behavior (killing a component if it raises an exception from a handler) is on purpose -- would be easy to make non-fatal, what does @oberstet think?

oberstet commented 8 years ago

No, this is by design: if you want your component to continue despite the fact that it tried to subscribe and that fails, then catch the exception in your app code and properly treat it:

@defer.inlineCallbacks
def onJoin(self, details):
    try:
        yield self.subscribe(self.on_event, u'test.pubsub')
    except:
        self.log.warn('could not subscribe, but somehow I dont care anyways')

And uncaught exception should be brought to maximum attention, not simply logged.

jordal commented 8 years ago

@oberstet I think we're talking about something different. My concern is not with the subscribe operation itself, but with an exception raised from the handler while processing pubsub events.

If you still disagree then I'll make one more plea: I think a robust server should, if reasonably possible, not enter an impaired state if the application raises an error while processing a pubsub event. There's no reason to believe that such an error will recur on that pubsub topic and that it's fatal enough to justify aborting the application.

This is loosely analogous to most web frameworks that trap and log unhandled application errors and turn them into 500 status codes, but don't silently stop serving requests to the application thereafter.

Which raises another question: as it currently behaves, the crossbar server doesn't abort---the application component just silently stops working. To bring "maximum attention", wouldn't it be better to abort crossbar rather than keep running with a disabled component? (honest question, not meant to be snarky)

I wrote up a quick decorator that applies the behavior to subscribe event handlers. I'm posting it here in case anyone else finds it useful. This is, of course, no excuse to not catch and handle internal exceptions in your code whenever you can.

def trap_and_log(wrapped):
    @functools.wraps(wrapped)
    def _wrapper(*args, **kargs):
        def print_traceback(failure):
            log.error(
                "Unhandled exception caught: {f}",
                f=txaio.failure_format_traceback(failure))

        try:
            res = wrapped(*args, **kargs)
        except Exception:
            # If the function outright raised an exception, then just log it.
            failure = txaio.create_failure()
            print_traceback(failure)
        else:
            # If the function returned a Deferred, attach an errback to it.
            # Otherwise, return its result.
            if isinstance(res, defer.Deferred):
                return res.addErrback(print_traceback)
            else:
                return res
    return _wrapper
meejah commented 8 years ago

Personally, I think it's better to log and keep going -- but perhaps this should be an option? That is, by default we log and keep going (I think this is "least surprising"), but we could provide an option that makes an exception from a handler component-fatal (or as @jordal suggested, crossbar-fatal).

These latter could be useful especially in development -- or even in production depending on the setup of the system (e.g. if it can recover on restart, and you've got something monitoring the crossbar process itself).

Instead of a config-option, we could provide a wamp.* notification of some sort, and "something else" could decide what to do (e.g. notify some web service, shut down the component, shut down crossbar, etc) and the "log and keep going" would then just always happen and if you want some different behavior you write something to respond to the notification.