crossbario / autobahn-python

WebSocket and WAMP in Python for Twisted and asyncio
https://crossbar.io/autobahn
MIT License
2.48k stars 766 forks source link

Exceptions thrown in Future callbacks are not handled #485

Open sehoffmann opened 9 years ago

sehoffmann commented 9 years ago

They are neither logged nor does the caller gets feedback if e.g an INVOCATION message was sent. This leads to situations where caller simply hang up if they haven't any timeout specified without any feedback to why this occurs. See https://github.com/tavendo/AutobahnPython/issues/484 for an example of such a situation.

oberstet commented 9 years ago

sure. attach a errback handler to the future ..

sehoffmann commented 9 years ago

Shouldn't it be the responsibility of the library to make sure that no exception is unhandled but atleast logged and any stateful operations with a peer are closed accordingly? I'm not talking about exceptions raised in user code but in autobahns handlers/callbacks itself. While I think that we can all agree on the fact that such exceptions shouldn't be raised in the first place, #484 shows that it can happen anyways.

Besides that, a user can't just attach an errback handler because 1) if @inlineCallbacks or any similiar decorator is used, the user has no access to the returned future 2) the user can only attach an errback handler BEFORE autobahn attaches his, making it completely pointless to attach one in the first place.

Edit: To clarify why this is important. The symptomatics of #484 looked like this in my case: There was an ingoing INVOCATION message logged but not RESULT message. The other peer just endlessly waited for the call to finish and thats it. Nothing more. The callee/code causing this has correctly worked in the past. Adding a simple log for such situations would extremly help to find and fix such errors in the future.

I accidentally wrote 'callee' in the original post, that was ofc wrong and I meant 'caller'.

oberstet commented 9 years ago

Exceptions raised in user code (e.g. from a registered user procedure or from a session lifecycle callback) now go through http://autobahn.ws/python/reference/autobahn.wamp.html#autobahn.wamp.protocol.ApplicationSession.onUserError, and the default implementation we provide for this will log it https://github.com/tavendo/AutobahnPython/blob/master/autobahn/twisted/wamp.py#L67

That means, you can override and handle yourself, or, if you don't do that, it'll get logged.

Exceptions raised inside the library, which are not handled in any way (neither acted upon, or logged) are bugs in the library.

484 is just a "normal" bug in the library - it isn't missing an error handler we should have attached, but we need to properly guard to cover over the py2/3/tx/cornercases mess.

consider this:

from __future__ import print_function
from twisted.internet.defer import Deferred, inlineCallbacks, returnValue
from twisted.internet import reactor

# this doesn't help: it'll just print more detail
# _if_ and _when_ an error is handled
#from twisted.internet.defer import setDebugging
#setDebugging(True)

@inlineCallbacks
def foo():
    returnValue(1/0)

@inlineCallbacks
def bar():
    res = yield foo()
    returnValue(res*2)

d = bar()

# this doesn't handle the error. it'll be swallowed, and
# twisted will only print the issue when the reactor stops
d.addCallback(print)

# this _is_ handling the error
#d.addCallbacks(print, print)

reactor.callLater(3, reactor.stop)
reactor.run()

1) if @inlineCallbacks or any similiar decorator is used, the user has no access to the returned future

sure. use a try/except block.

# this is also handling the error
@inlineCallbacks
def doit():
    try:
        res = yield bar()
    except Exception as e:
        print(e)

I am nevertheless reopening this issue: what's your opinion @hawkowl and @meejah ?

oberstet commented 9 years ago

the Twisted docs have this to say:

The reason Deferred can't automatically show your errors is because a Deferred can still have callbacks and errbacks added to it even after a result is available -- so we have no reasonable place to put a logging call that wouldn't result in spurious tracebacks that are handled later on. When a deferred is garbage collected with a Failure, a traceback is automatically printed. However, if you keep a reference to your Deferred and it is never garbage collected, it's possible no traceback will be printed.

Which also explains why unhandled errors from deferreds are printed when the reactor stops (as everything is going to be collected then).

meejah commented 9 years ago

Just a meta-point about @inlineCallbacks: the function it decorates does just return a Deferred, that you can add errbacks to as normal.

meejah commented 9 years ago

As to "internal"-type errors, I think they should be logged (e.g. log.failure, with traceback) and bubbled up -- if they "shouldn't happen" then the caller of the method must receive an error.

There may still be some places where errors are only logged at "debug" level -- which should be changed to critical/failure IMO. In general, this sort of logging has been improved a lot recently... If this issue is one of those, though, we should make a test-case and fix it.

So: if an INVOCATION fails because the invoked method itself threw an error, we should just throw an error back into the caller (without logging ourselves). If INVOCATION fails because of some internal library error, there should be a stack-trace logged (and the error should get thrown back to the caller).

oberstet commented 9 years ago

@meejah I agree. What you describe is how we ideally should handle the 2 cases (invoked procedure raises, vs. internal library error).

If INVOCATION fails because of some internal library error, there should be a stack-trace logged (and the error should get thrown back to the caller).

"some internal library error" .. happening anywhere? If the error is e.g. in the code path that sends the WAMP error message back to the router, the message isn't sent. We can't escape that. Some internal errors will be fatal - no message goes out.


Currently, when a registered procedure throws an error, we both log (via the default implementation of onUserError) and return a WAMP ERROR message to the router https://github.com/tavendo/AutobahnPython/blob/master/autobahn/wamp/protocol.py#L803

The problem in #484 is this line: errmsg = "{0}".format(err.value.args[0])

It assumes err.value would always have an args being a tuple/list of non-zero length. And that fails badly, because as it turns out, this isn't always the case.

This is what I ment with the first paragraph above: it's a library bug in the code path that creates/sends the WAMP error message.

And it isn't logged, because the bug is happening already in an errback.

sehoffmann commented 9 years ago

Just to clarify this again because you put considerable effort into that aspect: I am not concerned about how error handling of exceptions in user code is currently done.

Exceptions raised inside the library, which are not handled in any way (neither acted upon, or logged) are bugs in the library.

This is exactly my particular concern. Yeah those bugs shouldn't happen in the first place but they happen anyways. The least thing to do is to log these currently uncaught and unlogged exceptions.

I wasn't aware of the fact that twisted logs uncaught/unhandled exception if the reactor stops. Although I still believe that logging is neccessary as that behaviour of twisted can easily be overlooked.

Twisted supports chaining of errbacks (im not exactly sure how asyncio handles this though). One just needs a simple utility function that adds a second errback handler which performs the desired logging to a future after the actual callbacks have already beend added.

oberstet commented 9 years ago

twisted logs uncaught/unhandled exception if the reactor stops

To be super precise: Twisted logs unhandled errors that accumulated inside a Deferred when the Deferred is garbage collected. And that happens at the latest when the reactor stops. It may happen earlier. And when exactly an object is collected is under control of the Python implementation. E.g. CPython and PyPy differ on that.

Twisted supports chaining of errbacks (im not exactly sure how asyncio handles this though).

asyncio Futures don't have chaining http://autobahn.ws/python/asynchronous-programming.html#asyncio-futures-and-coroutines

sehoffmann commented 9 years ago

Well in that case we could write a simple decorator which just wraps the original errback in a try-block. We could apply that decorator manually each time or maybe write a utility function:

_add_callbacks(future, cb, errback):
     txaio.add_callbacks(future, cb, log_exceptions(errback))
meejah commented 9 years ago

There is already a helper called _swallow_error in some place; this logs the error and cancels it (by returning None). Usually this is added to any Deferreds/Futures from "user" callbacks (e.g. when we're calling onJoin etc) but there may be other cases where it's appropriate. As @oberstet says, sometimes this also fails :( or there may still be cases where we've ignored a Deferred instead of adding a logging error-handler.

BTW, I think just txaio.add_callbacks(future, None, logger.failure) should be enough with a (new) Twisted logger...

meejah commented 9 years ago

As far as this particular bug is concerned, we were getting an exception while trying to log an error. Outside of pernicious issues like this, I think we're properly logging internal failures as well as possible. So, I believe this bug can be closed.

@Paranaix is this correct?

sehoffmann commented 9 years ago

@meejah No this is not correct. That error happens in (internal/autobahn) code responsible for sending back an ERROR message (application logic, not logging). To state it differently: At some places (namely callbacks added by autobahn) exceptions caused by erroneous internal/autobahn code are not logged.

meejah commented 9 years ago

Ah, okay, so very-probably code like this: https://github.com/tavendo/AutobahnPython/blob/master/autobahn/wamp/websocket.py#L106 needs to log an "internal error" instead (as in: we tried to handle an incoming message, but "something" unexpected went wrong so it should be logged so a proper bug can be filed).

I'll make a test-case that simulates your particular problem to make sure this stays-fixed.