encode / uvicorn

An ASGI web server, for Python. 🦄
https://www.uvicorn.org/
BSD 3-Clause "New" or "Revised" License
8.11k stars 701 forks source link

fix: Avoid CancelledError from being propagated to lifespan's receive() #2367

Open bellini666 opened 2 weeks ago

bellini666 commented 2 weeks ago

Summary

Currently LifeSpanOn.main() is excepting BaseException, which also catches CancelledError.

CancelledError is used by asyncio to signal that a task has been cancelled and is not an error condition.

This PR changes the behavior to catch CancelledError separately and log it as a "Lifespan task cancelled" instead of letting the error propagate.

This is a traceback that I have seen some times on sentry for a project that I work on.

Screenshot 2024-06-13 at 19 08 19

Checklist

Kludex commented 2 weeks ago

What problem are you trying to solve? Can you give me a real example?

bellini666 commented 2 weeks ago

Hey @Kludex,

So, it is not actually a real issue that would cause the application to misbehave or anything like that, but just a noisy error that will appear on your log and sentry (which is my case) but it is actually ok.

It is very rare (occurred 3 times in the past 30 days for me). But looking at the traceback (https://parade-ai.sentry.io/share/issue/4797523b6aa9458c927599ef0abf35a5/) and the logging below, it happened at the same second that uvicorn started, so my theory is that something canceled it during its startup process in a moment that the code was not expecting to be canceled.

Screenshot 2024-06-14 at 11 08 12

While replying here I tried to dig a bit deeper into the traceback and just noticed something that I completely missed yesterday:

Screenshot 2024-06-14 at 11 30 44

starlette is actually the one that is doing something "wrong" in here. For any BaseException it will send either a "lifespan.shutdown.failed" or a "lifespan.startup.failed", which is usually what we want, but for CancelledError that is not the case, it is just that the loop is asking the task to stop and uses that exception as a way to allow the code to except it and do some cleanup work.

Maybe the proper fix would be doing something like this on starlette?

        except asyncio.CancelledError:
            # CancelledError means this task was asked to terminate and it is not actually
            # an error. We want it to behave just like the `else` below, but reraise
            # it to make sure it gets propagated up to the call chain
            await send({"type": "lifespan.shutdown.complete"})
            raise
        except BaseException:
            exc_text = traceback.format_exc()
            if started:
                await send({"type": "lifespan.shutdown.failed", "message": exc_text})
            else:
                await send({"type": "lifespan.startup.failed", "message": exc_text})
            raise
        else:
            await send({"type": "lifespan.shutdown.complete"})

Not totally sure about the await send({"type": "lifespan.shutdown.complete"}) (makes sense to me, but you know more), but not handling it as an error seems to me to be correct (but again, you know more =P)

btw. Nice to have meet you at pycon italia! :)

bellini666 commented 1 week ago

@Kludex let me know if you agree with my conclusion in my previous comment, in which case I'm going to close this and propose a PR at starlette instead :)