What happened?
When there is an unhandled exception inside a task spawned by a BackgroundService, the error is silently swallowed, leading to obscure, hard-to-debug bugs.
What did you expect instead?
Unhandled exceptions should at least be logged, or the whole Python process should crash.
Extra information
I discovered this while working on frequenz-floss/frequenz-sdk-python#806. I added a sanity check which failed inside a task that was sending messages to a channel, so no messages were sent, and another task waiting for those messages just got stuck, leaving no clues about where the problem might be. This makes the problem really hard to debug.
One way to address this in BackgroundService is to extend it with a create_task() method that automatically adds the task to the task list and attaches a done callback, where we can either log the unhandled exception or raise a SystemExit exception to exit the program. We could even let the user decide how to handle unhandled exceptions, either by passing a callback or by overriding the default callback as a method of the instance. This callback should also remove the task from the task list, something users need to do manually at the moment.
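A minimal sketch of this idea follows. It is not the actual BackgroundService implementation: the class name TaskTrackingService, the _tasks attribute, and the handle_task_exception() hook are hypothetical names chosen for illustration. Only the asyncio and logging calls are real standard-library APIs.

```python
import asyncio
import logging
from collections.abc import Coroutine
from typing import Any, Optional

_logger = logging.getLogger(__name__)


class TaskTrackingService:
    """Hypothetical stand-in for BackgroundService, for illustration only."""

    def __init__(self) -> None:
        # Assumed name for the list of tasks owned by the service.
        self._tasks: set[asyncio.Task[Any]] = set()

    def create_task(
        self, coro: Coroutine[Any, Any, Any], *, name: Optional[str] = None
    ) -> asyncio.Task[Any]:
        """Spawn a task, track it, and watch it for unhandled exceptions."""
        task = asyncio.get_running_loop().create_task(coro, name=name)
        self._tasks.add(task)
        task.add_done_callback(self._on_task_done)
        return task

    def _on_task_done(self, task: asyncio.Task[Any]) -> None:
        # Untrack the finished task so users don't have to do it manually.
        self._tasks.discard(task)
        if task.cancelled():
            return  # Cancellation is a normal way to stop, not an error.
        exc = task.exception()
        if exc is not None:
            self.handle_task_exception(task, exc)

    def handle_task_exception(
        self, task: asyncio.Task[Any], exc: BaseException
    ) -> None:
        """Default policy: log the error. Subclasses may override this."""
        _logger.error(
            "Unhandled exception in task %r", task.get_name(), exc_info=exc
        )
```

With this shape, a subclass that prefers to stop the program could override handle_task_exception() to raise SystemExit instead of logging. Passing the handler as a constructor argument would be an equally reasonable design.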
Related issues
The solution to this issue needs to keep in mind the following related issues: