frequenz-floss / frequenz-core-python

Core utilities to complement Python's standard library
https://frequenz-floss.github.io/frequenz-core-python/
MIT License

Add a safety mechanism to `BackgroundService` to notify about crashed tasks #9

Closed llucax closed 3 months ago

llucax commented 5 months ago

What happened?

When there is an unhandled exception inside a task spawned by a BackgroundService, the error is silently swallowed, leading to obscure, hard-to-debug bugs.

What did you expect instead?

Unhandled exceptions should at least be logged, or the whole Python process should crash.

Extra information

I discovered this while working on frequenz-floss/frequenz-sdk-python#806. I added a sanity check which failed inside a task that was sending messages to a channel, so no messages were sent, and another task waiting for those messages just got stuck, leaving no clues about where the problem might be. This makes the problem really hard to debug.

One way to cope with this in the BackgroundService is to extend it with a create_task() method that automatically adds the task to the task list and also adds a done callback. In that callback we can either log the unhandled exception or raise a SystemExit exception to exit the program. We could even let users decide how to handle unhandled exceptions, either by passing a callback or by overriding the default callback as a method of the instance. The callback should also remove the task from the task list, something users need to do manually at the moment.
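To make the idea concrete, here is a minimal sketch of the proposal, not the actual BackgroundService code. The class name `TaskTracker`, the `failures` list, and the `on_unhandled_exception()` hook are all hypothetical names made up for illustration; only the `asyncio` and `logging` calls are real standard-library APIs.

```python
import asyncio
import logging
from collections.abc import Coroutine
from typing import Any

_logger = logging.getLogger(__name__)


class TaskTracker:
    """Hypothetical stand-in for a service that owns background tasks."""

    def __init__(self) -> None:
        self.tasks: set[asyncio.Task[Any]] = set()
        # Kept only so callers can inspect what failed (an assumption of
        # this sketch, not something the issue asks for).
        self.failures: list[BaseException] = []

    def create_task(self, coro: Coroutine[Any, Any, Any]) -> asyncio.Task[Any]:
        """Spawn a task, track it, and watch it for unhandled exceptions."""
        task = asyncio.get_running_loop().create_task(coro)
        self.tasks.add(task)
        task.add_done_callback(self._on_task_done)
        return task

    def _on_task_done(self, task: asyncio.Task[Any]) -> None:
        # Stop tracking the finished task, so users don't have to.
        self.tasks.discard(task)
        if task.cancelled():
            return
        # Retrieving the exception here also marks it as handled for asyncio.
        exc = task.exception()
        if exc is not None:
            self.on_unhandled_exception(task, exc)

    def on_unhandled_exception(
        self, task: asyncio.Task[Any], exc: BaseException
    ) -> None:
        """Default policy: record and log. Subclasses may override this."""
        self.failures.append(exc)
        _logger.error("Task %r failed", task.get_name(), exc_info=exc)
```

With something like this, a failing sanity check inside a spawned task would show up in the logs instead of disappearing, and a subclass could override `on_unhandled_exception()` to apply a stricter policy.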

Related issues

The solution to this issue needs to take the following related issues into account:

llucax commented 3 months ago

I will close this now that we have PersistentTaskGroup, which logs exceptions and offers users an easy way to check for failed tasks.