frequenz-floss / frequenz-dispatch-python

High-level interface to dispatch API
https://frequenz-floss.github.io/frequenz-dispatch-python/
MIT License
0 stars 3 forks source link

Clarify how clean works in the actor `_run()` method #17

Open llucax opened 6 months ago

llucax commented 6 months ago

It took me a while to think about the different scenarios to conclude calling cancel only when CancelledError is raised. Just to confirm my thinking is correct, you do this because if any other unexpected exception is raised and _run() is aborted, the auto-restart for actors will kick-in and meanwhile, any task that it is still healthy, will continue working, so hopefully ready dispatches will still be delivered on time, even if the actor wasn't restarted yet?

If this is the case, maybe you should check tasks health when _fetch()ing too, in case some task was aborted, to try to restart it. Right now you are only checking if a task is in the _scheduled dict, but not if the tasks are still running or not.

In any case, it is probably worth adding a comment here clarifying how clean up works exactly on different scenarios.

_Originally posted by @llucax in https://github.com/frequenz-floss/frequenz-dispatch-python/pull/9#discussion_r1539724552_

Marenz commented 6 months ago

I think I was too set on the whole "be have correct on normal termination" mindset that I forgot abnormal termination. So I suppose the proper way would be to cancel all for any unresolvable exception.. which is what I would expect to happen as a user, too

llucax commented 6 months ago

Please bear in mind that the actor will be restarted, so as a user you shouldn't really notice it, maybe only in the logs. As a user you just need to make sure the actor is properly restarted on unexpected errors.