adap / flower

Flower: A Friendly Federated AI Framework
https://flower.ai
Apache License 2.0
5.1k stars 879 forks source link

Why does flwr override the signal handlers in flwr.client.app? #3837

Open scarere opened 3 months ago

scarere commented 3 months ago

What is your question?

I noticed flwr 1.9.0 overrides the SIGINT and SIGTERM signal handlers in flwr.client.app (this was not present in 1.7.0). The custom defined signal handler just raises a StopIteration error every time. This was causing errors in my application where the client spawns some child processes for dataloading. When the client is shutdown, these child processes are terminated and the StopIteration error is raised. As I understand it, SIGTERM is the polite way to close a process, so I don't know why an error is needed. I've currently worked around this by temporarily resetting the signal handlers when the child processes are spawned but am curious to know if there is a better solution and what the reasoning behind this change was.

adam-narozniak commented 3 months ago

@charlesbvll @Robert-Steiner is this related to the stopping problems that you have been working on? Maybe you know more context for that?

charlesbvll commented 3 months ago

@scarere Thank you for your feedback! This is very useful to us. We wanted to find a way that allows clients to exit gracefully by handling signals. We didn't realise that this approach could have bad consequences for child processes. We will definitely take this into account in our implementation. I think @panh99 is already working on a better approach for handling signals, which will hopefully fix this issue!