apple / pfl-research

Simulation framework for accelerating research in Private Federated Learning
http://apple.github.io/pfl-research/
Apache License 2.0

Track per-client metrics over time #91

Open grananqvist opened 1 month ago

grananqvist commented 1 month ago

Tracking each user's metrics over the course of training the global model can be very useful for inspecting the distribution of metrics, monitoring outlier users, and debugging. A user's metrics should of course be recorded at a central iteration only if that user was actually sampled in it. We can add a postprocessor (https://apple.github.io/pfl-research/reference/postprocessor.html#pfl.postprocessor.base.Postprocessor) that dumps the metrics to disk for offline analysis (a postprocessor has access to an individual user's metrics). The offline part, i.e. analyzing and visualizing the per-client metrics over time, is outside the scope of this GH issue. The solution must be compatible with distributed simulations. This may require an all-gather in multi-node simulations, but restricting this feature to single-node multi-GPU simulations is OK. The result should be (CSV?) file(s) with per-client metrics.
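
To make the proposed output format concrete, here is a minimal sketch of the disk-dumping part only. It deliberately does not touch the pfl-research API: `PerClientMetricsCsvWriter`, `record`, the `worker_rank` argument, and the file naming are all hypothetical names chosen for illustration. It assumes one CSV per worker process (which would sidestep an all-gather on a single node) and a long format with one row per (central iteration, user, metric), so a user only appears at iterations where it was sampled.

```python
import csv
import os
import tempfile
from typing import Dict, List


class PerClientMetricsCsvWriter:
    """Hypothetical helper (not part of pfl-research): appends per-user
    metrics to a CSV file, one file per worker process."""

    FIELDS = ["central_iteration", "user_id", "metric_name", "value"]

    def __init__(self, output_dir: str, worker_rank: int = 0):
        os.makedirs(output_dir, exist_ok=True)
        # Assumption: each worker writes its own file, so no cross-process
        # communication is needed; files can be concatenated offline.
        self.path = os.path.join(
            output_dir, f"per_client_metrics_rank{worker_rank}.csv")
        with open(self.path, "w", newline="") as f:
            csv.writer(f).writerow(self.FIELDS)

    def record(self, central_iteration: int, user_id: str,
               metrics: Dict[str, float]) -> None:
        # Called only for users that were actually sampled this iteration.
        with open(self.path, "a", newline="") as f:
            writer = csv.writer(f)
            for name, value in sorted(metrics.items()):
                writer.writerow([central_iteration, user_id, name, value])


def demo() -> List[dict]:
    """Write a few rows to a temporary directory and read them back."""
    out_dir = tempfile.mkdtemp()
    writer = PerClientMetricsCsvWriter(out_dir, worker_rank=0)
    writer.record(0, "user_a", {"loss": 0.9, "accuracy": 0.5})
    writer.record(0, "user_b", {"loss": 1.2})
    writer.record(1, "user_a", {"loss": 0.7})  # user_b not sampled here
    with open(writer.path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In an actual implementation, `record` would presumably be called from the postprocessor hook that sees an individual user's metrics; the exact hook and its signature are described in the linked Postprocessor reference and are not reproduced here.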