Rubisoft-Partnership / lightweight-neural-nets

Neural network models for embedded devices

Metrics logging improvement #49

Closed FedericoRubbi closed 3 months ago

FedericoRubbi commented 3 months ago

Considering that:

  1. Testing the round clients right after round execution makes no sense after broadcast, and it makes little sense after training and before broadcast.
  2. It makes no sense to test all clients when most of them have not been updated. It is enough to evaluate the new model and average it with the metric history of the clients not in the current round (see the sketch after this list).
  3. Saving a checkpoint is faster than testing the model inside the simulation.
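
A minimal sketch of the averaging idea in point 2, in Python. The function name and the per-client bookkeeping are illustrative assumptions, not existing project code, and this is only one possible reading of the point:

```python
def average_metric(new_model_metric, last_metrics, round_clients):
    # Round clients share the metric of the newly evaluated model; every
    # other client keeps its last recorded metric; the average is taken
    # over all clients.
    values = [new_model_metric if c in round_clients else m
              for c, m in enumerate(last_metrics)]
    return sum(values) / len(values)
```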

The proposed changes are:

  1. Do not perform any model evaluation in the simulation.
  2. Do not log metrics to metrics.csv; log to rounds.csv instead, saving only the indexes of the round clients for each round. The format of each record line could be: [round idx], [client_id 0], ..., [client_id num_clients] (a minimal logging sketch follows this list).
  3. Save a checkpoint of the aggregated server model at each round.
  4. Save a checkpoint of each round client's model at each round.
  5. Create an analysis script (sketched after the logging example below) that:
    • assumes a starting accuracy of 1/[num_classes] at round -1 for each client
    • constructs a round-indexed metric series for each client by reading rounds.csv: each client gets a list of length num_rounds where each entry is the metric of interest, initially zero-initialized
    • reads rounds.csv sequentially, loads the corresponding checkpoints and evaluates them (checkpoint filenames must include the round id and the client id), and saves the resulting metrics to the clients' metric series
    • sets empty (zero) values in a series to the first previous non-zero value, i.e. a client's metric only changes on the rounds in which it is selected
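
The rounds.csv record format and the checkpoint naming could look as follows; this is a minimal sketch in Python, where log_round, checkpoint_name, and the .bin extension are hypothetical, not existing project code:

```python
import csv

def log_round(rounds_csv, round_idx, client_ids):
    # Append one record line: [round idx], [client_id 0], ..., [client_id num_clients]
    with open(rounds_csv, "a", newline="") as f:
        csv.writer(f).writerow([round_idx, *client_ids])

def checkpoint_name(round_idx, client_id):
    # The filename must include the round id and the client id so the
    # analysis script can match checkpoints to rounds.csv records later.
    return f"checkpoint_round{round_idx}_client{client_id}.bin"
```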
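
And a minimal sketch of the analysis script, assuming a hypothetical evaluate(round_idx, client_id) callback that loads the corresponding checkpoint and returns the metric of interest:

```python
import csv

def build_metric_series(rounds_csv, num_clients, num_rounds, num_classes, evaluate):
    # series[c][r] holds the metric of client c after round r; all
    # entries start zero-initialized.
    series = [[0.0] * num_rounds for _ in range(num_clients)]

    # Read rounds.csv sequentially and evaluate only the checkpoints of
    # the clients selected in each round.
    with open(rounds_csv, newline="") as f:
        for row in csv.reader(f):
            round_idx, client_ids = int(row[0]), [int(c) for c in row[1:]]
            for cid in client_ids:
                series[cid][round_idx] = evaluate(round_idx, cid)

    # Set empty (zero) entries to the first previous non-zero value,
    # starting from the assumed accuracy of 1/num_classes at round -1:
    # a client's metric only changes on the rounds in which it is selected.
    for client_series in series:
        prev = 1.0 / num_classes
        for r in range(num_rounds):
            if client_series[r] == 0.0:
                client_series[r] = prev
            else:
                prev = client_series[r]
    return series
```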