Considered that:
Testing the round clients after round execution makes no sense after broadcast, and little sense after training but before broadcast
It makes no sense to test all clients when most of them have not been updated; it is enough to evaluate the new model and average its metric with the historical metrics of all clients not selected in the current round
Saving checkpoints is faster than testing the model inside the simulation
Proposed changes are:
Do not perform any model evaluation in the simulation
Do not log metrics to metrics.csv; log to rounds.csv instead, saving only the indices of the round clients for each round. The format of each record line could be: [round idx], [client_id 0], ..., [client_id of the last round client]
Save a checkpoint of the aggregated server model at each round
Save a checkpoint of each round client's model at each round (a simulation-side sketch of the logging and checkpointing follows this list)
Create an analysis script (sketched after this list) that:
assumes a starting accuracy of 1/[num_classes] at round -1 for each client
constructs a round-indexed metric series for each client by reading rounds.csv; each client gets a list of length num_rounds where each entry is the metric of interest, and all entries are initially zero
reads rounds.csv sequentially, loads the corresponding checkpoints, and evaluates them (checkpoint filenames must include the round id and the client id); the resulting metrics are saved into the clients' metric series
replaces empty (zero) values in each series with the most recent previous non-zero value, i.e. a client's metric only changes in rounds where it was selected
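A minimal simulation-side sketch of the proposed rounds.csv logging and per-round checkpointing, assuming PyTorch models; the directory layout and the `round_{idx}_client_{id}.pt` / `round_{idx}_server.pt` filename scheme are illustrative choices, not part of the proposal, though the analysis sketch below assumes the same names.

```python
import csv
import os

import torch


def log_round(log_dir, round_idx, round_client_ids):
    # Append one record per round: [round idx], [client_id 0], ..., [last client_id]
    with open(os.path.join(log_dir, "rounds.csv"), "a", newline="") as f:
        csv.writer(f).writerow([round_idx, *round_client_ids])


def save_round_checkpoints(ckpt_dir, round_idx, server_model, client_models):
    # Checkpoint of the aggregated server model for this round.
    torch.save(server_model.state_dict(),
               os.path.join(ckpt_dir, f"round_{round_idx}_server.pt"))
    # One checkpoint per round client; the filename encodes round id and client id
    # so the analysis script can locate it later.
    for client_id, model in client_models.items():
        torch.save(model.state_dict(),
                   os.path.join(ckpt_dir, f"round_{round_idx}_client_{client_id}.pt"))
```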
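And a minimal sketch of the offline analysis script under the same assumptions; `build_model` and `evaluate` are hypothetical stand-ins for the project's own model constructor and per-client evaluation routine, and round indices are assumed to be 0-based.

```python
import csv
import os

import torch


def rebuild_metric_series(log_dir, ckpt_dir, num_clients, num_rounds,
                          num_classes, build_model, evaluate):
    # Round -1 baseline: chance-level accuracy for every client.
    baseline = 1.0 / num_classes
    # Round-indexed series, zero-initialized; zero means "not selected in that round".
    series = {c: [0.0] * num_rounds for c in range(num_clients)}

    # Read rounds.csv sequentially and evaluate the checkpoint of each round client.
    with open(os.path.join(log_dir, "rounds.csv"), newline="") as f:
        for row in csv.reader(f):
            round_idx, client_ids = int(row[0]), [int(c) for c in row[1:]]
            for client_id in client_ids:
                model = build_model()
                ckpt_path = os.path.join(
                    ckpt_dir, f"round_{round_idx}_client_{client_id}.pt")
                model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
                series[client_id][round_idx] = evaluate(model, client_id)

    # Forward-fill: empty (zero) entries take the most recent previous value,
    # so a client's metric only changes in rounds where it was selected.
    for values in series.values():
        prev = baseline
        for i, value in enumerate(values):
            if value == 0.0:
                values[i] = prev
            else:
                prev = value
    return series
```

The returned mapping from client id to its per-round metric list can then be plotted per client or averaged across clients per round.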