LabeliaLabs / distributed-learning-contributivity

Simulate collaborative ML scenarios, experiment multi-partner learning approaches and measure respective contributions of different datasets to model performance.
https://www.labelia.org
Apache License 2.0

Need a parameter to choose if the val/test sets would be local or global #287

Closed arthurPignet closed 3 years ago

arthurPignet commented 3 years ago

For CIFAR10 and MNIST, we define both local and global val/test datasets. As a result, we only train on 60% of the dataset, and the local val/test sets are never used.

We need a parameter to choose whether these test/val sets are local or global.

This parameter should also be used in the log_perf functions, to select the type of dataset on which the scores will be computed.

For instance, a dataset would provide train, test, and val sets. Then, when splitting the train set between partners, the val and test sets would also be split between partners if test_set/val_set is 'local'.
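To make the splitting idea concrete, here is a minimal sketch. The function name `split_eval_set`, its arguments, and the uniform random split are assumptions made for illustration, not the existing API of the library.

```python
import numpy as np


def split_eval_set(x, y, n_partners, mode, seed=0):
    """Hypothetical helper: return one (x, y) pair per partner.

    mode == 'global': every partner sees the full, shared set.
    mode == 'local':  the set is shuffled and cut into disjoint shards,
                      one per partner (uniform split, an assumption).
    """
    if mode not in ("local", "global"):
        raise ValueError(f"mode must be 'local' or 'global', got {mode!r}")
    if mode == "global":
        return [(x, y) for _ in range(n_partners)]
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(x))
    # array_split tolerates sizes that are not a multiple of n_partners
    shards = np.array_split(shuffled, n_partners)
    return [(x[s], y[s]) for s in shards]


x_val = np.arange(20).reshape(10, 2)
y_val = np.arange(10)
local_parts = split_eval_set(x_val, y_val, n_partners=3, mode="local")
global_parts = split_eval_set(x_val, y_val, n_partners=3, mode="global")
```

With 'local', each sample ends up with exactly one partner. With 'global', all partners share the same set.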

Maybe with two parameters? Like:

validation_set: str, one of 'local', 'global', 'both' (not a fan of 'both', any suggestion?)
test_set: str, one of 'local', 'global', 'both' (not a fan of 'both', any suggestion?)
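As a rough sketch of how the two parameters could be checked, something like the following might work. `ALLOWED_SCOPES` and `resolve_scopes` are hypothetical names, and expanding 'both' into the pair ('local', 'global') is only one possible interpretation.

```python
# Hypothetical constant: the three values proposed in this issue.
ALLOWED_SCOPES = ("local", "global", "both")


def resolve_scopes(value, name):
    """Hypothetical helper: validate a validation_set/test_set value.

    Returns the tuple of scopes on which scores should be computed.
    """
    if value not in ALLOWED_SCOPES:
        raise ValueError(f"{name} must be one of {ALLOWED_SCOPES}, got {value!r}")
    # 'both' is read here as "compute local and global scores"
    return ("local", "global") if value == "both" else (value,)


val_scopes = resolve_scopes("both", "validation_set")
test_scopes = resolve_scopes("global", "test_set")
```

The log_perf functions could then loop over the returned tuple instead of branching on three strings.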

If validation_set is 'local', the val_acc and val_loss of each partner will be computed locally, and the global model's val performance would be the average of the local performances. For early stopping, we could use this averaged score. However, it might raise safety issues.
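For illustration, early stopping on the averaged local val loss could look like the sketch below. The patience rule and the name `should_stop` are assumptions made for this example, not the stopping criterion currently implemented.

```python
def should_stop(local_val_losses_per_epoch, patience):
    """Hypothetical patience-based early stopping.

    local_val_losses_per_epoch: one list per epoch, holding the val loss
    computed locally by each partner. Stops when the averaged loss has
    not improved for `patience` epochs.
    """
    averaged = [sum(losses) / len(losses) for losses in local_val_losses_per_epoch]
    best_epoch = averaged.index(min(averaged))
    epochs_since_best = len(averaged) - 1 - best_epoch
    return epochs_since_best >= patience


history = [
    [1.0, 1.2],   # epoch 0, two partners
    [0.8, 0.9],   # epoch 1, best averaged loss
    [0.9, 1.0],   # epoch 2
    [0.95, 1.0],  # epoch 3
]
stop_after_3_epochs = should_stop(history[:3], patience=2)
stop_after_4_epochs = should_stop(history, patience=2)
```

Only the averaged loss drives the decision here, so a single partner's local score never triggers the stop on its own.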

If test_set is 'local', the final test score is computed as the average of the partners' local test scores.
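A minimal sketch of that aggregation follows. `average_local_scores` is a hypothetical name, and the optional weighting by local test set size is an extra assumption: the proposal above only mentions a plain average.

```python
import numpy as np


def average_local_scores(local_scores, sizes=None):
    """Hypothetical helper: aggregate the partners' local test scores.

    sizes=None gives the plain (unweighted) mean proposed in this issue.
    Passing the local test set sizes gives a size-weighted mean instead
    (an assumption, not part of the proposal).
    """
    return float(np.average(local_scores, weights=sizes))


plain_mean = average_local_scores([0.8, 0.9, 1.0])
weighted_mean = average_local_scores([0.8, 0.9, 1.0], sizes=[100, 100, 200])
```

The two values differ as soon as partners hold test sets of different sizes, so the choice should probably be made explicit.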

With 'global', we should keep the current implementation, in my opinion.

I suggest that the local val and test sets be corrupted, if corruption is specified of course.

What do you think, @bowni, @RomainGoussault?