XinyiYS / Gradient-Driven-Rewards-to-Guarantee-Fairness-in-Collaborative-Machine-Learning

Official code repository for our paper "Gradient Driven Rewards to Guarantee Fairness in Collaborative Machine Learning", accepted at NeurIPS'21.
https://proceedings.neurips.cc/paper/2021/hash/8682cc30db9c025ecd3fee433f8ab54c-Abstract.html
MIT License

Why is the performance on CIFAR-10 so low? #1

Open Bayesanctury opened 2 years ago

Bayesanctury commented 2 years ago

Hello Xinyi, many thanks for your great work. I have a question regarding the model performance on the CIFAR-10 dataset. Why is the performance so low for federated learning, even when the data samples are uniformly distributed? In centralized training, the model can easily achieve over 90% accuracy, yet in the reported Table 1, Standalone and FedAvg reach only 46% and 48%.

Looking forward to your reply. Thanks~

XinyiYS commented 2 years ago

Hi @Bayesanctury, thanks for your interest in our work. The performance discrepancy is due to a reduced-data setting. In particular, if you look at lines 89-92 of main.py in this repository, you can see that each agent on average uses 2000 data points from CIFAR-10, which contains 50000 training + 10000 test images. So for 10 agents, training uses only 20000 images, i.e., 40% of the full training set, hence the lower accuracy. I would also like to point out that the purpose of this framework is not to achieve the highest possible predictive performance but to provide a fair collaborative mechanism, which is why a reduced-data setting is acceptable; it is also less computationally costly.
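
For illustration, here is a minimal sketch of such a reduced-data split (a simplified, hypothetical version; the actual logic is in main.py lines 89-92), where `sample_size_cap = 20000` is shared uniformly among `n_agents = 10`:

```python
# Minimal sketch of the reduced-data setting described above (not the
# repository's exact code): cap CIFAR-10 at 20000 training points and
# split them evenly, ~2000 per agent.
import torch
from torchvision import datasets, transforms

n_agents = 10
sample_size_cap = 20000  # 40% of CIFAR-10's 50000 training images

train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())

indices = torch.randperm(len(train_set))[:sample_size_cap]  # uniform subset
agent_datasets = [torch.utils.data.Subset(train_set, shard.tolist())
                  for shard in torch.chunk(indices, n_agents)]
```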

Bayesanctury commented 2 years ago

Many thanks for your kind reply.

I followed your README guidelines and ran the experiments on CIFAR-10, but the results do not match what you report in Table 1. Could you please explain this? Here is the last line of valid.csv (the values appear to alternate loss and accuracy per agent):

```
0.016409667 0.4111
0.016408276 0.41139999
0.01640928  0.411199987
0.016411796 0.411300004
0.01641012  0.411199987
0.016408097 0.4111
0.016409835 0.411000013
0.016422883 0.410899997
0.016409773 0.411300004
0.016408417 0.411300004
0.01640621  0.411000013
```

The average accuracy is 41.16, which is considerably lower than the 61 reported in the table.
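
(For reference, a quick way to recompute that average; this assumes valid.csv's values alternate loss and accuracy per agent, as above:)

```python
# Hypothetical helper: average the accuracy columns of the last line of
# valid.csv, assuming values alternate (loss, accuracy) per agent.
with open("valid.csv") as f:
    last_line = f.read().strip().splitlines()[-1]

values = [float(v) for v in last_line.replace(",", " ").split()]
accuracies = values[1::2]  # every second value is an accuracy
print(f"average accuracy: {100 * sum(accuracies) / len(accuracies):.2f}")
```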

Below are the detailed settings:

```
dataset               : cifar10
batch_size            : 128
train_val_split_ratio : 0.8
alpha                 : 0.95
Gamma                 : 0.15
lambda                : 0.5
model_fn              : <class 'utils.defined_models.CNN_Cifar10'>
optimizer_fn          : <class 'torch.optim.sgd.SGD'>
loss_fn               : NLLLoss()
lr                    : 0.015
gamma                 : 0.977
lr_decay              : 0.977
iterations            : 200
E                     : 3
num_classes           : 10
n_agents              : 10
cuda                  : True
split                 : uniform
sample_size_cap       : 20000
beta                  : 0.5
```

Running: `python main.py -D cifar10 -N 10 -split uni`
OS: Ubuntu 18.04.6 LTS, NVIDIA driver version: 510.60, CUDA version: 11.6

Bayesanctury commented 2 years ago

Greetings! Thanks again for your great work. Could you please help identify the reasons for the low performance? I used your code without any modifications. I also tried increasing the sample count to 50000, but the performance (55) is still lower than your reported 61.

XinyiYS commented 2 years ago

Hi there,

[To increase the average accuracy]: you can try increasing the degree of altruism, $\beta$. In the paper, the highest average accuracy is generally achieved with higher values of $\beta$. For instance, if you take a look at Table 1 in our paper, the accuracies around 60 for CIFAR-10 are achieved with $\beta \in [1, 2]$, whereas your experiments use $\beta = 0.5$.

[As an explanation]: intuitively, as the name "degree of altruism" suggests, $\beta$ controls how altruistic the agents are. Specifically, the more altruistic they are, the more willing they are to let agents with lower contributions receive better gradient rewards (thereby reaching higher final accuracies, which in turn raises the average accuracy). For technical details, see the discussion of the Gradient Download step on page 5 of our paper; for empirical results, see Table 1 and Section 4.2.
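
As a rough numerical illustration (this is not the repository's code; the tanh-shaped quota below is a simplified stand-in for the actual download rule discussed on page 5 of the paper), note how such a quota saturates as $\beta$ grows, so even a low-contribution agent downloads nearly the full gradient:

```python
# Illustrative sketch only: a tanh-shaped download quota in the spirit of
# the paper's Gradient Download step. The function name, arguments, and
# normalization are assumptions for illustration, not the implementation.
import math

def download_fraction(r_i, r_max, beta):
    """Fraction of gradient coordinates an agent with normalized
    contribution r_i downloads, relative to the top contributor r_max."""
    return math.tanh(beta * r_i) / math.tanh(beta * r_max)

for beta in [0.5, 1, 2, 1e7]:
    frac = download_fraction(r_i=0.5, r_max=1.0, beta=beta)
    print(f"beta={beta:g}: agent at half the top contribution gets {frac:.0%}")
```

With $\beta = 0.5$, this agent would receive about 53% of the gradient coordinates, while as $\beta \to \infty$ (the 1e7 case) every agent receives essentially 100%, i.e., maximal altruism.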

Bayesanctury commented 2 years ago

Many thanks for your kind explanations!

Sorry for not reporting all the results earlier; there is actually a $\beta$ loop ([0.5, 1, 1.2, 1.5, 2, 1e7]) in your code, and I checked the results for each $\beta$ value. Since the data split is uniform, every client performs similarly, so I listed only the average results.

The experiment settings and environment are exactly the same as in my last reply; only the $\beta$ value changed. It seems $\beta$ does not change the final accuracy much. Have you verified that the results are reproducible with the current version of the code?

JLA0809 commented 1 year ago

First of all, many thanks for the very thorough implementation and great research contribution!

I want to use your idea in my research project, and I could not reproduce your results with the current settings either. I get large differences, especially in fairness under, e.g., the power-law configuration. Are there any updates regarding this issue?