TsingZ0 / FedALA

AAAI 2023 accepted paper, FedALA: Adaptive Local Aggregation for Personalized Federated Learning
Apache License 2.0

There is a question about reproduction #18

Closed. ck-available closed this issue 6 months ago.

neko941 commented 7 months ago

Hi, I also have a question about the reproduction process. I followed your instructions in "Hyperparameter Settings," but the result is much lower than expected.

With this repo (https://github.com/TsingZ0/FedALA), the command below yielded a best accuracy of 0.7332886805:

python -u main.py --algorithm=FedALA --num_classes=10 --global_rounds=2000 --dataset=Cifar10-noniid_unbalanced_dir --eta=1 --rand_percent=80 --layer_idx=1

With the PFLlib repo (https://github.com/TsingZ0/PFLlib), the same command yielded a best accuracy of 0.797934022:

python -u main.py --algorithm=FedALA --num_classes=10 --global_rounds=2000 --dataset=Cifar10-noniid_unbalanced_dir --eta=1 --rand_percent=80 --layer_idx=1

The dataset "Cifar10-noniid_unbalanced_dir" is generated with the command below from the PFLlib repo. I plotted the distribution, which seems to match your paper's plot.

python generate_Cifar10.py noniid - dir

The generator reported:

{
  "num_clients": 20,
  "num_classes": 10,
  "non_iid": True,
  "balance": False,
  "partition": "dir",
  "Size of samples for labels in clients": [
    [[1,9],[3,668],[4,476],[5,287],[6,5],[7,167],[9,9]],
    [[0,7],[1,1],[2,62],[4,43],[7,177],[8,1524]],
    [[2,16],[3,158],[4,1034],[5,3]],
    [[1,402],[2,12],[3,7],[6,1],[8,1],[9,1657]],
    [[0,24],[2,5104]],
    [[0,2832],[3,1432]],
    [[0,36],[2,578],[3,708],[4,114],[8,4]],
    [[1,210],[2,2],[3,30],[4,962],[5,1],[7,2],[9,1415]],
    [[1,18],[3,137],[4,11],[7,2568],[8,8],[9,232]],
    [[0,2],[3,2511],[4,723]],
    [[0,454],[2,2],[7,53],[8,655]],
    [[0,9],[1,744],[3,7],[4,187],[5,5],[7,93],[8,226],[9,693]],
    [[0,248],[2,2],[4,360],[5,1802],[6,1],[8,3581]],
    [[1,21],[2,3],[3,237],[4,96],[5,1161],[7,9]],
    [[1,16],[4,153],[5,1],[7,2930]],
    [[1,1394],[2,157],[3,28],[4,18],[6,735],[9,1993]],
    [[0,2358],[1,9],[2,61],[3,2],[4,1822]],
    [[1,3175]],
    [[0,29],[3,74],[5,89],[6,5257]],
    [[0,1],[1,1],[2,1],[3,1],[4,1],[5,2651],[6,1],[7,1],[8,1],[9,1]]
  ],
  "alpha": 0.1,
  "batch_size": 10
}
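For reference, here is a minimal sketch of the Dirichlet(alpha) label partition that this kind of generator performs. It is illustrative only; the function name and structure are my own, not PFLlib's actual code:

import numpy as np

def dirichlet_partition(labels, num_clients=20, alpha=0.1, seed=0):
    # Assign each class's sample indices to clients with Dirichlet(alpha) proportions.
    # A small alpha (e.g. 0.1) produces the highly skewed, unbalanced splits seen above.
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices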

I used the conda environments that you provided for each repo. Is there something I am doing wrong when reproducing your results in Table 2? Thank you in advance!

TsingZ0 commented 7 months ago

You may need to change --layer_idx=1 to --layer_idx=2, since the layer index is more fine-grained in PyTorch models than in the theory. For example, the last layer contains both a weight tensor and a bias tensor, so setting --layer_idx=1 only covers the bias of the last layer.
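To illustrate with a generic PyTorch model (a toy example, not FedALA's actual architecture): listing the trainable parameter tensors shows that each layer contributes separate weight and bias entries, and, per the point above, --layer_idx counts these tensors from the end:

import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(),
    nn.Linear(16 * 30 * 30, 10),  # a 32x32 input gives 30x30 feature maps
)
for name, p in model.named_parameters():
    print(name, tuple(p.shape))
# 0.weight (16, 3, 3, 3)
# 0.bias (16,)
# 3.weight (10, 14400)
# 3.bias (10,)
# --layer_idx=1 covers only "3.bias"; --layer_idx=2 covers the whole last Linear layer.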

Please ensure that you have checked the correct layer index before adjusting --layer_idx, as indicated on line #100 of main.py.

Moreover, please check the common hyperparameter settings in run_me.sh and the default settings in main.py.

neko941 commented 7 months ago

Thank you for your fast response! Were the results in Table 2 trained with p = 2? You mention in "Hyperparameter Settings" that p is set to 1: "For FedALA, we set the weights learning rate η = 1.0 (selecting from [0.1, 1.0, 10.0]), random sample percent s = 80 (selecting from [5, 10, 20, 40, 60, 80, 100]), ALA range p = 1 (selecting from [1, 2, ...])"

TsingZ0 commented 7 months ago

The p in the paper is not exactly the layer_idx in the code, as we have already highlighted in the code: p=1 is equal to layer_idx=2, and so on. This arises from the disparity between the theoretical notation and the practical implementation (e.g., in PyTorch) of DNNs.
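That is, assuming the rest of the earlier command stays the same, the p = 1 setting from Table 2 corresponds to:

python -u main.py --algorithm=FedALA --num_classes=10 --global_rounds=2000 --dataset=Cifar10-noniid_unbalanced_dir --eta=1 --rand_percent=80 --layer_idx=2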

Again,

Please ensure that you have checked the correct layer index before adjusting --layer_idx, as indicated on line #100 of main.py.