TsingZ0 / PFLlib

37 traditional FL (tFL) or personalized FL (pFL) algorithms, 3 scenarios, and 20 datasets.
GNU General Public License v2.0

PFLlib: Personalized Federated Learning Library

πŸ‘ We will change the license to Apache-2.0 in the next release.


Figure 1: An example of FedAvg. You can create a scenario using generate_DATA.py and run an algorithm using main.py, clientNAME.py, and serverNAME.py. For a new algorithm, you only need to add new features in clientNAME.py and serverNAME.py.
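The server-side aggregation step of FedAvg in Figure 1 can be sketched as follows. This is a minimal illustration, not the library's actual serveravg.py; the function name and call shape are hypothetical:

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    """FedAvg: weighted average of client model state dicts,
    weighted by each client's local sample count."""
    total = sum(client_sizes)
    global_state = {}
    for key in client_states[0]:
        global_state[key] = sum(
            state[key] * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state
```

The server would then broadcast the returned state dict back to the clients selected for the next round.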

🎯 We create a user-friendly algorithm library and evaluation platform for those new to federated learning. Join us in expanding the FL community by contributing your algorithms, datasets, and metrics to this project.

🎯 If you find our repository useful, please cite the following paper:

@article{zhang2023pfllib,
  title={PFLlib: Personalized Federated Learning Algorithm Library},
  author={Zhang, Jianqing and Liu, Yang and Hua, Yang and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Cao, Jian},
  journal={arXiv preprint arXiv:2312.04992},
  year={2023}
}

Data heterogeneity originates with users, who generate non-IID (not independent and identically distributed) and unbalanced data. A myriad of approaches have been proposed to tackle data heterogeneity in FL. Personalized FL (pFL), in contrast, takes advantage of the statistically heterogeneous data to learn a personalized model for each user.

Thanks to @Stonesjtu, this library can also record the GPU memory usage of the model. Following FedCG, we also introduce the DLG (Deep Leakage from Gradients) attack and the PSNR (Peak Signal-to-Noise Ratio) metric to evaluate the privacy-preserving ability of tFL/pFL algorithms (see ./system/flcore/servers/serveravg.py for an example). We can now train on some clients and evaluate performance on other, new clients by setting args.num_new_clients in ./system/main.py. Note that not all tFL/pFL algorithms support this feature.
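PSNR between an original image and one reconstructed by a DLG attack can be computed as below. This is a generic sketch of the metric, not the library's implementation; a higher PSNR means a more faithful reconstruction, i.e., weaker privacy protection:

```python
import numpy as np

def psnr(original, reconstructed, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two images
    with pixel values in [0, max_val]."""
    mse = np.mean((original - reconstructed) ** 2)
    if mse == 0:
        return float('inf')  # identical images: perfect reconstruction
    return 10 * np.log10(max_val ** 2 / mse)
```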

Algorithms with code (updating)

Traditional FL (tFL)

Personalized FL (pFL)

Datasets and scenarios (updating)

For the label skew scenario, we introduce 14 famous datasets: MNIST, EMNIST, Fashion-MNIST, Cifar10, Cifar100, AG News, Sogou News, Tiny-ImageNet, Country211, Flowers102, GTSRB, Shakespeare, and Stanford Cars; they can easily be split into IID and non-IID versions. Since some of the code for generating datasets (e.g., splitting) is the same for all of them, we move it into ./dataset/utils/dataset_utils.py. In the non-IID setting, two situations exist: the pathological non-IID scenario and the practical non-IID scenario. In the pathological non-IID scenario, the data on each client contains only a specific number of labels (perhaps only 2), even though the data across all clients covers all labels (e.g., 10 labels for MNIST). In the practical non-IID scenario, a Dirichlet distribution is utilized (please refer to this paper for details). We can input balance for the IID scenario, where the data are uniformly distributed.
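The practical non-IID split via a Dirichlet distribution can be sketched like this. It is a simplified version for intuition only; the library's actual splitting logic lives in ./dataset/utils/dataset_utils.py, and the function name here is hypothetical:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.1, seed=0):
    """Assign sample indices to clients with per-class proportions drawn
    from Dir(alpha); a smaller alpha yields more skewed clients."""
    rng = np.random.default_rng(seed)
    client_idxs = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idxs = np.where(labels == c)[0]
        rng.shuffle(idxs)
        # fraction of class c that each client receives
        props = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(props) * len(idxs)).astype(int)[:-1]
        for client, part in enumerate(np.split(idxs, splits)):
            client_idxs[client].extend(part.tolist())
    return client_idxs
```

With alpha around 0.1, most clients end up dominated by a few labels, which matches the skewed per-client label counts shown in the MNIST example below.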

For the feature shift scenario, we use 3 datasets that are widely used in Domain Adaptation: Amazon Review (fetch raw data from this site), Digit5 (fetch raw data from this site), and DomainNet.

For the real-world scenario, we also introduce 3 naturally separated datasets: Omniglot (20 clients, 50 labels), HAR (Human Activity Recognition) (30 clients, 6 labels), and PAMAP2 (9 clients, 12 labels). For details of datasets and FL algorithms in IoT, please refer to FL-IoT.

If you need another dataset, simply write code to download it and then reuse the utils.
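In outline, a new generate script downloads (or synthesizes) the raw data, splits it per client, and saves each client's train/test partitions. The sketch below is hypothetical: the helper name and on-disk .npz layout are illustrative, not the library's actual format (see ./dataset/utils/dataset_utils.py for the real API):

```python
import os
import numpy as np

def save_client_data(client_idxs, images, labels, out_dir, train_ratio=0.75):
    """Save each client's samples as train/test .npz files
    (illustrative layout; the library's format may differ)."""
    os.makedirs(out_dir, exist_ok=True)
    for cid, idxs in enumerate(client_idxs):
        idxs = np.asarray(idxs)
        cut = int(len(idxs) * train_ratio)  # e.g., 75% train, 25% test
        np.savez(os.path.join(out_dir, f"train_{cid}.npz"),
                 x=images[idxs[:cut]], y=labels[idxs[:cut]])
        np.savez(os.path.join(out_dir, f"test_{cid}.npz"),
                 x=images[idxs[cut:]], y=labels[idxs[cut:]])
```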

Examples for MNIST

The output of python generate_MNIST.py noniid - dir

Number of classes: 10
Client 0         Size of data: 2630      Labels:  [0 1 4 5 7 8 9]
                 Samples of labels:  [(0, 140), (1, 890), (4, 1), (5, 319), (7, 29), (8, 1067), (9, 184)]
--------------------------------------------------
Client 1         Size of data: 499       Labels:  [0 2 5 6 8 9]
                 Samples of labels:  [(0, 5), (2, 27), (5, 19), (6, 335), (8, 6), (9, 107)]
--------------------------------------------------
Client 2         Size of data: 1630      Labels:  [0 3 6 9]
                 Samples of labels:  [(0, 3), (3, 143), (6, 1461), (9, 23)]
--------------------------------------------------
Client 3         Size of data: 2541      Labels:  [0 4 7 8]
                 Samples of labels:  [(0, 155), (4, 1), (7, 2381), (8, 4)]
--------------------------------------------------
Client 4         Size of data: 1917      Labels:  [0 1 3 5 6 8 9]
                 Samples of labels:  [(0, 71), (1, 13), (3, 207), (5, 1129), (6, 6), (8, 40), (9, 451)]
--------------------------------------------------
Client 5         Size of data: 6189      Labels:  [1 3 4 8 9]
                 Samples of labels:  [(1, 38), (3, 1), (4, 39), (8, 25), (9, 6086)]
--------------------------------------------------
Client 6         Size of data: 1256      Labels:  [1 2 3 6 8 9]
                 Samples of labels:  [(1, 873), (2, 176), (3, 46), (6, 42), (8, 13), (9, 106)]
--------------------------------------------------
Client 7         Size of data: 1269      Labels:  [1 2 3 5 7 8]
                 Samples of labels:  [(1, 21), (2, 5), (3, 11), (5, 787), (7, 4), (8, 441)]
--------------------------------------------------
Client 8         Size of data: 3600      Labels:  [0 1]
                 Samples of labels:  [(0, 1), (1, 3599)]
--------------------------------------------------
Client 9         Size of data: 4006      Labels:  [0 1 2 4 6]
                 Samples of labels:  [(0, 633), (1, 1997), (2, 89), (4, 519), (6, 768)]
--------------------------------------------------
Client 10        Size of data: 3116      Labels:  [0 1 2 3 4 5]
                 Samples of labels:  [(0, 920), (1, 2), (2, 1450), (3, 513), (4, 134), (5, 97)]
--------------------------------------------------
Client 11        Size of data: 3772      Labels:  [2 3 5]
                 Samples of labels:  [(2, 159), (3, 3055), (5, 558)]
--------------------------------------------------
Client 12        Size of data: 3613      Labels:  [0 1 2 5]
                 Samples of labels:  [(0, 8), (1, 180), (2, 3277), (5, 148)]
--------------------------------------------------
Client 13        Size of data: 2134      Labels:  [1 2 4 5 7]
                 Samples of labels:  [(1, 237), (2, 343), (4, 6), (5, 453), (7, 1095)]
--------------------------------------------------
Client 14        Size of data: 5730      Labels:  [5 7]
                 Samples of labels:  [(5, 2719), (7, 3011)]
--------------------------------------------------
Client 15        Size of data: 5448      Labels:  [0 3 5 6 7 8]
                 Samples of labels:  [(0, 31), (3, 1785), (5, 16), (6, 4), (7, 756), (8, 2856)]
--------------------------------------------------
Client 16        Size of data: 3628      Labels:  [0]
                 Samples of labels:  [(0, 3628)]
--------------------------------------------------
Client 17        Size of data: 5653      Labels:  [1 2 3 4 5 7 8]
                 Samples of labels:  [(1, 26), (2, 1463), (3, 1379), (4, 335), (5, 60), (7, 17), (8, 2373)]
--------------------------------------------------
Client 18        Size of data: 5266      Labels:  [0 5 6]
                 Samples of labels:  [(0, 998), (5, 8), (6, 4260)]
--------------------------------------------------
Client 19        Size of data: 6103      Labels:  [0 1 2 3 4 9]
                 Samples of labels:  [(0, 310), (1, 1), (2, 1), (3, 1), (4, 5789), (9, 1)]
--------------------------------------------------
Total number of samples: 70000
The number of train samples: [1972, 374, 1222, 1905, 1437, 4641, 942, 951, 2700, 3004, 2337, 2829, 2709, 1600, 4297, 4086, 2721, 4239, 3949, 4577]
The number of test samples: [658, 125, 408, 636, 480, 1548, 314, 318, 900, 1002, 779, 943, 904, 534, 1433, 1362, 907, 1414, 1317, 1526]
Saving to disk.
Finish generating dataset.

Models

Environments

Install CUDA v11.6.

Install the latest version of conda and activate it.

conda env create -f env_cuda_latest.yaml # You may need to downgrade torch using pip to match the CUDA version

How to start simulating (examples for FedAvg)

Note: It is preferable to tune algorithm-specific hyper-parameters before using any algorithm on a new machine.

Practical situations

If you need to simulate FL under practical situations, including client dropout, slow trainers, slow senders, and network TTL, you can set the following parameters to realize it.
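Client dropout, for instance, amounts to randomly removing some selected clients each round so that they contribute no update. The sketch below is a toy illustration of that idea; the function name and the client_drop_rate / join_ratio parameter names are assumptions, not the library's actual options:

```python
import random

def select_active_clients(clients, join_ratio=0.5, client_drop_rate=0.2, seed=None):
    """Sample a round's participants, then drop a random fraction of them
    to mimic unreliable devices."""
    rng = random.Random(seed)
    k = max(1, int(len(clients) * join_ratio))
    selected = rng.sample(clients, k)
    survivors = [c for c in selected if rng.random() >= client_drop_rate]
    return survivors or selected[:1]  # keep at least one client per round
```

Slow trainers and slow senders can be simulated in the same spirit, by delaying or discarding the updates of a random subset of the surviving clients.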

Easy to extend

It is easy to add new algorithms and datasets to this library.

Experimental results

If you are interested in the experimental results (e.g., the accuracy) of the above algorithms, you can find some results in our accepted FL papers (i.e., FedALA, FedCP, GPFL, and DBE), listed below, which also use this library. Please note that this developing project may not reproduce the results in those papers, since some basic settings may have changed at the request of the community. For example, we previously set shuffle=False in clientbase.py.

@inproceedings{zhang2023fedala,
  title={FedALA: Adaptive Local Aggregation for Personalized Federated Learning},
  author={Zhang, Jianqing and Hua, Yang and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Guan, Haibing},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={37},
  number={9},
  pages={11237--11244},
  year={2023}
}

@inproceedings{Zhang2023fedcp,
  author = {Zhang, Jianqing and Hua, Yang and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Guan, Haibing},
  title = {FedCP: Separating Feature Information for Personalized Federated Learning via Conditional Policy},
  year = {2023},
  booktitle = {Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining}
}

@inproceedings{zhang2023gpfl,
  title={GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning},
  author={Zhang, Jianqing and Hua, Yang and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Cao, Jian and Guan, Haibing},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={5041--5051},
  year={2023}
}

@inproceedings{zhang2023eliminating,
  title={Eliminating Domain Bias for Federated Learning in Representation Space},
  author={Zhang, Jianqing and Hua, Yang and Cao, Jian and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Guan, Haibing},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=nO5i1XdUS0}
}