SMILELab-FL / FedLab

A flexible Federated Learning Framework based on PyTorch, simplifying your Federated Learning research.
https://fedlab.readthedocs.io
Apache License 2.0

FedLab: A Flexible Federated Learning Framework


Federated learning (FL), first proposed by Google, is a burgeoning research area of machine learning that aims to protect individual data privacy in distributed machine learning processes, especially in finance, smart healthcare, and edge computing. Unlike traditional data-centered distributed machine learning, participants in the FL setting use localized data to train local models, then apply specific strategies with other participants to acquire the final model collaboratively, avoiding direct data sharing.

To relieve researchers of the burden of implementing FL algorithms and to free FL scientists from repeatedly implementing basic FL settings, we introduce FedLab, a highly customizable framework. FedLab provides the necessary modules for FL simulation, including communication, compression, model optimization, data partition, and other functional modules. Users can build an FL simulation environment from custom modules, like playing with LEGO bricks. For better understanding and ease of use, FL baseline algorithms implemented with FedLab are also provided.

Quick start

Install

Learning materials

We provide tutorials in Jupyter notebook format for FedLab beginners in FedLab\tutorials. These tutorials include data partition, customized algorithms, and pipeline demos. For FedLab or FL beginners, we recommend this notebook. Furthermore, we provide reproductions of federated algorithms via FedLab, which are stored in fedlab.contrib.algorithm. We think they are good examples for users who want to explore FedLab further.

Website documentation is available at https://fedlab.readthedocs.io.

Run Examples

# example of standalone
$ cd ./examples/standalone/
$ python standalone.py --total_clients 100 --com_round 3 --sample_ratio 0.1 --batch_size 100 --epochs 5 --lr 0.02

Architecture

File structure of FedLab. This overview may help users understand our repo.

├── fedlab
│   ├── contrib
│   ├── core
│   ├── models
│   └── utils
├── datasets
│   └── ...
├── examples
│   ├── asynchronous-cross-process-mnist
│   ├── cross-process-mnist
│   ├── hierarchical-hybrid-mnist
│   ├── network-connection-checker
│   ├── scale-mnist
│   └── standalone-mnist
└── tutorials
    ├── communication_tutorial.ipynb
    ├── customize_tutorial.ipynb
    ├── pipeline_tutorial.ipynb
    └── ...

Baselines

We provide reproductions of baseline federated algorithms for users in this repo.

| Method | Type | Paper | Publication | Official code |
| --- | --- | --- | --- | --- |
| FedAvg | Optim. | Communication-Efficient Learning of Deep Networks from Decentralized Data | AISTATS'2017 | |
| FedProx | Optim. | Federated Optimization in Heterogeneous Networks | MLSys'2020 | Code |
| FedDyn | Optim. | Federated Learning Based on Dynamic Regularization | ICLR'2021 | Code |
| q-FFL | Optim. | Fair Resource Allocation in Federated Learning | ICLR'2020 | Code |
| FedNova | Optim. | Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization | NeurIPS'2020 | Code |
| IFCA | Optim. | An Efficient Framework for Clustered Federated Learning | NeurIPS'2020 | Code |
| Ditto | Optim. | Ditto: Fair and Robust Federated Learning Through Personalization | ICML'2021 | Code |
| SCAFFOLD | Optim. | SCAFFOLD: Stochastic Controlled Averaging for Federated Learning | ICML'2020 | |
| Personalized-FedAvg | Optim. | Improving Federated Learning Personalization via Model Agnostic Meta Learning | Pre-print | |
| CFL | Optim. | Clustered Federated Learning: Model-Agnostic Distributed Multi-Task Optimization under Privacy Constraints | IEEE'2020 | Code |
| Power-of-choice | Misc. | Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies | AISTATS'2021 | |
| QSGD | Com. | QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding | NeurIPS'2017 | |
| NIID-Bench | Data. | Federated Learning on Non-IID Data Silos: An Experimental Study | ICDE'2022 | Code |
| LEAF | Data. | LEAF: A Benchmark for Federated Settings | Pre-print | Code |
...

Datasets & Data Partition

In the real world, FL must handle a variety of data distribution scenarios, including IID and non-IID ones. Although datasets and partition schemes already exist for published data benchmarks, it can still be messy and hard for researchers to partition datasets according to their specific research problems and to maintain partition results during simulation. FedLab provides fedlab.utils.dataset.partition.DataPartitioner, which lets you use pre-partitioned datasets as well as your own data. DataPartitioner stores the sample indices for each client under a given data partition scheme. FedLab also provides some extra datasets that are used in current FL research but are not yet provided by the official PyTorch torchvision.datasets.

Data Partition

We provide multiple data partition schemes used in recent FL papers[1][2][3]. Here we show data partition visualizations for several commonly used datasets as examples.

1. Balanced IID partition

Each client has the same number of samples and the same distribution over all sample classes.
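A balanced IID split can be sketched with the standard library alone. This is only an illustration of the idea, not FedLab's DataPartitioner implementation; the function name `balanced_iid_partition` and its parameters are hypothetical.

```python
import random


def balanced_iid_partition(num_samples, num_clients, seed=0):
    """Hypothetical helper: shuffle all sample indices, then deal equal chunks."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)  # a uniform shuffle keeps every client's class mix IID
    per_client = num_samples // num_clients  # leftover samples (if any) are dropped
    return {
        cid: indices[cid * per_client:(cid + 1) * per_client]
        for cid in range(num_clients)
    }


# 50K training samples (the CIFAR10 training set size) over 100 clients
client_dict = balanced_iid_partition(num_samples=50_000, num_clients=100)
print(len(client_dict[0]))  # 500
```

The result maps each client id to a list of sample indices, the same kind of structure the README describes DataPartitioner as storing.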

Given 100 clients and CIFAR10, the data samples assigned to the first 10 clients could be:

2. Unbalanced IID partition

Assign a different number of samples to each client using the Log-Normal distribution $\text{Log-N}(0, \sigma^2)$, while keeping the same distribution over sample classes.
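One way to realize this is to draw one log-normal weight per client and scale the weights so the client sizes sum to the dataset size. The normalization step and the helper names below are assumptions made for illustration; FedLab's own routine may scale the draws differently.

```python
import random


def lognormal_client_sizes(num_samples, num_clients, sigma, rng):
    """Hypothetical helper: client sizes proportional to Log-N(0, sigma^2) draws."""
    weights = [rng.lognormvariate(0.0, sigma) for _ in range(num_clients)]
    total = sum(weights)
    sizes = [int(num_samples * w / total) for w in weights]
    # Flooring loses fewer than num_clients samples; hand them back one by one.
    for i in range(num_samples - sum(sizes)):
        sizes[i % num_clients] += 1
    return sizes


def unbalanced_iid_partition(num_samples, num_clients, sigma, seed=0):
    rng = random.Random(seed)
    sizes = lognormal_client_sizes(num_samples, num_clients, sigma, rng)
    indices = list(range(num_samples))
    rng.shuffle(indices)  # shuffled indices keep the class mix IID per client
    client_dict, start = {}, 0
    for cid, size in enumerate(sizes):
        client_dict[cid] = indices[start:start + size]
        start += size
    return client_dict


client_dict = unbalanced_iid_partition(num_samples=50_000, num_clients=100, sigma=0.3)
sizes = [len(v) for v in client_dict.values()]
print(min(sizes), max(sizes), sum(sizes))
```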

Given $\sigma=0.3$, 100 clients and CIFAR10, the data samples assigned to the first 10 clients are shown below on the left, and the distribution of sample numbers across clients is shown on the right.

   

3. Hetero Dirichlet partition

Non-iid partition used in [3] and [6]. Number of data points and class proportions are unbalanced. Samples will be partitioned into $J$ clients by sampling $p_k \sim \text{Dir}_J(\alpha)$ and allocating a $p_{k,j}$ proportion of the samples of class $k$ to local client $j$.
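The sampling rule above can be sketched as follows. This is a simplified illustration under stated assumptions: the Dirichlet draw is built from normalized Gamma variates, the function names are hypothetical, and the sketch makes no attempt to guarantee a minimum number of samples per client.

```python
import random
from collections import defaultdict


def sample_dirichlet(rng, alpha, dim):
    """Symmetric Dirichlet draw via normalized Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def hetero_dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: for each class k, split its samples by p_k ~ Dir_J(alpha)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    client_dict = {cid: [] for cid in range(num_clients)}
    for k in sorted(by_class):
        idxs = by_class[k]
        rng.shuffle(idxs)
        p_k = sample_dirichlet(rng, alpha, num_clients)
        start, cumulative = 0, 0.0
        for j in range(num_clients):
            cumulative += p_k[j]
            # Client j receives roughly a p_{k,j} share of class k.
            if j == num_clients - 1:
                end = len(idxs)
            else:
                end = min(len(idxs), round(cumulative * len(idxs)))
            client_dict[j].extend(idxs[start:end])
            start = end
    return client_dict


labels = [i % 10 for i in range(5_000)]  # synthetic labels for 10 classes
client_dict = hetero_dirichlet_partition(labels, num_clients=100, alpha=0.3)
print(sorted(len(v) for v in client_dict.values())[-5:])  # the largest clients
```

Smaller values of $\alpha$ concentrate each class on fewer clients, which is why both sample counts and class proportions end up unbalanced.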

Given 100 clients, $\alpha=0.3$ and CIFAR10, the data samples assigned to the first 10 clients are shown below on the left, and the distribution of sample numbers across clients is shown on the right.

   

4. Shards partition

Non-iid partition based on shards, used in [4].
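A shards partition in the spirit of [4] can be sketched like this: sort the samples by label, cut the sorted list into equal shards, and give each client a few shards. The helper name and the synthetic labels are assumptions for illustration only.

```python
import random


def shards_partition(labels, num_clients, num_shards, seed=0):
    """Hypothetical helper: label-sorted shards dealt out to clients."""
    rng = random.Random(seed)
    # Sorting by label makes each shard (nearly) single-class.
    sorted_idx = sorted(range(len(labels)), key=lambda i: labels[i])
    shard_size = len(labels) // num_shards
    shards = [
        sorted_idx[s * shard_size:(s + 1) * shard_size]
        for s in range(num_shards)
    ]
    rng.shuffle(shards)
    shards_per_client = num_shards // num_clients
    client_dict = {}
    for cid in range(num_clients):
        own = shards[cid * shards_per_client:(cid + 1) * shards_per_client]
        client_dict[cid] = [i for shard in own for i in shard]
    return client_dict


# Synthetic stand-in: 10 classes with 1,000 samples each
labels = [c for c in range(10) for _ in range(1_000)]
client_dict = shards_partition(labels, num_clients=100, num_shards=200)
print(sorted({labels[i] for i in client_dict[0]}))  # classes held by client 0
```

With 200 shards and 100 clients, each client receives 2 shards and therefore sees at most 2 classes in this synthetic setup.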

Given shard_number=200, 100 clients and CIFAR10, the data samples assigned to the first 10 clients could be:

5. Balanced Dirichlet partition

Non-iid partition used in [5]. Each client has the same number of samples, while the class distribution in each client follows the Dirichlet distribution $\text{Dir}(\alpha)$.

Given $\alpha=0.3$, 100 clients and CIFAR10, the data samples assigned to the first 10 clients could be:

6. Unbalanced Dirichlet partition

Non-iid partition used in [5]. The sample numbers of clients are drawn from the Log-Normal distribution $\text{Log-N}(0, \sigma^2)$, while the class distribution in each client follows the Dirichlet distribution $\text{Dir}(\alpha)$.

Given $\sigma=0.3$, $\alpha=0.3$, 100 clients and CIFAR10, the data samples assigned to the first 10 clients are shown below on the left, and the distribution of sample numbers across clients is shown on the right.

   

7. Quantity-based Label Distribution Skew partition

Non-iid partition used in [1]. Each client holds samples from only a fixed number of classes.
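A minimal sketch of this scheme, under assumptions: each client is assigned a fixed set of classes, and the samples of each class are split evenly among the clients that own it. Choosing `cid % num_classes` as a client's first class (so every class has at least one owner) is a choice made here for illustration, and the function name is hypothetical.

```python
import random


def label_skew_quantity_partition(labels, num_clients, num_classes,
                                  classes_per_client, seed=0):
    """Hypothetical helper: every client holds exactly classes_per_client classes."""
    rng = random.Random(seed)
    client_classes = []
    for cid in range(num_clients):
        first = cid % num_classes  # guarantees each class is owned by someone
        rest = [c for c in range(num_classes) if c != first]
        client_classes.append([first] + rng.sample(rest, classes_per_client - 1))

    client_dict = {cid: [] for cid in range(num_clients)}
    for k in range(num_classes):
        owners = [cid for cid in range(num_clients) if k in client_classes[cid]]
        if not owners:
            continue
        idxs = [i for i, label in enumerate(labels) if label == k]
        rng.shuffle(idxs)
        # Deal class-k samples round-robin to the clients that own class k.
        for pos, cid in enumerate(owners):
            client_dict[cid].extend(idxs[pos::len(owners)])
    return client_dict


labels = [i % 10 for i in range(6_000)]  # synthetic labels for 10 classes
client_dict = label_skew_quantity_partition(
    labels, num_clients=10, num_classes=10, classes_per_client=3)
print(sorted({labels[i] for i in client_dict[0]}))  # the 3 classes of client 0
```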

Given 3 classes per client, 10 clients and FashionMNIST, the data samples assigned to each client could be:

8. Noise-based Feature Distribution Skew partition

Non-iid partition used in [1]. The sample features of different clients carry different levels of Gaussian noise. Data example for 10 clients could be:
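The noise scheme itself can be sketched in a few lines. The linear schedule below (noise standard deviation growing with the client index) is an assumption chosen for illustration, and `add_client_noise` is a hypothetical name; the exact schedule used in [1] or in FedLab may differ.

```python
import random
import statistics


def add_client_noise(features, client_id, num_clients, max_sigma, seed=0):
    """Hypothetical helper: add zero-mean Gaussian noise whose level depends on the client."""
    rng = random.Random(seed + client_id)
    # Assumed schedule: client 0 gets the least noise, the last client gets max_sigma.
    sigma = max_sigma * (client_id + 1) / num_clients
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in features]


clean = [[0.0] * 2_000]  # one all-zero feature row, so the output is pure noise
noisy_first = add_client_noise(clean, client_id=0, num_clients=10, max_sigma=1.0)
noisy_last = add_client_noise(clean, client_id=9, num_clients=10, max_sigma=1.0)
print(statistics.pstdev(noisy_first[0]), statistics.pstdev(noisy_last[0]))
```

Labels stay untouched in this scheme; only the feature distribution differs from client to client.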

9. FCUBE Synthetic partition

Non-iid partition used in [1]. Data example for 4 clients could be shown as:

Datasets supported

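| Data Type | Data Name | #Training Samples | #Test Samples | #Label Classes |
| --- | --- | --- | --- | --- |
| Vision data | CIFAR10 | 50K | 10K | 10 |
| | CIFAR100 | 50K | 10K | 100 |
| | FashionMNIST | 60K | 10K | 10 |
| | MNIST | 60K | 10K | 10 |
| | SVHN | 73K | 26K | 10 |
| | CelebA | 200,288 | | 2 |
| | FEMNIST | 805,263 | | 62 |
| Text data | Shakespeare | 4,226,158 | | - |
| | Sent14 | 1,600,498 | | 3 |
| | Reddit | 56,587,343 | | - |
| Tabular data | Adult | 32,561 | 16,281 | 2 |
| | Covtype | 581,012 | | 2 |
| | RCV1 binary | 20,242 | 677,399 | 2 |
| Synthetic data | FCUBE | - | - | 2 |
| | LEAF-Synthetic | - | - | - |

Rows with a single sample count list one figure for the dataset rather than separate training and test counts.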

Partition Visualization

To visualize the data distribution of a partition, we provide fedlab.utils.dataset.functional.feddata_scatterplot() for users' convenience.

Visualization code for a synthetic partition:

import numpy as np
from matplotlib import pyplot as plt
from fedlab.utils.dataset.functional import feddata_scatterplot

sample_num = 15
class_num = 4
clients_num = 3
num_per_client = int(sample_num/clients_num)
labels = np.random.randint(class_num, size=sample_num)  # generate 15 labels, each label is 0 to 3
rand_per = np.random.permutation(sample_num)
# partition synthetic data into 3 clients
data_indices = {0: rand_per[0:num_per_client],
                1: rand_per[num_per_client:num_per_client*2],
                2: rand_per[num_per_client*2:num_per_client*3]}
title = 'Data Distribution over Clients for Each Class'
fig = feddata_scatterplot(labels.tolist(),
                          data_indices,
                          clients_num,
                          class_num,
                          figsize=(6, 4),
                          max_size=200,
                          title=title)
# save before showing; the imgs/ directory must already exist
fig.savefig('imgs/feddata-scatterplot-vis.png')
plt.show()

Visualization result for CIFAR-10 Dirichlet Non-IID with $\alpha=0.6$ on 5 clients:

Performance & Insights

We provide the performance report of several reproduced federated learning algorithms to illustrate the correctness of FedLab in simulation. Furthermore, we describe several insights FedLab could provide for federated learning research. Without loss of generality, this section's experiments are conducted on partitioned MNIST datasets. The conclusions and observations in this section should still be valid in other data sets and scenarios.

Federated Optimization on Non-IID Data

We choose $\alpha = [0.1, 0.3, 0.5, 0.7]$ in label Dirichlet partitioned MNIST with 100 clients. We run 200 rounds of FedAvg with 5 local epochs of full-batch SGD, learning rate 0.1, and sample ratio 0.1 (10 clients for each FL round). The test accuracy over communication rounds is shown below. The results reveal the most vital challenge in federated learning: training on non-IID client data.
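For readers new to FedAvg, the two server-side steps implied by this setup (sampling a fraction of clients, then averaging their models weighted by local sample count) can be sketched as below. This is a generic illustration with flat parameter lists, not FedLab's handler code; both function names are hypothetical.

```python
import random


def sample_clients(num_clients, sample_ratio, rng):
    """Pick round(num_clients * sample_ratio) distinct client ids, at least one."""
    k = max(1, round(num_clients * sample_ratio))
    return sorted(rng.sample(range(num_clients), k))


def fedavg_aggregate(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[d] * n for params, n in zip(client_params, client_sizes)) / total
        for d in range(dim)
    ]


selected = sample_clients(num_clients=100, sample_ratio=0.1, rng=random.Random(0))
print(len(selected))  # 10 clients per round

# Two toy clients: the one with 3 samples pulls the average towards its parameters.
global_params = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
print(global_params)  # [2.5, 3.5]
```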

We use the same partitioned MNIST dataset as in FedAvg[4] to evaluate the correctness of FedLab. The table lists the number of rounds FedAvg needs to reach 97% test accuracy on MNIST using 2NN with E=5, as reported in [4] / as obtained with FedLab:

| Sample ratio | IID, B=FULL | IID, B=10 | Non-IID, B=FULL | Non-IID, B=10 |
| --- | --- | --- | --- | --- |
| 0.0 | 1455 / 1293 | 316 / 77 | 4278 / 1815 | 3275 / 1056 |
| 0.1 | 1474 / 1230 | 87 / 43 | 1796 / 2778 | 664 / 439 |
| 0.2 | 1658 / 1234 | 77 / 37 | 1528 / 2805 | 619 / 427 |
| 0.5 | -- / 1229 | 75 / 36 | -- / 3034 | 443 / 474 |
| 1.0 | -- / 1284 | 70 / 35 | -- / 3154 | 380 / 507 |

The results are obtained by running the tutorial with random seed 0.

Simulation Efficiency

Time cost of 100 rounds (50 clients sampled per round) under different acceleration settings. 1M-10P means the simulation runs on 1 machine with 4 GPUs and 10 processes. 2M-10P means the simulation runs on 2 machines with 4 GPUs and 10 processes (5 processes on each machine).

Hardware platform: Intel(R) Xeon(R) Gold 6240L CPU @ 2.60GHz + Tesla V100 * 4.

| Standalone | Cross-process 1M-10P | Cross-process 2M-10P |
| --- | --- | --- |
| 45.6 Min | 2.9 Min | 4.23 Min |

The results are obtained by running the tutorial and an example of the cross-process scenario. They show the simulation efficiency of FedLab under different simulation modes. Cross-process with 2 machines can be slower in this setting because of the communication bottleneck.

Communication Efficiency

We provide a few performance baselines in communication-efficient federated learning including QSGD and top-k. In the experiment setting, we choose $\alpha = 0.5$ in the label Dirichlet partitioned MNIST with 100 clients. We run 200 rounds with a sample ratio of 0.1 (10 clients for each FL round) of FedAvg, where each client performs 5 local epochs of SGD with a full batch and learning rate of 0.1. We report the top-1 test accuracy and its communication volume during the training.

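| Setting | Baseline | QSGD-4bit | QSGD-8bit | QSGD-16bit | Top-5% | Top-10% | Top-20% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Test Accuracy (%) | 93.14 | 93.03 | 93.27 | 93.11 | 11.35 | 61.25 | 89.96 |
| Communication (MB) | 302.45 | 45.59 | 85.06 | 160.67 | 0.94 | 1.89 | 3.79 |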

The above results are obtained by running the tutorial.
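The top-k baseline in the table keeps only the largest-magnitude entries of each update and transmits them with their positions. A minimal sketch of that idea on a plain Python list follows; the function names are hypothetical and this is not FedLab's compressor API.

```python
def top_k_sparsify(vector, ratio):
    """Keep the round(len(vector) * ratio) largest-magnitude entries (at least one)."""
    k = max(1, round(len(vector) * ratio))
    by_magnitude = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    indices = sorted(by_magnitude[:k])
    return indices, [vector[i] for i in indices]


def densify(indices, values, size):
    """Rebuild a dense vector on the receiver; dropped entries become zero."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


update = [0.1, -5.0, 0.3, 2.0, -0.2, 0.05, 1.0, -0.4, 0.0, 0.7]
indices, values = top_k_sparsify(update, ratio=0.2)
print(indices, values)  # [1, 3] [-5.0, 2.0]
print(densify(indices, values, len(update)))
```

Only the kept indices and values need to be sent, which is why the communication volume shrinks roughly in proportion to the kept ratio, at the cost of accuracy when the ratio is very small.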

Citation

Please cite FedLab in your publications if it helps your research:

@article{JMLR:v24:22-0440,
  author  = {Dun Zeng and Siqi Liang and Xiangjing Hu and Hui Wang and Zenglin Xu},
  title   = {FedLab: A Flexible Federated Learning Framework},
  journal = {Journal of Machine Learning Research},
  year    = {2023},
  volume  = {24},
  number  = {100},
  pages   = {1--7},
  url     = {http://jmlr.org/papers/v24/22-0440.html}
}

or

@article{zeng2021fedlab,
  title={Fedlab: A flexible federated learning framework},
  author={Zeng, Dun and Liang, Siqi and Hu, Xiangjing and Wang, Hui and Xu, Zenglin},
  journal={arXiv preprint arXiv:2107.11621},
  year={2021}
}

Contact

Project Investigator: Prof. Zenglin Xu (xuzenglin@hit.edu.cn).

For technical issues related to FedLab development, please contact our development team through GitHub issues or email:

References

[1] Li, Q., Diao, Y., Chen, Q., & He, B. (2022, May). Federated learning on non-iid data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) (pp. 965-978). IEEE.

[2] Caldas, S., Duddu, S. M. K., Wu, P., Li, T., Konečný, J., McMahan, H. B., ... & Talwalkar, A. (2018). Leaf: A benchmark for federated settings. arXiv preprint arXiv:1812.01097.

[3] Yurochkin, M., Agarwal, M., Ghosh, S., Greenewald, K., Hoang, N., & Khazaeni, Y. (2019, May). Bayesian nonparametric federated learning of neural networks. In International Conference on Machine Learning (pp. 7252-7261). PMLR.

[4] McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017, April). Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics (pp. 1273-1282). PMLR.

[5] Acar, D. A. E., Zhao, Y., Navarro, R. M., Mattina, M., Whatmough, P. N., & Saligrama, V. (2021). Federated learning based on dynamic regularization. arXiv preprint arXiv:2111.04263.

[6] Wang, H., Yurochkin, M., Sun, Y., Papailiopoulos, D., & Khazaeni, Y. (2020). Federated learning with matched averaging. arXiv preprint arXiv:2002.06440.