Install dependencies with pip:

```sh
pip install -r .env/requirements.txt
```

Or with Poetry.

For China mainland users:

```sh
poetry install --no-root -C .env
```

For others:

```sh
cd .env && sed -i "10,14d" pyproject.toml && poetry lock --no-update && poetry install --no-root
```

Or pull the Docker image.

For China mainland users:

```sh
docker pull registry.cn-hangzhou.aliyuncs.com/karhoutam/fl-bench:master
```

For others:

```sh
docker pull ghcr.io/karhoutam/fl-bench:master
```

An example of building a container:

```sh
docker run -it --name fl-bench -v path/to/FL-bench:/root/FL-bench --privileged --gpus all ghcr.io/karhoutam/fl-bench:master
```
All method classes inherit from `FedAvgServer` and `FedAvgClient`. If you want to understand the entire workflow and the details of the variable settings, check `src/server/fedavg.py` and `src/client/fedavg.py`.
Partition MNIST into 100 clients according to Dir(0.1):

```sh
python generate_data.py -d mnist -a 0.1 -cn 100
```

For the available methods of generating federated datasets, check `data/README.md` for full details.
```sh
python main.py [--config-path, --config-name] [method=<METHOD_NAME> args...]
```

- `method`: The algorithm's name, e.g., `method=fedavg`. ❗ The method name should be identical to the `.py` file name in `src/server`.
- `--config-path`: Relative path to the directory of the config file. Defaults to `config`.
- `--config-name`: Name of the `.yaml` config file (without the `.yaml` extension). Defaults to `defaults`, which points to `config/defaults.yaml`.

For example, running FedAvg with all defaults:

```sh
python main.py method=fedavg
```

Defaults are set in both `config/defaults.yaml` and `src/utils/constants.py`.
```sh
python main.py --config-name my_cfg.yaml method=fedprox fedprox.mu=0.01
```

Defaults are defined in `src/utils/constants.py/DEFAULT_COMMON_ARGS` (for common arguments) or in the `get_hyperparams()` of the method (for method-specific arguments).

⚠ For the same FL method argument, the priority of argument setting is: CLI > config file > default value.

For example, the default value of `fedprox.mu` is `1`:
```python
# src/server/fedprox.py
from argparse import ArgumentParser, Namespace


class FedProxServer(FedAvgServer):
    @staticmethod
    def get_hyperparams(args_list=None) -> Namespace:
        parser = ArgumentParser()
        parser.add_argument("--mu", type=float, default=1.0)
        return parser.parse_args(args_list)
```
and your `.yaml` config file has:

```yaml
# config/your_config.yaml
...
fedprox:
  mu: 0.01
```

```sh
python main.py method=fedprox                            # fedprox.mu = 1
python main.py --config-name your_config method=fedprox  # fedprox.mu = 0.01
```
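If you want to check where such a default comes from, `get_hyperparams()` can be called directly. The snippet below is only a sketch; the import path assumes FL-bench's root is on `PYTHONPATH` and matches the repo layout shown in the comment above, so adjust it to your checkout:

```python
# A sketch: inspecting FedProx's default hyperparameters.
# The import path below is an assumption about the repo layout.
from src.server.fedprox import FedProxServer

print(FedProxServer.get_hyperparams([]))                # Namespace(mu=1.0): the built-in default
print(FedProxServer.get_hyperparams(["--mu", "0.01"]))  # Namespace(mu=0.01): what a config/CLI override yields
```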
FL-bench supports `visdom` and `tensorboard`.

👀 NOTE: You need to launch the `visdom` / `tensorboard` server yourself.
```yaml
# your_config.yaml
common:
  ...
  visible: tensorboard # options: [null, visdom, tensorboard]
```
**Launching the `visdom` / `tensorboard` server**

`visdom`:
1. Run `python -m visdom.server` in a terminal.
2. Visit `localhost:8097` in your browser.

`tensorboard`:
1. Run `tensorboard --logdir=<your_log_dir>` in a terminal.
2. Visit `localhost:6006` in your browser.

### Parallel Training via `Ray` 🚀

This feature can vastly improve your training efficiency, and it is user-friendly and easy to use!
```yaml
# your_config.yaml
mode: parallel
parallel:
  num_workers: 2 # any integer larger than 1
  ...
...
```
**Manually Create a `Ray` Cluster (Optional)**

A `Ray` cluster is created implicitly every time you run an experiment in parallel mode. You can also create it manually with the command below, to avoid creating and destroying the cluster on every run.

```sh
ray start --head [OPTIONS]
```

👀 NOTE: You need to keep `num_cpus: null` and `num_gpus: null` in your config file in order to connect to an existing `Ray` cluster.
```yaml
# your_config_file.yaml
# Connect to an existing Ray cluster on localhost.
mode: parallel
parallel:
  ...
  num_gpus: null
  num_cpus: null
...
```
All common arguments have default values. Check `DEFAULT_COMMON_ARGS` in `src/utils/constants.py` for the full details of common arguments.

⚠ Common arguments cannot be set via CLI.
You can also write your own `.yaml` config file. I offer a template in `config` and recommend you save your config files there as well. One example: `python main.py --config-name template method=fedavg [cli_method_args...]`
For the default values of specific FL method arguments, check the corresponding `FL-bench/src/server/<method>.py` for full details.
| Arguments | Type | Description |
| --- | --- | --- |
| `--config-path` | `str` | The directory of config files. Defaults to `config`, which means `./config`. |
| `--config-name` | `str` | The name of the config file (w/o the `.yaml` extension). Defaults to `defaults`, which points to `config/defaults.yaml`. |
| `dataset` | `str` | The name of the dataset that the experiment runs on. |
| `model` | `str` | The model backbone used in the experiment. |
| `seed` | `int` | Random seed for running the experiment. |
| `join_ratio` | `float` | Ratio of (number of clients joining each round) / (total number of clients). |
| `global_epoch` | `int` | Number of global epochs, also called communication rounds. |
| `local_epoch` | `int` | Number of local epochs for client local training. |
| `finetune_epoch` | `int` | Number of epochs for clients to fine-tune their models before testing. |
| `buffers` | `str` | How to deal with parameter buffers (in `model.buffers()`) of each client model. Options: [`local`, `global`, `drop`]. `local` (default): clients' buffers are isolated; `global`: buffers are aggregated like other model parameters; `drop`: clients drop their buffers after training is done. |
| `test_interval` | `int` | Interval (in rounds) of performing tests on clients. |
| `eval_test` | `bool` | `true` for performing evaluation on joined clients' testsets before and after local training. |
| `eval_val` | `bool` | `true` for performing evaluation on joined clients' valsets before and after local training. |
| `eval_train` | `bool` | `true` for performing evaluation on joined clients' trainsets before and after local training. |
| `optimizer` | `dict` | Client-side optimizer. Arguments are the same as for the optimizers in `torch.optim`. |
| `lr_scheduler` | `dict` | Client-side learning rate scheduler. Arguments are the same as for the schedulers in `torch.optim.lr_scheduler`. |
| `verbose_gap` | `int` | Interval (in rounds) of displaying clients' training performance in the terminal. |
| `batch_size` | `int` | Data batch size for client local training. |
| `use_cuda` | `bool` | `true` indicates that tensors are placed on the GPU. |
| `visible` | `str` | Options: [`null`, `visdom`, `tensorboard`]. |
| `straggler_ratio` | `float` | The ratio of stragglers (in `[0, 1]`). Stragglers do not perform full-epoch local training like normal clients; their local epoch is randomly selected from the range `[straggler_min_local_epoch, local_epoch)`. |
| `straggler_min_local_epoch` | `int` | The minimum local epoch for stragglers. |
| `external_model_params_file` | `str` | The relative path (from the FL-bench root) of a model parameters `.pt` file. ⚠ This feature is enabled only when `unique_model=False`, which is pre-defined by each FL method. |
| `save_log` | `bool` | `true` for saving the algorithm's running log in `out/<method>/<start_time>`. |
| `save_model` | `bool` | `true` for saving the output model parameters as `out/<method>/<start_time>.pt`. |
| `save_fig` | `bool` | `true` for saving the accuracy curves shown on Visdom as a `.pdf` file in `out/<method>/<start_time>`. |
| `save_metrics` | `bool` | `true` for saving metrics stats as a `.csv` file in `out/<method>/<start_time>`. |
| `delete_useless_run` | `bool` | `true` for deleting output files after the user presses `Ctrl + C`, which indicates that the run is removable. |
Arguments for parallel training (the `parallel` section of the config):

| Arguments | Type | Description |
| --- | --- | --- |
| `num_workers` | `int` | The number of parallel workers. Needs to be an integer larger than `1`. |
| `ray_cluster_addr` | `str` | The IP address of the selected Ray cluster. Defaults to `null`, which means that if there is no existing Ray cluster, Ray builds a new cluster every time you run the experiment and destroys it at the end. More details can be found in the official docs. |
| `num_cpus` and `num_gpus` | `int` | The amount of computational resources you allocate to your Ray cluster. Defaults to `null`, which means all available. |
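For intuition, these options correspond to how Ray is typically initialized. The snippet below is only a sketch of that mapping, not FL-bench's actual startup code, and the variable names are placeholders:

```python
# A sketch of how the parallel options plausibly map onto ray.init();
# check FL-bench's source for the real call. All names below are placeholders.
import ray

ray_cluster_addr = None  # corresponds to ray_cluster_addr: null
num_cpus = None          # corresponds to num_cpus: null (use all available)
num_gpus = None          # corresponds to num_gpus: null (use all available)

# With address=None, Ray starts (and later tears down) a local cluster;
# passing an existing cluster address connects to it instead. Ray does not
# accept explicit num_cpus/num_gpus when connecting to an existing cluster,
# which is why the config keeps them null in that case.
ray.init(address=ray_cluster_addr, num_cpus=num_cpus, num_gpus=num_gpus)
```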
This benchmark supports a bunch of models that are common and integrated in Torchvision (check here for all):

🤗 You can define your own custom model by filling in the `CustomModel` class in `src/utils/models.py` and use it by setting `model: custom` in your `.yaml` config file.
**Regular Image Datasets**

- MNIST (1 x 28 x 28, 10 classes)
- CIFAR-10/100 (3 x 32 x 32, 10/100 classes)
- EMNIST (1 x 28 x 28, 62 classes)
- FashionMNIST (1 x 28 x 28, 10 classes)
- FEMNIST (1 x 28 x 28, 62 classes)
- CelebA (3 x 218 x 178, 2 classes)
- SVHN (3 x 32 x 32, 10 classes)
- USPS (1 x 16 x 16, 10 classes)
- Tiny-ImageNet-200 (3 x 64 x 64, 200 classes)
- CINIC-10 (3 x 32 x 32, 10 classes)

**Domain Generalization Image Datasets**

- Refer to `data/README.md` for the full process guideline 🧾.

**Medical Image Datasets**

- COVID-19 (3 x 244 x 224, 4 classes)
- Organ-S/A/CMNIST (1 x 28 x 28, 11 classes)
The `package()` of the server-side class assembles all the parameters the server needs to send to clients. Similarly, `package()` of the client-side class assembles the parameters clients need to send back to the server. You should always call `super().package()` in your override implementation.

Consider inheriting your method classes from `FedAvgServer` and `FedAvgClient` to make the most of FL-bench's workflow.
You can also inherit your method classes from more advanced methods, e.g., FedBN, FedProx, etc., which will inherit all their functions, variables, and hyperparameter settings. If you do so, design your method carefully to avoid potential hyperparameter and workflow conflicts.

```python
class YourServer(FedBNServer): ...


class YourClient(FedBNClient): ...
```
- For customizing your server-side process, consider overriding `package()` and `aggregate()`.
- For customizing your client-side training, consider overriding `fit()`, `set_parameters()` and `package()` (see the sketch after this list).

You can find all details in [`FedAvgClient`](src/client/fedavg.py) and [`FedAvgServer`](src/server/fedavg.py), which are the bases of all implementations in FL-bench.
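The following is only a rough sketch of what such overrides might look like, not code from the repository: the hook names come from the bullets above, but the exact signatures and the keys inside the package dicts are assumptions you should verify against `src/server/fedavg.py` and `src/client/fedavg.py`.

```python
# Hypothetical sketch of overriding the customization hooks named above.
# Signatures and package contents are assumptions, not FL-bench's actual API.
class YourServer(FedAvgServer):
    def package(self, client_id):
        server_package = super().package(client_id)  # always build on the base package
        server_package["your_extra_field"] = ...     # hypothetical extra payload for clients
        return server_package

    def aggregate(self, client_packages):
        # e.g., re-weight client updates before falling back to the base aggregation
        return super().aggregate(client_packages)


class YourClient(FedAvgClient):
    def set_parameters(self, package):
        super().set_parameters(package)              # load global parameters as usual
        self.your_extra_field = package.get("your_extra_field")

    def fit(self):
        super().fit()                                # or replace the local training loop entirely

    def package(self):
        client_package = super().package()           # always include the base payload
        client_package["your_metric"] = ...          # hypothetical extra info sent back to the server
        return client_package
```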
### Integrating Dataset
- Inherit your own dataset class from `BaseDataset` in [`data/utils/datasets.py`](data/utils/datasets.py) and add your class to the dict `DATASETS` (a sketch follows below).
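Below is a hedged sketch of what that registration could look like. The constructor arguments, file paths, and the attributes `BaseDataset` expects (`data`, `targets`) are assumptions here, so mirror one of the existing dataset classes in `data/utils/datasets.py` rather than this snippet:

```python
# Sketch only; follow an existing dataset class in data/utils/datasets.py for the real interface.
import torch


class YourDataset(BaseDataset):
    def __init__(self, root, *args, **kwargs):
        super().__init__()
        # Hypothetical loading logic: one tensor of samples and one tensor of labels.
        self.data = torch.load(f"{root}/your_dataset/data.pt")
        self.targets = torch.load(f"{root}/your_dataset/targets.pt")


# Register the class so the data generation / config can find it by name.
DATASETS["your_dataset"] = YourDataset
```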
### Customizing Model
- I offer the `CustomModel` class in [`src/utils/models.py`](src/utils/models.py), and you just need to define your model architecture.
- If you want to use your customized model within FL-bench's workflow, the `base` and `classifier` must be defined (a sketch follows below). (Tip: you can define one of them as `torch.nn.Identity()` to bypass it.)
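For illustration, here is a minimal shape such a model could take. This is a generic `torch.nn.Module` sketch of the required `base` / `classifier` split, not the actual `CustomModel` skeleton from `src/utils/models.py`, so adapt it to that class; the layer sizes are arbitrary:

```python
# A generic sketch of the base/classifier split described above; layer sizes are arbitrary.
import torch
from torch import nn


class YourCustomModel(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Feature extractor: everything except the final classification layer.
        self.base = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Classification head; use nn.Identity() here (or for base) to bypass it.
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.base(x))


if __name__ == "__main__":
    model = YourCustomModel()
    print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```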
## Citation 🧐
```bibtex
@software{Tan_FL-bench,
author = {Tan, Jiahao and Wang, Xinpeng},
license = {MIT},
title = {{FL-bench: A federated learning benchmark for solving image classification tasks}},
url = {https://github.com/KarhouTam/FL-bench}
}
@misc{tan2023pfedsim,
title={pFedSim: Similarity-Aware Model Aggregation Towards Personalized Federated Learning},
author={Jiahao Tan and Yipeng Zhou and Gang Liu and Jessie Hui Wang and Shui Yu},
year={2023},
eprint={2305.15706},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```