TsingZ0 / PFLlib

We provide this user-friendly algorithm library (with an integrated evaluation platform) for beginners who intend to start studying federated learning (FL)
GNU General Public License v2.0

(New data partition strategy) Extended Dirichlet strategy by combining Pathological heterogeneous setting and Practical heterogeneous setting in pFL. #139

Closed liyipeng00 closed 3 months ago

liyipeng00 commented 8 months ago

Recently, I came across a new data partition strategy called the Extended Dirichlet strategy ~~~ ours :), which could be added to this repo.

It combines the two common partition strategies (i.e., Quantity-based class imbalance and Distribution-based class imbalance in Li et al. (2022), or the Pathological heterogeneous setting and Practical heterogeneous setting in Zhang et al. (2023)) to generate arbitrarily heterogeneous data. The difference is that it adds a step of allocating classes (labels), which fixes the number of classes per client (denoted by $C$), before allocating samples via the Dirichlet distribution (with concentration parameter $\alpha$).
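In code, the two steps could look like the following minimal NumPy sketch (the function name exdir_partition and all details are my own illustration, not the actual implementation):

```python
import numpy as np

def exdir_partition(labels, num_clients, classes_per_client, alpha, seed=0):
    """Two-step Extended Dirichlet partition (minimal sketch).

    Step 1: assign C = classes_per_client classes to each client.
    Step 2: for each class, split its samples among the clients that
    own it, with proportions drawn from Dirichlet(alpha).
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    # Step 1: allocate C classes to each client.
    client_classes = [rng.choice(num_classes, classes_per_client, replace=False)
                      for _ in range(num_clients)]
    client_idxs = [[] for _ in range(num_clients)]
    # Step 2: Dirichlet allocation of samples within each class.
    for k in range(num_classes):
        owners = [c for c in range(num_clients) if k in client_classes[c]]
        if not owners:                      # no client drew class k
            continue
        idx_k = np.flatnonzero(labels == k)
        rng.shuffle(idx_k)
        props = rng.dirichlet([alpha] * len(owners))
        cuts = (np.cumsum(props)[:-1] * len(idx_k)).astype(int)
        for c, part in zip(owners, np.split(idx_k, cuts)):
            client_idxs[c].extend(part.tolist())
    return client_idxs
```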

This issue was originally raised in FedLab. The implementation is in our convergence repo. You can find more details in Convergence Analysis of Sequential Federated Learning on Heterogeneous Data. [Figure: Row 1: $C=2$ with $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$; Row 2: $C=5$ with $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$; Row 3: $C=10$ with $\alpha=0.1$, $\alpha=1.0$, $\alpha=10.0$]

Li, Q., Diao, Y., Chen, Q., & He, B. (2022, May). Federated learning on non-iid data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) (pp. 965-978). IEEE.

Zhang, J., Hua, Y., Wang, H., Song, T., Xue, Z., Ma, R., & Guan, H. (2023, June). FedALA: Adaptive local aggregation for personalized federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 9, pp. 11237-11244).

TsingZ0 commented 8 months ago

You can contribute to our project by submitting a pull request that adds the Extended Dirichlet strategy. We may add it when we have free time.

liyipeng00 commented 8 months ago

Thanks for your approval. I'm happy to contribute to this repo. Since I'm not familiar with how to submit pull requests, it may take some time. By the way, we found that the first implementation of the Dirichlet partition comes from "Bayesian nonparametric federated learning of neural networks", which could be clarified in the README.md.

liyipeng00 commented 8 months ago

\^o^/, I have added ExDir successfully. I have only added some code (nothing existing is modified), so it is safe to merge this strategy into the original code.

One example: MNIST, num_clients=10, num_classes=10, C=5 and alpha=100.0

Note that here we set min_require_size_per_label = max(C * num_clients // num_classes // 5, 1), so it is expected that some clients end up with only 4 labels (fewer than C=5). You can set it higher to meet your requirements, though this may increase the search time in some cases.
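For concreteness, plugging the example's values into that formula gives max(5 * 10 // 10 // 5, 1) = max(1, 1) = 1; structurally, this is one fifth of the expected number of clients per label (C * num_clients / num_classes = 5), floored and clipped to at least 1.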

[Figure: resulting partition for the MNIST example above]

TsingZ0 commented 8 months ago

Nice work! We will review it in several weeks, after the CVPR deadline.

liyipeng00 commented 8 months ago

Best of luck with your CVPR paper!

TsingZ0 commented 3 months ago

Sorry for the late reply due to my busy schedule; I only have time to check PRs these days. Since PFLlib has moved forward with massive changes, your original PR can no longer be merged directly. Could you please update your PR to match the latest version? Thanks for your time!

liyipeng00 commented 3 months ago

Thanks for your approval. I have updated the pull request, with the Extended Dirichlet strategy added. Feel free to change the code to match the style of PFLlib, and just ping me if any issues appear.

```
python generate_MNIST.py noniid - exdir
```

I would be very grateful if you could add a few sentences introducing exdir to the README.md.

One simple example of such a statement:

This strategy combines the popular Dirichlet-based data partition strategy with Quantity-based class imbalance.
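To see how the concentration parameter $\alpha$ shapes each per-class split, here is a quick NumPy check (illustrative only, not part of the PR):

```python
import numpy as np

rng = np.random.default_rng(0)
# Split one class among the 5 clients that own it:
print(rng.dirichlet([0.1] * 5))    # small alpha: highly skewed proportions
print(rng.dirichlet([100.0] * 5))  # large alpha: near-uniform proportions
```

Small $\alpha$ gives shard-like, pathological splits, while large $\alpha$ roughly approaches the quantity-based, equal-share setting.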

Thanks again for your approval.

TsingZ0 commented 3 months ago

Thank you for the update; I'll check it as soon as possible.

TsingZ0 commented 3 months ago

All done, please check it.

liyipeng00 commented 3 months ago

Thanks for your patience and kindness. I have checked it and have no further problems.