adap / flower

Flower: A Friendly Federated Learning Framework
https://flower.ai
Apache License 2.0

Label and feature skew Partitioner #3146

Open WilliamLindskog opened 4 months ago

WilliamLindskog commented 4 months ago

Describe the type of feature and its functionality.

Hi there,

I've checked the documentation for datasets and open PRs and I think these partitioners would be helpful.

As in the NIID-Bench baseline, there is a partition strategy where each client receives data from a fixed number of unique labels, i.e. a label_quantity_partitioner (only applicable to classification tasks). For such a partitioner, one should be able to specify how many labels each client is given; this number must be less than or equal to the number of unique labels in the dataset.
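
To make the idea concrete, here is a rough sketch of such a label-quantity split in plain Python. It is not Flower's API: the function name `label_quantity_partition` and its parameters are made up for illustration. It also makes two simplifying assumptions: each client's label subset is drawn uniformly at random, and samples whose label no client picked are simply dropped.

```python
import random
from collections import defaultdict


def label_quantity_partition(labels, num_partitions, labels_per_partition, seed=0):
    """Split sample indices so each partition only sees a fixed number of labels.

    Hypothetical helper for illustration only; not part of Flower Datasets.
    """
    unique_labels = sorted(set(labels))
    if not 1 <= labels_per_partition <= len(unique_labels):
        raise ValueError(
            "labels_per_partition must be between 1 and the number of unique labels"
        )
    rng = random.Random(seed)

    # Assumption: each partition draws its label subset uniformly at random.
    chosen = [
        rng.sample(unique_labels, labels_per_partition)
        for _ in range(num_partitions)
    ]

    # Record which partitions hold each label.
    holders = defaultdict(list)
    for pid, label_set in enumerate(chosen):
        for label in label_set:
            holders[label].append(pid)

    # Group sample indices by label.
    indices_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_label[label].append(idx)

    # Deal the shuffled samples of each label round-robin to its holders.
    # Labels that no partition picked are dropped in this sketch.
    partitions = [[] for _ in range(num_partitions)]
    for label, pids in holders.items():
        idxs = indices_by_label[label]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            partitions[pids[pos % len(pids)]].append(idx)
    return partitions


# Usage: 1000 samples over 10 classes, 5 clients with 2 labels each.
labels = [i % 10 for i in range(1000)]
parts = label_quantity_partition(labels, num_partitions=5, labels_per_partition=2)
```

Each entry of `parts` is a list of sample indices for one client. A real partitioner would also need to decide what to do with unassigned labels and with labels that have fewer samples than holders.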

Another partition strategy comes from the original paper: a feature distribution partition based on Gaussian noise. Specifically, given a user-defined noise level $\sigma$, we would add noise $\hat{x} \sim \mathrm{Gau}(\sigma \cdot i/N)$ for party $P_i$, where $N$ is the number of parties and $\mathrm{Gau}(\sigma \cdot i/N)$ is a Gaussian distribution with mean 0 and variance $\sigma \cdot i/N$.
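
A minimal sketch of the noise step, again in plain Python and not tied to any Flower API: `add_feature_noise` is a hypothetical name, and the sketch assumes the data has already been split into N partitions of numeric feature vectors, and that parties are indexed from 1 to N.

```python
import random


def add_feature_noise(partitions, sigma, seed=0):
    """Return noisy copies of the partitions.

    Party i (1-based, an assumption) out of N receives zero-mean Gaussian
    noise with variance sigma * i / N on every feature.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n = len(partitions)
    noisy = []
    for i, samples in enumerate(partitions, start=1):
        # random.gauss expects a standard deviation, so take the square root
        # of the variance sigma * i / N.
        std = (sigma * i / n) ** 0.5
        noisy.append(
            [[x + rng.gauss(0.0, std) for x in sample] for sample in samples]
        )
    return noisy


# Usage: 4 parties, each holding 3 two-dimensional samples.
clean = [[[0.0, 1.0] for _ in range(3)] for _ in range(4)]
noisy = add_feature_noise(clean, sigma=0.1)
```

With this indexing, the last party receives the full variance $\sigma$ and earlier parties receive proportionally less, which is what produces the feature skew across clients.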

What do you think?

Describe step by step what files and adjustments are you planning to include.

There would be a need to create two new partitioners:

  1. Label quantity partitioner
  2. Gaussian noise partitioner

And also test scripts for these.

Is there something else you want to add?

N/A

adam-narozniak commented 4 months ago

Hi @WilliamLindskog, thanks for writing up the issue. We want to support both of them. Regarding the first partitioner, I informally call it the ClassConstrain Partitioner (I think some people call it pathological, but I have seen that name used in a different context, too). It has also been used in other work. It will be supported shortly and is the current priority among the partitioning schemes. (There has even been an attempt to add it based on the implementation in the FedProx paper, but that does not generalize well; a heuristic was also used there for the class choice, whereas we'll move to a purely probabilistic approach.)

Regarding the second partitioner: I'll move to it either directly after ClassConstrain is done, or after adding one more quantity-skew partitioner that works similarly to ClassConstrain but additionally assigns a small amount of data from the other classes (I'm not sure yet how it will be parameterized, whether as a percentage or as raw numbers). In ClassConstrain, by contrast, those other classes are completely absent.

I'll keep you updated. Also, please let me know if you have other partitioning schemes you think we should add and would like to use.