Add Extended Dirichlet strategy

liyipeng00 commented 4 months ago

Is this right? I am not sure what "in the PR (not in the README.md)" means.

The case C=2 and alpha=100.0.

python generate_MNIST.py noniid - exdir

Note that one client (Client 14) ends up with only one label. To enforce the number of classes in each client, you can change min_require_size_per_label = max(C * num_clients // num_classes // 2, 1).
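For reference, here is how that expression evaluates for this run. This is only a sketch of the arithmetic, not the code in the PR: the function wrapper is hypothetical, and num_clients=20 and num_classes=10 are taken from the log below.

```python
def min_require_size_per_label(C, num_clients, num_classes):
    # Same integer arithmetic as the expression quoted above;
    # wrapping it in a function is only for illustration.
    return max(C * num_clients // num_classes // 2, 1)

# Settings of this example: C=2, 20 clients, 10 classes (MNIST).
threshold = min_require_size_per_label(C=2, num_clients=20, num_classes=10)
print(threshold)  # 2 * 20 // 10 // 2 = 2

# The max(..., 1) floor matters when there are few clients:
print(min_require_size_per_label(C=1, num_clients=5, num_classes=10))  # 1
```

The smallest entry in "Number of clients per label" in the log is also 2, which is consistent with this value acting as a lower bound on how many clients share each label.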

More details are in https://github.com/TsingZ0/PFLlib/issues/139

Number of classes: 10

*****clientidx_map*****
{0: [6, 10, 16], 1: [7, 13, 15, 16, 19], 2: [13, 14], 3: [1, 11, 14, 18], 4: [6, 8, 17], 5: [0, 2, 12], 6: [3, 4, 5], 7: [0, 1, 2, 5, 9, 15], 8: [3, 7, 8, 10, 11, 17, 18], 9: [4, 9, 12, 19]}

*****Number of clients per label*****
[3, 5, 2, 4, 3, 3, 3, 6, 7, 4]
Client 0     Size of data: 3354  Labels:  [5 7]
         Samples of labels:  [(5, 2130), (7, 1224)]
--------------------------------------------------
Client 1     Size of data: 3629  Labels:  [3 7]
         Samples of labels:  [(3, 2470), (7, 1159)]
--------------------------------------------------
Client 2     Size of data: 3194  Labels:  [5 7]
         Samples of labels:  [(5, 2056), (7, 1138)]
--------------------------------------------------
Client 3     Size of data: 3101  Labels:  [6 8]
         Samples of labels:  [(6, 2036), (8, 1065)]
--------------------------------------------------
Client 4     Size of data: 4123  Labels:  [6 9]
         Samples of labels:  [(6, 2478), (9, 1645)]
--------------------------------------------------
Client 5     Size of data: 3660  Labels:  [6 7]
         Samples of labels:  [(6, 2362), (7, 1298)]
--------------------------------------------------
Client 6     Size of data: 4292  Labels:  [0 4]
         Samples of labels:  [(0, 2030), (4, 2262)]
--------------------------------------------------
Client 7     Size of data: 2581  Labels:  [1 8]
         Samples of labels:  [(1, 1574), (8, 1007)]
--------------------------------------------------
Client 8     Size of data: 3054  Labels:  [4 8]
         Samples of labels:  [(4, 2149), (8, 905)]
--------------------------------------------------
Client 9     Size of data: 3247  Labels:  [7 9]
         Samples of labels:  [(7, 1377), (9, 1870)]
--------------------------------------------------
Client 10    Size of data: 3411  Labels:  [0 8]
         Samples of labels:  [(0, 2476), (8, 935)]
--------------------------------------------------
Client 11    Size of data: 3777  Labels:  [3 8]
         Samples of labels:  [(3, 2762), (8, 1015)]
--------------------------------------------------
Client 12    Size of data: 3623  Labels:  [5 9]
         Samples of labels:  [(5, 2127), (9, 1496)]
--------------------------------------------------
Client 13    Size of data: 4703  Labels:  [1 2]
         Samples of labels:  [(1, 1635), (2, 3068)]
--------------------------------------------------
Client 14    Size of data: 3922  Labels:  [2]
         Samples of labels:  [(2, 3922)]
--------------------------------------------------
Client 15    Size of data: 2558  Labels:  [1 7]
         Samples of labels:  [(1, 1461), (7, 1097)]
--------------------------------------------------
Client 16    Size of data: 4181  Labels:  [0 1]
         Samples of labels:  [(0, 2397), (1, 1784)]
--------------------------------------------------
Client 17    Size of data: 3371  Labels:  [4 8]
         Samples of labels:  [(4, 2413), (8, 958)]
--------------------------------------------------
Client 18    Size of data: 2849  Labels:  [3 8]
         Samples of labels:  [(3, 1909), (8, 940)]
--------------------------------------------------
Client 19    Size of data: 3370  Labels:  [1 9]
         Samples of labels:  [(1, 1423), (9, 1947)]
--------------------------------------------------
Total number of samples: 70000
The number of train samples: [2515, 2721, 2395, 2325, 3092, 2745, 3219, 1935, 2290, 2435, 2558, 2832, 2717, 3527, 2941, 1918, 3135, 2528, 2136, 2527]
The number of test samples: [839, 908, 799, 776, 1031, 915, 1073, 646, 764, 812, 853, 945, 906, 1176, 981, 640, 1046, 843, 713, 843]

Saving to disk.

Finish generating dataset.
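A quick way to read clientidx_map from the log above: it maps each label to the clients assigned to it. The sketch below (plain Python written for illustration, not part of generate_MNIST.py; the variable names other than clientidx_map are my own) inverts the map and shows that every client, including Client 14, was assigned C=2 labels. Client 14 was assigned labels 2 and 3 but received samples only for label 2.

```python
from collections import defaultdict

# Copied from the log above: label -> clients assigned to that label.
clientidx_map = {
    0: [6, 10, 16], 1: [7, 13, 15, 16, 19], 2: [13, 14],
    3: [1, 11, 14, 18], 4: [6, 8, 17], 5: [0, 2, 12], 6: [3, 4, 5],
    7: [0, 1, 2, 5, 9, 15], 8: [3, 7, 8, 10, 11, 17, 18], 9: [4, 9, 12, 19],
}

# Reproduces the "Number of clients per label" line of the log.
clients_per_label = [len(clientidx_map[label]) for label in sorted(clientidx_map)]

# Invert the map: client -> labels assigned to that client.
labels_per_client = defaultdict(list)
for label in sorted(clientidx_map):
    for client in clientidx_map[label]:
        labels_per_client[client].append(label)

print(clients_per_label)      # [3, 5, 2, 4, 3, 3, 3, 6, 7, 4]
print(labels_per_client[14])  # [2, 3]
```

So the assignment step gives each client two labels; the single-label case arises later, when samples are distributed.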

TsingZ0 commented 4 months ago

Thank you for the detailed example. To keep the README.md clear, I suggest including all possible examples here in the PR, where they are easy to check.

liyipeng00 commented 4 months ago

Thanks for your approval and patience. Visualizations of more examples are in https://github.com/TsingZ0/PFLlib/issues/139. By tuning some parameters, we can generate more partition maps.

Feel free to contact me if any issues arise (^o^)/.

TsingZ0 / PFLlib

Add Extended Dirichlet strategy #185