KarhouTam / FL-bench

Benchmark of federated learning. Dedicated to the community. 🤗
GNU General Public License v3.0
hello, when i change 'split' to 'user', it can't run anymore #4

Closed LyaCJ closed 1 year ago

LyaCJ commented 1 year ago

Thanks for your code. I have a question: when I change 'split' from 'sample' to 'user', the code no longer runs. I think this may be because clients have no test data in 'user' split mode, which leads to the error below.

File "/sftpFile/src/client/fedavg.py", line 112, in train_and_log
    loss_before / num_samples,
ZeroDivisionError: division by zero

So my first question is: how do I run the code in 'user' mode?

I also have another question. I noticed that in 'sample' split mode, the code uses 'test data' and 'train data' from the same client. Is that valid? The accuracy I measured is 8-90%, which is higher than centralized training. Is this overfitting? I would also like to know how to evaluate the accuracy: from the chart, or from the test result?

Thank you very much!

KarhouTam commented 1 year ago

Hello, LyaCJ. Regarding your 1st question: when the split is user, the whole set of clients is divided into [train clients, test clients]. This means a train client has no testset and a test client has no trainset. I will fix the bug you encountered ASAP.
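To make the user split concrete, here is a minimal sketch. It is not the actual FL-bench partitioning code: `split_clients_by_user`, `client_datasets`, and `test_ratio` are hypothetical names used only for illustration. It shows that a train client ends up with an empty testset, which is consistent with `num_samples` being zero in the traceback above.

```python
import random


def split_clients_by_user(client_ids, test_ratio=0.2, seed=0):
    """Hypothetical 'user' split: each client goes entirely to train or to test."""
    rng = random.Random(seed)
    shuffled = list(client_ids)
    rng.shuffle(shuffled)
    num_test = max(1, int(len(shuffled) * test_ratio))
    return {
        "train": sorted(shuffled[num_test:]),
        "test": sorted(shuffled[:num_test]),
    }


def client_datasets(client_id, split, samples):
    """Return the (hypothetical) local trainset/testset of one client."""
    if client_id in split["train"]:
        # A train client keeps all of its samples for training: empty testset.
        return {"train": list(samples), "test": []}
    # A test client only evaluates: empty trainset.
    return {"train": [], "test": list(samples)}


split = split_clients_by_user(range(10), test_ratio=0.2)
train_client = split["train"][0]
local = client_datasets(train_client, split, samples=[0, 1, 2, 3])
print(len(local["test"]))  # number of test samples held by a train client
```

Any per-client metric that divides by the number of local test samples needs a guard for this case.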

For your 2nd question, the trainset and testset need to come from the same distribution but must not overlap. In my code, FL methods never use test data to train models.

As for your confusion about accuracy output such as 13.65% -> 79.73%: 13.65% is the average accuracy over the sampled clients' testsets before local training, while 79.73% is the accuracy after local fine-tuning.
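The before/after numbers can be reproduced in spirit with the sketch below. It assumes a sample-weighted average over the sampled clients, which may differ from how FL-bench aggregates; `average_accuracy` and the per-client counts are made up for illustration. The guard on zero samples also avoids the kind of `ZeroDivisionError` reported above.

```python
def average_accuracy(client_stats):
    """Sample-weighted accuracy (in %) over a list of per-client stats.

    Assumption: each entry has 'correct' and 'num_samples' keys.
    Returns 0.0 when no client holds any test sample.
    """
    total_correct = sum(s["correct"] for s in client_stats)
    total_samples = sum(s["num_samples"] for s in client_stats)
    if total_samples == 0:
        return 0.0
    return 100.0 * total_correct / total_samples


# Made-up evaluation results on two sampled clients' testsets.
before = [
    {"correct": 3, "num_samples": 20},
    {"correct": 2, "num_samples": 20},
]
after = [
    {"correct": 16, "num_samples": 20},
    {"correct": 15, "num_samples": 20},
]

acc_before = average_accuracy(before)  # evaluated before local training
acc_after = average_accuracy(after)    # evaluated after local fine-tuning
print(f"{acc_before:.2f}% -> {acc_after:.2f}%")  # prints "12.50% -> 77.50%"
```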

Note that each client's testset only contains data sampled from the same data distribution as the trainset.

e.g.

If client A's train data are from classes [1,2,3,4], the test data client A holds are also only from [1,2,3,4].
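A per-client split with these properties could look like the sketch below. It is only an illustration under assumptions, not the FL-bench implementation: `split_client_samples` and `test_ratio` are hypothetical, and the split is done class by class so that the testset stays disjoint from the trainset while covering the same classes.

```python
import random
from collections import defaultdict


def split_client_samples(labels, test_ratio=0.25, seed=0):
    """Split one client's sample indices into disjoint train/test lists.

    Splitting per class keeps the test labels within the train labels.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one train sample for every class.
        num_test = max(1, int(len(indices) * test_ratio)) if len(indices) > 1 else 0
        test_idx.extend(indices[:num_test])
        train_idx.extend(indices[num_test:])
    return sorted(train_idx), sorted(test_idx)


# Client A holds samples from classes 1, 2, 3, and 4 only.
labels_a = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
train_idx, test_idx = split_client_samples(labels_a)

train_classes = {labels_a[i] for i in train_idx}
test_classes = {labels_a[i] for i in test_idx}
print(sorted(train_classes), sorted(test_classes))
```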

Each client's testset is small, and because of your Non-IID data partitioning setup, each client model can fit its own data distribution very well. As a result, it is reasonable that client models perform well on their private testsets after local fine-tuning, especially when the testset is small.

LyaCJ commented 1 year ago

I got that, Thank you very much!!

KarhouTam commented 1 year ago

Glad to have solved your issue. If you still want to run the code with the user split, keep an eye on my work and pull the latest code after I fix the bug. 😏 Thanks for your attention.

LyaCJ commented 1 year ago

ok!!!

KarhouTam commented 1 year ago

Hi, @LyaCJ. I have already fixed the bug, so you can pull the latest code now!

LyaCJ commented 1 year ago

Wow, so nice of you!!! Thank you very much!

KarhouTam commented 1 year ago

I'm the maintainer, so that's what I need to do. 🤗