Questions on experimental results (Table 1)

Hi, I have a question about the experimental results in Table 1, specifically the MNIST pathological non-IID setting with 100 clients. I was surprised that the FedAvg accuracy you report for MNIST in this setting is only 78%.

AFAIK, and according to the original FedAvg paper (McMahan et al., 2017), FedAvg should reach ~99% accuracy on MNIST with the same pathological non-IID setting, the same number of clients (100), 100 rounds, C=0.1, E=5, and a similar network structure. (Since performance on MNIST saturates easily, it quickly reaches 98% in just 50 rounds.) I have also reproduced this result myself (see my repo: https://github.com/vaseline555/Federated-Averaging-PyTorch).

FYI, I found the code you used to simulate the pathological non-IID setting, which is taken from the LG-FedAvg repo: https://github.com/NVlabs/FedFomo/blob/fe04f6641407bce4fc58ea3fbf8cb314f9af8629/federated_datasets/__init__.py#L68 I've also checked that it works as intended (i.e., each client has samples from two classes), so I am curious about the result reported in the paper (~78% seems too low).
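For anyone who wants to repeat that kind of check, here is a minimal sketch, assuming the shard-based split described by McMahan et al. (sort by label, cut into 200 shards, give each of the 100 clients 2 shards). It is not the FedFomo or LG-FedAvg code: `pathological_partition` is a hypothetical helper, and the labels are a synthetic, perfectly balanced stand-in for MNIST so the snippet runs without downloading anything.

```python
import numpy as np


def pathological_partition(labels, num_clients=100, shards_per_client=2, seed=0):
    """Hypothetical helper: shard-based pathological non-IID split.

    Sorts sample indices by label, cuts them into
    num_clients * shards_per_client contiguous shards, and gives each
    client shards_per_client randomly chosen shards.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_shards = num_clients * shards_per_client
    order = np.argsort(labels, kind="stable")   # indices grouped by class
    shards = np.array_split(order, num_shards)  # contiguous label-sorted shards
    shard_ids = rng.permutation(num_shards)     # random shard-to-client assignment
    clients = []
    for c in range(num_clients):
        picked = shard_ids[c * shards_per_client:(c + 1) * shards_per_client]
        clients.append(np.concatenate([shards[s] for s in picked]))
    return clients


# Synthetic stand-in for MNIST training labels (assumption: 6000 per class).
labels = np.repeat(np.arange(10), 6000)
clients = pathological_partition(labels)

# Sanity check: how many distinct classes does each client hold?
classes_per_client = [len(np.unique(labels[idx])) for idx in clients]
print("max classes per client:", max(classes_per_client))
```

With real MNIST labels the class sizes are uneven, so a shard can straddle a class boundary and a few clients may see more than two classes under this particular scheme.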

Could you please clarify this? Thank you.

Best, Adam
