SMILELab-FL / FedLab

A flexible Federated Learning Framework based on PyTorch, simplifying your Federated Learning research.
https://fedlab.readthedocs.io
Apache License 2.0

Experiments with the PowerOfChoice algorithm and a CNN model on CIFAR10 #323

Closed LinkWithMe closed 1 year ago

LinkWithMe commented 1 year ago

Describe the bug

When running the PowerOfChoice algorithm with a CNN model on CIFAR10, the accuracy comes out a few percentage points lower than in the paper. In the paper's results (screenshot omitted), the pow-d variant ends up at around 57% accuracy. In my own runs (screenshots omitted), the best accuracy is 53.74%, and the accuracy in the early rounds shows no noticeable change.

Environment

The experimental settings follow the paper's description as closely as possible (paper's settings screenshot omitted). My settings:

Dataset: CIFAR10
Model: CNN with 2 convolutional layers, 2 max-pooling layers, and 4 hidden layers [120, 100, 84, 50]
Initial learning rate: 0.5
batch_size: 128
Local epochs before uploading: 1
Total number of clients: 100
Candidate clients sampled per round: 20
Clients selected to upload: 9
Data partition: non-IID Dirichlet partition with concentration parameter 2
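
For reference, here is a minimal sketch of producing such a Dirichlet label partition with FedLab's CIFAR10Partitioner, following the usage in FedLab's partition tutorial (treat the exact argument names as assumptions for your installed version):

import torchvision
from fedlab.utils.dataset.partition import CIFAR10Partitioner

trainset = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)

# Heterogeneous Dirichlet partition: 100 clients, concentration dir_alpha=2,
# matching the setup described above.
partition = CIFAR10Partitioner(trainset.targets,
                               num_clients=100,
                               balance=None,
                               partition="dirichlet",
                               dir_alpha=2,
                               seed=2021)

# partition.client_dict maps client id -> indices of that client's samples.
print(len(partition.client_dict[0]))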

Training loop:

while not handler.if_stop:
    # server side: sample a candidate set of clients
    candidates = handler.sample_candidates()
    # client side: candidates report their local loss on the current global model
    losses = trainer.evaluate(candidates, handler.model_parameters)

    # server side: Power-of-Choice selects the highest-loss candidates
    sampled_clients = handler.sample_clients(candidates, losses)
    broadcast = handler.downlink_package

    # client side: local training on the selected clients
    trainer.local_process(broadcast, sampled_clients)
    uploads = trainer.uplink_package

    # server side: collect client packages; aggregation runs once all arrive
    for pack in uploads:
        handler.load(pack)

    # evaluate the updated global model on the test set
    loss, acc = evaluate(handler._model, nn.CrossEntropyLoss(), test_loader)
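
For context, the core of the Power-of-Choice selection step amounts to keeping the m candidates with the largest local losses; a minimal standalone sketch (names are illustrative, not FedLab internals):

import numpy as np

def power_of_choice_select(candidates, losses, m):
    """Return the m candidates with the highest reported local loss."""
    order = np.argsort(losses)[::-1]  # sort candidate indices by descending loss
    return [candidates[i] for i in order[:m]]

# e.g. 20 candidates per round, 9 selected to upload, as in the setup above
candidates = list(range(20))
losses = np.random.rand(20)
print(power_of_choice_select(candidates, losses, m=9))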

Network model:

import torch
import torch.nn as nn
import torch.nn.functional as F


class CNN_CIFAR10(nn.Module):
    """From the PyTorch tutorial:
    https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
    """
    def __init__(self):
        super(CNN_CIFAR10, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 100)
        self.fc3 = nn.Linear(100, 84)
        self.fc4 = nn.Linear(84, 50)
        self.fc5 = nn.Linear(50, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 3x32x32 -> 6x14x14
        x = self.pool(F.relu(self.conv2(x)))  # 6x14x14 -> 16x5x5
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = self.fc5(x)
        return x
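
A quick shape sanity check for the model (illustrative only):

model = CNN_CIFAR10()
x = torch.randn(4, 3, 32, 32)  # dummy batch of four CIFAR10 images
print(model(x).shape)          # expected: torch.Size([4, 10])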

Additional context

In addition, a question about the local-training setting: for the paper's MLP-on-FMNIST experiment, the number of local epochs is set to 30 (paper's settings screenshot omitted). However, with the other settings kept as close to the paper as possible, training for just 1 local epoch before uploading already gives results similar to the paper's (paper's curve and my results, screenshots omitted). Does this indicate that the local-training mechanism differs between the paper and the framework?
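
For reference, the local-epoch count is usually configured on the client trainer in FedLab; a minimal sketch assuming the SGDSerialClientTrainer API from FedLab's examples (treat the exact signature as an assumption for your version):

from fedlab.contrib.algorithm.basic_client import SGDSerialClientTrainer

# model: the global model; 100 clients simulated serially in one process
trainer = SGDSerialClientTrainer(model, num_clients=100, cuda=True)
# epochs controls how many local passes each client runs before uploading;
# the run above uses epochs=1, while the paper's FMNIST setup uses 30.
trainer.setup_optim(epochs=1, batch_size=128, lr=0.5)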

dunzeng commented 1 year ago

Federated learning accuracy is highly sensitive to the data partition, so the gap with the paper may simply come from differences in how the data was partitioned. Second, there are probably also differences in CIFAR10 preprocessing. As far as I know, Power-of-Choice has not been open-sourced, so reproducing the paper's experiments may require asking the original authors for their exact settings.
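
For instance, whether the standard CIFAR10 augmentation and normalization are applied can move final accuracy by several points; a typical pipeline (illustrative, not necessarily what the paper used):

import torchvision
import torchvision.transforms as transforms

# Common CIFAR10 preprocessing; the statistics are the widely used
# per-channel mean/std of the CIFAR10 training set.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])

trainset = torchvision.datasets.CIFAR10(root="./data", train=True,
                                        download=True, transform=train_transform)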

FedLab does not modify the PyTorch training code; it only extracts the necessary workflow into reusable components, so it should not affect training accuracy.