Closed JiaxiangRen closed 2 years ago
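Hello, and thank you very much for reporting these bugs. 1) The afl problem came from fmodule._model_average, which used the "if not list" form to check whether an array is empty; this triggers a bug in some cases. All code that used the same statement to check for an empty array has now been fixed, and afl has been retested. (Note: afl uses full sampling by default, so specifying proportion=0.1 has no effect, because its iterate function does no sampling; see the original afl paper, Agnostic Federated Learning, for details.) 2) With the same command, qfedavg did not show the same bug locally and ran to completion. Would you be able to share your local qfedavg.py file?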
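To illustrate why a truthiness-based emptiness check can misbehave, here is a minimal sketch. It assumes the value being checked may be a NumPy array rather than a plain list; the thread does not say which type actually reached fmodule._model_average, and the helper names below are hypothetical, not the repository's code.

```python
import numpy as np

def is_empty_truthiness(items):
    """Emptiness check via truthiness; only reliable for built-in containers."""
    return not items

def is_empty_len(items):
    """Emptiness check via length; behaves the same for lists and NumPy arrays."""
    return len(items) == 0

def truthiness_raises(items):
    """Report whether the truthiness check raises ValueError for this input."""
    try:
        is_empty_truthiness(items)
    except ValueError:
        return True
    return False

# A plain list behaves as expected with either check.
weights_list = [0.3, 0.7]
print(is_empty_truthiness(weights_list), is_empty_len(weights_list))

# A one-element array holding 0 is non-empty, yet the truthiness check
# treats it as empty because the single value is falsy.
single_zero = np.array([0.0])
print(is_empty_truthiness(single_zero), is_empty_len(single_zero))

# A multi-element array has no unambiguous truth value, so "not array" raises.
weights_array = np.array([0.3, 0.7])
print(truthiness_raises(weights_array), is_empty_len(weights_array))
```

Checking len(...) == 0 (or comparing against None when "not provided" is what is meant) avoids both failure modes shown here.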
On 2022-05-06 06:27:13, "jzrzy" @.***> wrote:
Hello, the following error occurs when I run the afl baseline. cmd: python main.py --task mnist_classification_cnum100_dist0_skew0_seed0 --model cnn --algorithm afl --num_rounds 2 --num_epochs 1 --learning_rate 0.215 --proportion 0.1 --batch_size 10 --eval_interval 1
Running qffl also raises an error. cmd: python main.py --task mnist_classification_cnum100_dist0_skew0_seed0 --model cnn --algorithm qfedavg --num_rounds 2 --num_epochs 1 --learning_rate 0.215 --proportion 0.1 --batch_size 10 --eval_interval 1
This may be related to the output format of communicate. How can I fix this bug? Thank you very much!
Hello, thank you for your reply. The qfedavg file is as follows:

    from .fedbase import BasicServer, BasicClient
    import numpy as np
    from utils import fmodule

    class Server(BasicServer):
        def __init__(self, option, model, clients, test_data = None):
            super(Server, self).__init__(option, model, clients, test_data)
            self.q = option['q']
            self.paras_name = ['q']

        def iterate(self, t):
            # sample clients
            self.selected_clients = self.sample()
            # training
            res = self.communicate(self.selected_clients)
            models, train_losses = res['model'], res['loss']
            # plug in the weight updates into the gradient
            grads = [(self.model - model) / self.lr for model in models]
            Deltas = [gi * np.float_power(li + 1e-10, self.q) for gi, li in zip(grads, train_losses)]
            # estimation of the local Lipschitz constant
            hs = [self.q * np.float_power(li + 1e-10, (self.q - 1)) * (gi.norm() ** 2) + 1.0 / self.lr * np.float_power(li + 1e-10, self.q) for gi, li in zip(grads, train_losses)]
            # aggregate
            self.model = self.aggregate(Deltas, hs)
            return

        def aggregate(self, Deltas, hs):
            demominator = np.sum(np.asarray(hs))
            scaled_deltas = [delta / demominator for delta in Deltas]
            updates = fmodule._model_sum(scaled_deltas)
            new_model = self.model - updates
            return new_model

    class Client(BasicClient):
        def __init__(self, option, name='', train_data=None, valid_data=None):
            super(Client, self).__init__(option, name, train_data, valid_data)

        def reply(self, svr_pkg):
            model = self.unpack(svr_pkg)
            train_loss = self.test(model, 'train')
            self.train(model)
            cpkg = self.pack(model, train_loss)
            return cpkg

        def pack(self, model, loss):
            return {
                "model" : model,
                "loss": loss,
            }
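For readers following the server-side arithmetic in the file above, here is a minimal NumPy sketch of the same update on flat weight vectors. It is an illustration only: the function name qfedavg_step and the use of plain arrays in place of the repository's model objects are assumptions, and the formulas are transcribed from the posted iterate and aggregate methods.

```python
import numpy as np

def qfedavg_step(global_w, client_ws, client_losses, lr, q, eps=1e-10):
    """One aggregation step mirroring the posted Server.iterate/aggregate.

    global_w: 1-D array of current global weights.
    client_ws: list of 1-D arrays, the locally trained weights.
    client_losses: list of floats, each client's training loss before training.
    """
    # Treat the weight change divided by lr as a gradient estimate.
    grads = [(global_w - w) / lr for w in client_ws]
    # Scale each gradient by loss^q.
    deltas = [g * np.float_power(loss + eps, q)
              for g, loss in zip(grads, client_losses)]
    # Per-client normaliser built from the loss and the squared gradient norm.
    hs = [q * np.float_power(loss + eps, q - 1) * np.dot(g, g)
          + np.float_power(loss + eps, q) / lr
          for g, loss in zip(grads, client_losses)]
    # Sum of scaled deltas divided by the sum of normalisers.
    return global_w - np.sum(deltas, axis=0) / np.sum(hs)

global_w = np.array([1.0, 2.0])
client_ws = [np.array([0.0, 2.0]), np.array([2.0, 4.0])]
new_w = qfedavg_step(global_w, client_ws, client_losses=[0.5, 1.5], lr=0.1, q=0)
print(new_w)
```

With q = 0 every normaliser equals 1/lr and every delta equals its gradient, so the step reduces to the plain average of the client weights. Note that client_losses must be plain numbers for the arithmetic to work.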
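Hello. The second line of the Client.reply function you posted differs from the current version, which reads train_loss = self.test(model, 'train')['loss']. This is because the return value of test is now also wrapped in a dict (metrics vary widely across datasets). With this change it should run successfully.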
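A minimal sketch of why the missing ['loss'] lookup breaks the server-side arithmetic is given below. The stub evaluate_stub and its "accuracy" key are hypothetical stand-ins; the thread only confirms that test now returns a dict containing 'loss'.

```python
import numpy as np

def evaluate_stub(model, dataflag="train"):
    """Hypothetical stand-in for the current test(): metrics wrapped in a dict."""
    return {"loss": 0.5, "accuracy": 0.9}  # "accuracy" is an assumed extra metric

def scale_by_loss(loss, q=1.0):
    """Same loss^q term the server computes for each client."""
    return np.float_power(loss + 1e-10, q)

metrics = evaluate_stub(model=None)

# Older client code passed the whole return value on as the loss.
# Adding a float to a dict raises TypeError before float_power is reached.
try:
    scale_by_loss(metrics)
    old_code_fails = False
except TypeError:
    old_code_fails = True

# Current client code extracts the scalar first.
train_loss = metrics["loss"]
weight = scale_by_loss(train_loss)
print(old_code_fails, weight)
```

Extracting the scalar in Client.reply keeps the package sent to the server in the shape that iterate expects.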
Thank you for your reply. I re-cloned the current code and the error no longer occurs. Thank you very much!