KarhouTam / FL-bench

Benchmark of federated learning. Dedicated to the community. 🤗
GNU General Public License v3.0
505 stars, 82 forks

Bug report of FedMD code #24

Closed by suololololo 1 year ago

suololololo commented 1 year ago

The FedMD algorithm code does not work. When I run the following commands:

cd src/server 
python fedmd.py

I get the following error:

Traceback (most recent call last):
  File "/home/cjj/gitproject/FL-bench/src/server/fedmd.py", line 78, in <module>
    server = FedMDServer()
  File "/home/cjj/gitproject/FL-bench/src/server/fedmd.py", line 39, in __init__
    self.trainer = FedMDClient(
  File "/home/cjj/gitproject/FL-bench/src/client/fedmd.py", line 24, in __init__
    self.public_dataset = DATASETS[self.args.public_dataset](
TypeError: MNIST.__init__() got an unexpected keyword argument 'transform'

I think the bug may be in FedMDClient.__init__() in src/client/fedmd.py.

My environment

Python 3.10.10

Experiment Arguments:

{
'model': 'lenet5',
'dataset': 'cifar10',
'seed': 42,
'join_ratio': 0.1,
'global_epoch': 100,
'local_epoch': 5,
'finetune_epoch': 0,
'test_gap': 100,
'eval_test': 1,
'eval_train': 0,
'local_lr': 0.01,
'momentum': 0.0,
'weight_decay': 0.0,
'verbose_gap': 100000,
'batch_size': 32,
'visible': 0,
'global_testset': 0,
'straggler_ratio': 0,
'straggler_min_local_epoch': 1,
'use_cuda': 1,
'save_log': 1,
'save_model': 0,
'save_fig': 1,
'save_metrics': 1,
'digest_epoch': 1,
'public_dataset': 'mnist',
'public_batch_size': 32,
'public_batch_num': 5,
'dataset_args': {'dataset': 'cifar10', 'client_num': 100, 'fraction': 0.5, 'seed': 42, 'split': 'sample', 'alpha': 0.1, 'least_samples': 40}
}

KarhouTam commented 1 year ago

Yep. I forgot to update client/fedmd.py when I published the new data transformation feature. I'll fix the code now. Thanks for pointing it out.
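The first error is a plain Python signature mismatch: the dataset constructor apparently no longer accepts a transform keyword, but the stale call in client/fedmd.py still passed one. A minimal sketch of that pattern, assuming hypothetical stand-ins: ToyMNIST, the DATASETS registry shown here, and the data_transform keyword are illustrative only, not FL-bench's real classes and not the content of the fix commit.

```python
class ToyMNIST:
    """Hypothetical dataset whose constructor uses a newer transform keyword."""

    def __init__(self, root, data_transform=None):
        # "data_transform" is a hypothetical name standing in for the new API.
        self.root = root
        self.data_transform = data_transform


# Hypothetical name -> class registry, loosely modeled on DATASETS[...] in the traceback.
DATASETS = {"mnist": ToyMNIST}


def build_public_dataset_stale(name, root):
    # Stale call site: still passes the old "transform" keyword, so it raises TypeError.
    return DATASETS[name](root=root, transform=None)


def build_public_dataset_updated(name, root):
    # Updated call site: matches the constructor's current signature.
    return DATASETS[name](root=root, data_transform=None)


if __name__ == "__main__":
    try:
        build_public_dataset_stale("mnist", "data/mnist")
    except TypeError as err:
        print(f"stale call fails: {err}")
    dataset = build_public_dataset_updated("mnist", "data/mnist")
    print(f"updated call works: {type(dataset).__name__}")
```

In this sketch the remedy is to update the call site to the constructor's current signature, which matches the maintainer's description of forgetting to update client/fedmd.py.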

KarhouTam commented 1 year ago

https://github.com/KarhouTam/FL-bench/commit/74b14f48e0be39f3f2a14f55650f245c33dec29e

FedMD should work properly now.

Please pull the latest code and try again.

suololololo commented 1 year ago

Thank you for the bug fix! But after pulling the latest code and running python fedmd.py again, I get the following error:

Traceback (most recent call last):
  File "/home/cjj/gitproject/FL-bench/src/server/fedmd.py", line 79, in <module>
    server.run()
  File "/home/cjj/gitproject/FL-bench/src/server/fedavg.py", line 449, in run
    self.train()
  File "/home/cjj/gitproject/FL-bench/src/server/fedavg.py", line 217, in train
    self.train_one_round()
  File "/home/cjj/gitproject/FL-bench/src/server/fedmd.py", line 48, in train_one_round
    scores_cache.append(self.trainer.get_scores(client_id, client_params))
  File "/home/cjj/anaconda3/envs/fl-bench/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/cjj/gitproject/FL-bench/src/client/fedmd.py", line 77, in get_scores
    return [self.model(x).clone() for x in self.public_data]
  File "/home/cjj/gitproject/FL-bench/src/client/fedmd.py", line 77, in <listcomp>
    return [self.model(x).clone() for x in self.public_data]
  File "/home/cjj/anaconda3/envs/fl-bench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cjj/gitproject/FL-bench/src/config/models.py", line 49, in forward
    return self.classifier(F.relu(self.base(x)))
  File "/home/cjj/anaconda3/envs/fl-bench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cjj/anaconda3/envs/fl-bench/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/cjj/anaconda3/envs/fl-bench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cjj/anaconda3/envs/fl-bench/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/cjj/anaconda3/envs/fl-bench/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [32, 3, 5, 5], expected input[32, 1, 28, 28] to have 3 channels, but got 1 channels instead

Maybe the default model argument is not suitable? Or is there a specific command I need to run?
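For a Conv2d layer the weight has shape [out_channels, in_channels / groups, kH, kW], so with groups=1 the channel dimension of an NCHW input batch must equal weight.shape[1]. In the traceback, weight [32, 3, 5, 5] expects 3 channels, while the MNIST batch [32, 1, 28, 28] has only 1. A pure-Python sketch of that rule; check_conv2d_input is a hypothetical helper that mimics the check, not PyTorch's implementation or its exact message:

```python
def check_conv2d_input(weight_shape, input_shape, groups=1):
    """Raise ValueError if an NCHW input cannot be fed to a conv with this weight.

    weight_shape: (out_channels, in_channels // groups, kH, kW)
    input_shape:  (batch, channels, height, width)
    """
    expected_channels = weight_shape[1] * groups
    got_channels = input_shape[1]
    if got_channels != expected_channels:
        raise ValueError(
            f"weight of size {list(weight_shape)} expects input with "
            f"{expected_channels} channels, but got {got_channels} channels"
        )


if __name__ == "__main__":
    # Shapes from the traceback: a 3-channel (CIFAR-10) model fed a 1-channel MNIST batch.
    try:
        check_conv2d_input((32, 3, 5, 5), (32, 1, 28, 28))
    except ValueError as err:
        print(err)

    # A 3-channel batch (e.g. 32x32 RGB images) passes the channel check.
    check_conv2d_input((32, 3, 5, 5), (32, 3, 32, 32))
    print("3-channel batch OK")
```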

KarhouTam commented 1 year ago

FedMD's paper only shows two [public dataset, private dataset] pairs. The default value of FedMD's --public_dataset argument is mnist, but the default value of --dataset (the private one) is cifar10, and the two are not compatible: the model built for cifar10 expects 3-channel input, while mnist images have only 1 channel.
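Because the same model consumes both the public and the private data (the traceback shows the cifar10 model being run on the mnist public set), both datasets need matching image shapes. A hypothetical pre-flight check, assuming the standard (channels, height, width) of each dataset; FL-bench itself may not perform such a check, and any resizing in its preprocessing would change these shapes:

```python
# Standard per-image shapes (channels, height, width); an assumption of this sketch.
DATASET_SHAPES = {
    "mnist": (1, 28, 28),
    "emnist": (1, 28, 28),
    "femnist": (1, 28, 28),
    "cifar10": (3, 32, 32),
    "cifar100": (3, 32, 32),
}


def check_fedmd_pair(public_dataset, private_dataset):
    """Raise ValueError if the public and private datasets have different image shapes."""
    public_shape = DATASET_SHAPES[public_dataset]
    private_shape = DATASET_SHAPES[private_dataset]
    if public_shape != private_shape:
        raise ValueError(
            f"public dataset {public_dataset!r} has shape {public_shape}, "
            f"but private dataset {private_dataset!r} has shape {private_shape}"
        )


if __name__ == "__main__":
    for pair in [("mnist", "cifar10"), ("mnist", "emnist"), ("cifar10", "cifar100")]:
        try:
            check_fedmd_pair(*pair)
            print(pair, "OK")
        except ValueError as err:
            print(pair, "incompatible:", err)
```

The default [mnist, cifar10] pair fails this check, while both pairs suggested in the reply pass.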

To run the [mnist, femnist/emnist] pair, use a command like:

python fedmd.py -d emnist

For [cifar10, cifar100]:

python fedmd.py --public_dataset cifar10 -d cifar100
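The two commands differ only in whether --public_dataset is passed, since it already defaults to mnist. A small launcher sketch that builds the same argument lists from a [public, private] pair; SUGGESTED_PAIRS, fedmd_args, and run_fedmd are hypothetical names, and only the -d and --public_dataset flags come from this thread:

```python
import os
import subprocess
import sys

# [public, private] pairs suggested in this thread (hypothetical whitelist for this sketch).
SUGGESTED_PAIRS = {("mnist", "emnist"), ("mnist", "femnist"), ("cifar10", "cifar100")}


def fedmd_args(public_dataset, private_dataset):
    """Build the fedmd.py argument list using the -d and --public_dataset flags."""
    if (public_dataset, private_dataset) not in SUGGESTED_PAIRS:
        raise ValueError(f"unsupported pair: [{public_dataset}, {private_dataset}]")
    args = ["fedmd.py", "-d", private_dataset]
    # --public_dataset defaults to mnist, so pass it only for other public datasets.
    if public_dataset != "mnist":
        args[1:1] = ["--public_dataset", public_dataset]
    return args


def run_fedmd(public_dataset, private_dataset, repo_root="."):
    """Run fedmd.py from src/server (as in the original report) with the current interpreter."""
    cmd = [sys.executable, *fedmd_args(public_dataset, private_dataset)]
    return subprocess.run(cmd, cwd=os.path.join(repo_root, "src", "server"), check=True)


if __name__ == "__main__":
    print(" ".join(fedmd_args("mnist", "emnist")))
    print(" ".join(fedmd_args("cifar10", "cifar100")))
```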

suololololo commented 1 year ago

Thank you for your answers!