iuserea closed this 4 years ago
@iuserea Hi, I think your local code is an older version. Please update to the latest version. I just double-checked, and it works now.
@iuserea check the latest code please.
There is still an error when running in the standalone environment with the following command.
nohup sh run_fedavg_standalone_pytorch.sh 2 10 64 cifar10 ./../../../data/cifar10 resnet56 homo 200 20 0.001 > ./fedavg_standalone.txt 2>&1 &
@iuserea
We suggest using distributed computing when training large DNNs such as ResNet, since the standalone version is very time-consuming. That is why we removed the model initialization in create_model() (main_fedavg.py) in the previous version.
Now I have added the large DNN models back for standalone. You can choose any of them if you can accept a very long training time...
@chaoyanghe Thank you for the kind explanation. I'll give it a try at least once.
When I use the commands below, which are included in readme.md:
nohup sh run_fedavg_standalone_pytorch.sh 2 10 64 cifar10 ./../../../data/cifar10 resnet56 homo 200 20 0.001 > ./fedavg_standalone.txt 2>&1
nohup sh run_fedavg_standalone_pytorch.sh 2 10 10 mnist ./../../../data/mnist lr hetero 200 20 0.03 > ./fedavg_standalone.txt 2>&1 &
The same error occurs:
Traceback (most recent call last):
  File "./main_fedavg.py", line 160, in <module>
    trainer = FedAvgTrainer(dataset, model, device, args)
  File "/home/xx/proj/Source/FedML/fedml_api/standalone/fedavg/fedavg_trainer.py", line 22, in __init__
    self.model_global.train()
AttributeError: 'NoneType' object has no attribute 'train'
My understanding of PyTorch is limited. Can anyone point out where the code goes wrong?
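For readers hitting the same traceback: the error means the `model` passed to `FedAvgTrainer` was `None` when `.train()` was called on it. One common way this happens is a model factory that has no branch for the requested model name and silently falls through. The sketch below is only an illustration of that pattern, not FedML's actual code: `TinyModel`, `create_model_unguarded`, `create_model_guarded`, and the names `known_model` / `other_model` are all hypothetical, and `TinyModel` is a plain-Python stand-in for a `torch.nn.Module`.

```python
class TinyModel:
    """Hypothetical stand-in for a torch.nn.Module; it only tracks training mode."""

    def __init__(self, name):
        self.name = name
        self.training = False

    def train(self):
        # Mirrors the nn.Module convention of returning self.
        self.training = True
        return self


def create_model_unguarded(model_name):
    # Hypothetical factory: only "known_model" has a branch, so any other
    # name falls through and None is returned without any warning.
    model = None
    if model_name == "known_model":
        model = TinyModel(model_name)
    return model


def create_model_guarded(model_name):
    # Same factory, but it fails early with a readable message instead of
    # handing None to the trainer.
    model = create_model_unguarded(model_name)
    if model is None:
        raise ValueError(f"unsupported model name: {model_name!r}")
    return model


if __name__ == "__main__":
    try:
        create_model_unguarded("other_model").train()
    except AttributeError as err:
        print("unguarded:", err)

    try:
        create_model_guarded("other_model")
    except ValueError as err:
        print("guarded:", err)
```

If a local copy of the code behaves like the unguarded version, printing the return value of `create_model()` right before the trainer is constructed should show `None` for the model names that are not handled.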