Closed iuserea closed 4 years ago
You can press Ctrl+C to see what the error is. The traceback printed on interrupt shows where the process was stuck.
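If the run is started in the background or through a launcher, Ctrl+C may not reach every process. As a rough alternative, Python's standard `faulthandler` module can dump the stack of every thread, either periodically or on a signal. This is only a sketch: the helper names `enable_hang_diagnostics` and `disable_hang_diagnostics` and the 600-second default are my own assumptions, not part of FedML.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(timeout_s=600.0, stream=None):
    """Dump all thread stacks every `timeout_s` seconds (hypothetical helper).

    `stream` must be a real file with a file descriptor; defaults to stderr.
    """
    out = stream if stream is not None else sys.stderr
    # Periodic dump: if the process hangs, the log shows where it is waiting.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=out)
    # On-demand dump (Unix only): `kill -USR1 <pid>` prints the stacks.
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=out, all_threads=True)


def disable_hang_diagnostics():
    """Stop the periodic dump and remove the signal handler."""
    faulthandler.cancel_dump_traceback_later()
    if hasattr(signal, "SIGUSR1"):
        faulthandler.unregister(signal.SIGUSR1)
```

Calling `enable_hang_diagnostics()` once at the start of each worker's entry point should be enough; the dumps go to that process's stderr, so they end up wherever its log is redirected.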
Have you figured out the problem?
Hi @chaoyanghe @iuserea, I also ran into this problem after the first epochs. Did you solve it?
Thank you
Thank you, I have fixed the problem.
The bug is in fedml_api/distributed/fedgkt/GKTServerTrainer.py at line 117, where self.args is written twice:
epochs_server = self.args.self.args.epochs_server --> epochs_server = self.args.epochs_server
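For anyone checking the same line, here is a minimal stand-alone reproduction of why the original expression fails. `ServerTrainerSketch` is a hypothetical stand-in, not the real `GKTServerTrainer`, and I am assuming `args` is an `argparse.Namespace`; the value 1 is arbitrary.

```python
from argparse import Namespace


class ServerTrainerSketch:
    """Hypothetical stand-in that only mimics how `args` is stored."""

    def __init__(self, args):
        self.args = args

    def epochs_buggy(self):
        # Original line: `args` has no attribute named `self`.
        return self.args.self.args.epochs_server

    def epochs_fixed(self):
        # Corrected line: read the option directly from `args`.
        return self.args.epochs_server


trainer = ServerTrainerSketch(Namespace(epochs_server=1))
try:
    trainer.epochs_buggy()
except AttributeError as exc:
    print("buggy line fails:", exc)
print("fixed line returns:", trainer.epochs_fixed())
```

My guess is that the run appears to hang rather than crash because the server process raises this error while the client processes keep waiting for it, but I have not verified that.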
When I run the FedGKT algorithm with the following command:
sh run_FedGKT.sh 8 cifar10 homo 10 20 1 Adam 0.001 1 0 resnet56 fedml_resnet56_homo_cifar10 "./../../../data/cifar10" 64
The processes often hang for some reason. I have obtained the result successfully only once.
The one cause I have figured out is a connection error between the process and wandb. After solving the connection problem, there are still other potential causes. How can I figure them out?
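One way to rule wandb out while hunting for the remaining causes is to run it in offline mode, so no network connection is needed during training. A minimal sketch, assuming a wandb version that supports the `WANDB_MODE` environment variable with the value `offline`; the helper name `configure_wandb_offline` is hypothetical.

```python
import os


def configure_wandb_offline():
    """Ask wandb to log locally only (hypothetical helper).

    Must run before `wandb.init()` is called in the same process.
    Returns the mode that ends up in effect.
    """
    # setdefault keeps any mode already chosen on the command line.
    return os.environ.setdefault("WANDB_MODE", "offline")


mode = configure_wandb_offline()
print("WANDB_MODE =", mode)
```

The same effect can be had by exporting `WANDB_MODE=offline` in the shell before calling the run script. If the hang still happens with wandb offline, the cause is somewhere else.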