The distributed experiment was stuck after creating model done

csshali commented 2 years ago

I ran run_fedavg_distributed_pytorch, but the experiment hangs right after the log prints "create_model(): done". What's wrong?

2022-04-10,23:37:38.903 - {data_loader.py (453)} - load_partition_data(): Client idx = 0, local sample number = 191
2022-04-10,23:37:38.903 - {data_loader.py (453)} - load_partition_data(): Client idx = 1, local sample number = 190
2022-04-10,23:37:38.903 - {data_loader.py (453)} - load_partition_data(): Client idx = 2, local sample number = 190
2022-04-10,23:37:38.903 - {data_loader.py (453)} - load_partition_data(): Client idx = 3, local sample number = 190
2022-04-10,23:37:38.903 - {data_loader.py (453)} - load_partition_data(): Client idx = 4, local sample number = 190
2022-04-10,23:37:38.903 - {data_loader.py (453)} - load_partition_data(): Client idx = 5, local sample number = 190
2022-04-10,23:37:38.904 - {main_fedavg.py (139)} - create_model(): create_model. model_name = graphsage, output_dim = None
2022-04-10,23:37:38.929 - {main_fedavg.py (180)} - create_model(): done

Lin-repository commented 2 years ago

Same question here.

chaoyanghe commented 2 years ago

@csshali Thank you for the feedback. I've fixed this issue. Please update to the latest source code:

https://github.com/FedML-AI/FedGraphNN/commit/5b3c766baa48efdde7aec658eaabb6d0eb19386f

Lin-repository commented 2 years ago

I finally found the reason: the run hangs because wandb is not configured. Thanks to the questioner and the author.
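For anyone hitting the same hang, here is a minimal sketch of one way to keep wandb from blocking a distributed launch. It assumes the hang comes from wandb waiting for login credentials inside a non-interactive worker process, which this thread does not confirm. The helper name ensure_wandb_noninteractive is hypothetical and is not part of FedGraphNN or wandb; the only wandb features it relies on are the WANDB_API_KEY and WANDB_MODE environment variables.

```python
import os


def ensure_wandb_noninteractive(env=None):
    """Choose a wandb mode that should not wait for a login prompt.

    Hypothetical helper for illustration; not part of FedGraphNN or wandb.
    An existing WANDB_MODE value is always left untouched.
    """
    env = os.environ if env is None else env
    if env.get("WANDB_API_KEY"):
        # An API key is available, so online logging can proceed unattended.
        env.setdefault("WANDB_MODE", "online")
    else:
        # No key in the environment: record the run locally instead.
        env.setdefault("WANDB_MODE", "offline")
    return env["WANDB_MODE"]


if __name__ == "__main__":
    # Call this before wandb.init() runs in the training script.
    print("WANDB_MODE =", ensure_wandb_noninteractive())
```

Note that this sketch only looks at the environment, so it will choose offline mode even if you previously ran wandb login on the machine; set WANDB_MODE=online yourself in that case. Running wandb login once on each machine before launching is the other obvious option.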
