Closed xiang-xiang-zhu closed 11 months ago
Thanks for your interest in our work. In my experience, the program usually hangs because of communication between the GPUs. Have you checked that your nodes support multi-GPU training? As for mpi4py, please try installing it with conda; as far as I can recall, a pip-installed mpi4py is not a complete installation.
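To tell a missing or broken mpi4py apart from a genuine hang, a quick check along these lines may help. This is only a sketch and not part of the project: the helper name `check_mpi4py` and the 30-second timeout are made up for illustration. The probe runs in a child process so that a stall during MPI initialization cannot block the check itself.

```python
import importlib.util
import subprocess
import sys

# One-line probe executed in a child process: importing mpi4py.MPI initializes
# MPI, then the probe prints this process's rank and the world size.
PROBE = (
    "from mpi4py import MPI; "
    "print(MPI.COMM_WORLD.Get_rank(), MPI.COMM_WORLD.Get_size())"
)


def check_mpi4py(timeout_s=30.0):
    """Hypothetical helper: return 'missing', 'hang', 'broken', or 'ok'."""
    # Look for the package without importing it, so nothing is initialized here.
    if importlib.util.find_spec("mpi4py") is None:
        return "missing"
    try:
        result = subprocess.run(
            [sys.executable, "-c", PROBE],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # assumption: 30 s is plenty for a bare MPI init
        )
    except subprocess.TimeoutExpired:
        # The probe produced no result in time, matching the "no output" symptom.
        return "hang"
    # A non-zero exit code usually means the import or MPI initialization failed.
    return "ok" if result.returncode == 0 else "broken"


if __name__ == "__main__":
    print("mpi4py status:", check_mpi4py())
```

If this reports `hang` or `broken`, reinstalling from conda (for example `conda install -c conda-forge mpi4py`) is the first thing to try. If it reports `ok` and training still stalls, the GPU communication setup is the more likely cause.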
Installing with conda works!!! Thank you.
When I tried to run the training script, I got a message that mpi4py was missing, so I installed mpi4py.
Then I re-ran the training script, and there was no output at all.
I waited for a while, but the program still didn't output anything. I don't know what's wrong. My operating system is Ubuntu. Is it possible that it's an MPI problem?