Closed by Flionay 3 months ago
I wanted to follow up on this issue. Upon further investigation, I realized that the problem was not with the project code but with my local environment. Therefore, I am closing this issue.
For anyone encountering similar issues, I found the cause and solution related to the environment in this discussion: NVIDIA/nccl#976.
Thank you for your time and support.
Version
0.5.0
On which installation method(s) does this occur?
Docker
Describe the issue
When I run the GraphCast model with three processes, I encounter an error:
mpirun --allow-run-as-root -np 3 python train_graphcast.py
However, when I use two processes, the model runs without any issues:
mpirun --allow-run-as-root -np 2 python train_graphcast.py
I am seeking help to identify the potential cause of this problem. Below is the output log from my program:
Minimum reproducible example
Relevant log output
Environment details
No response