Open vsoch opened 1 year ago
Heyo! I got everything working. Let me know if you are interested in an example here: https://github.com/converged-computing/operator-experiments/tree/main/google/networking/hello-world-mpi. I think this would be important to show folks: the install scripts just show a source command for `vars.sh` (they don't actually run it), and I suspect many folks will assume it has already been sourced and run into hours / days of anguish debugging. :laughing:
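For anyone hitting the same thing, here is a minimal sketch of the kind of guard that would have saved me time. It only assumes that sourcing `vars.sh` is what puts `mpirun` on the `PATH`; the function name `ensure_mpi_env` is made up for illustration, and the location of `vars.sh` is something you need to look up on your own image (I'm not assuming a path here).

```shell
#!/bin/sh
# Sketch only: ensure_mpi_env is a hypothetical helper, not part of any install script.
# Pass it the path to vars.sh as found on your image.

ensure_mpi_env() {
    vars_file="$1"

    # If mpirun is already on PATH, the environment is loaded; nothing to do.
    if command -v mpirun >/dev/null 2>&1; then
        return 0
    fi

    if [ ! -f "$vars_file" ]; then
        echo "vars file not found: $vars_file" >&2
        return 1
    fi

    # Source in the current shell (not a subshell) so the PATH changes persist.
    . "$vars_file"

    # Fail loudly instead of letting a later mpirun call fail mysteriously.
    if ! command -v mpirun >/dev/null 2>&1; then
        echo "mpirun still not on PATH after sourcing $vars_file" >&2
        return 1
    fi
}
```

In a run script this would be called once before the launch, e.g. `ensure_mpi_env "$MPI_VARS" && mpirun -n 2 ./hello_c`, where `MPI_VARS` is a placeholder variable you set yourself.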
Hi!
I am trying to reproduce the simple MPI example here, but actually running an MPI program, because the example here is just running `hostname`. I have two examples locally: one is an application we are working on, and the second is a "hello world" example that I fell back to when I hit some issues (and it reproduced them). Here is what my job looks like:

And the `setup.sh` and `run.sh` scripts:
It looks like it's compiling OK (I see `hello_c`), but the error I've hit in both with `mpirun` is something related to hydra and an argument?

It's been really challenging figuring out how all this works. For example, it took me a hot minute to realize that these Google install commands for MPI were only available on that specific image family, and then it's taken 10+ jobs to find paths / bins of various things (I'm on my 50+ run and still don't have a working thing!) :laughing: I have a lot of feedback I'm planning to share, but would like to get at least one reasonable example working first (and I'd be happy to share)! For my execution, I'm using the Python SDK, so I don't have the config beyond what I posted above. Thanks for the help - looking forward to getting this working!