Closed jedwards4b closed 1 week ago
I've deployed a fix using mktemp, and it is available on Derecho. I will push the git commit here shortly.
Thanks Ben for working on this issue. According to Francis, the updated mpibind script now works well for multiple jobs that run simultaneously on Derecho. However, there is a failure coming from a job that uses only 36 CPU cores per node. Is this failure related to the mpibind script or to other system configurations?
Hi Jian,
Thanks for letting us know that Ben's fix worked for the multiple jobs issue.
On the issue with the 36 CPU core job failing: this is sort of a PBS cgroup / system issue. When you select ncpus < 128, PBS creates a cgroup, and doesn't do so in a way that's balanced across sockets, so it will oversubscribe one CPU and artificially limit your memory bandwidth. So, all PBS jobs should set ncpus=128, even if they are using fewer than 128 cores. The mpibind scripts will bind correctly and balance across sockets in this scenario. I had argued for ncpus=128 being the default and not having to set it in a PBS job at all, but was outvoted on that issue. I could have mpibind error out with a message when it detects ncpus < 128, but that's about all I could do in the wrapper. Would that be helpful?
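For illustration, the kind of check described above could look roughly like the sketch below. This is not the actual mpibind code: the function name `check_ncpus` and the message wording are hypothetical, and how the wrapper obtains the ncpus value is left open.

```shell
#!/bin/bash
# Hypothetical helper; NOT taken from the real mpibind wrapper.
FULL_NODE_CPUS=128   # cores per Derecho node, per the discussion above

# check_ncpus <ncpus>
# Prints a message to stderr and returns 1 when fewer than a full node's
# worth of CPUs was requested; returns 0 otherwise.
check_ncpus() {
    local ncpus="$1"
    if [ "$ncpus" -lt "$FULL_NODE_CPUS" ]; then
        echo "mpibind: ncpus=${ncpus} is less than ${FULL_NODE_CPUS};" \
             "please request ncpus=${FULL_NODE_CPUS} in the PBS select statement" >&2
        return 1
    fi
    return 0
}
```

A wrapper might call this as `check_ncpus "${NCPUS:-$(nproc)}" || exit 1`. This assumes PBS exports `NCPUS` inside the job and that `nproc` reflects the CPUs the cgroup makes available; both should be verified on the system.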
Hi Rory,
Yes, I think that you should have it error out. I'll need to make a change in CIME so that cases run on less than a node are set up this way.
Hi Rory and Jim,
Thanks for your quick and detailed replies. That makes sense to me.
If Jim is going to make changes in CIME to get a job with < 128 CPU cores to work on Derecho, shall we let mpibind issue a warning rather than an error so that the simulation can proceed? And I guess what we want to do is always set ncpus=128 for PBS resources but pass the actual requested number of CPU cores to the mpiexec command through mpibind?
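As a rough sketch of that arrangement, the PBS resource request always asks for the full node while the MPI rank count stays at what the user actually wants. The helper names `pbs_select` and `total_ranks` are hypothetical, and `mpiprocs` is used here on the assumption that it is how the per-node rank count is conveyed to PBS; the exact mpibind and mpiexec invocation on Derecho is not shown in this thread.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.
FULL_NODE_CPUS=128   # always request the whole node from PBS

# pbs_select <nodes> <tasks_per_node>
# Builds a select string with ncpus fixed at a full node and mpiprocs set
# to the number of MPI ranks actually wanted on each node.
pbs_select() {
    local nodes="$1" tasks="$2"
    echo "select=${nodes}:ncpus=${FULL_NODE_CPUS}:mpiprocs=${tasks}"
}

# total_ranks <nodes> <tasks_per_node>
# The rank count that would be handed to mpiexec (for example via -n).
total_ranks() {
    echo $(( $1 * $2 ))
}

pbs_select 1 36     # prints: select=1:ncpus=128:mpiprocs=36
total_ranks 1 36    # prints: 36
```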
Done
The mpibind log written here and here needs to include the PBS_JOBID so that it is unique when a user is running multiple jobs. Further, the rm command here is too general.
Another solution might be to create a subdirectory of TMPDIR based on jobid.
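A minimal sketch of the per-job directory idea, combined with mktemp, is shown below. It is not the deployed fix: the `mpibind.` prefix, the log file name, and both function names are assumptions made for illustration, and `nojob` is only a fallback so the snippet also runs outside PBS.

```shell
#!/bin/bash
# Hypothetical sketch; names and paths are illustrative, not the real script.

# Create a directory that is unique per job (PBS_JOBID) and per invocation
# (the random XXXXXX suffix filled in by mktemp).
make_job_tmpdir() {
    local base="${TMPDIR:-/tmp}" jobid="${PBS_JOBID:-nojob}"
    mktemp -d "${base}/mpibind.${jobid}.XXXXXX"
}

# Remove exactly one directory created above, instead of a broad glob that
# could delete files belonging to other running jobs.
cleanup_job_tmpdir() {
    case "$1" in
        */mpibind.*) rm -rf -- "$1" ;;
        *) echo "refusing to remove unexpected path: $1" >&2; return 1 ;;
    esac
}

job_tmp="$(make_job_tmpdir)"
log="${job_tmp}/mpibind.log"
echo "binding log for job ${PBS_JOBID:-nojob}" > "$log"
# When finished: cleanup_job_tmpdir "$job_tmp"
```

Because each job writes only inside its own directory, the cleanup can target that single path, which avoids one job removing another job's log.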