NCAR / spack-gust

Spack production user software stack on the Gust test system

set_gpu_rank script incompatible with PBS -V from a login node #52

Closed: roryck closed this issue 1 year ago

roryck commented 1 year ago

Reporting for a guy I know. If you use

PBS -V -l select=1:ncpus=64:ngpus=4 ...

You end up with NGPUS=0 on the GPU nodes, and the set_gpu_rank script then bails out at:

if [[ $NGPUS -eq 0 ]]; then
    eecho "$my_name cannot find GPUs in compute environment"
fi
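For illustration only, here is a minimal sketch of a GPU check that does not rely solely on an inherited NGPUS. It assumes that qsub -V copies the login node's NGPUS=0 into the job, where it shadows the value the compute node would otherwise provide, and that nvidia-smi is on PATH on the GPU nodes. The function names detect_ngpus and check_gpus are hypothetical; this is not the fix that was applied to set_gpu_rank.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual set_gpu_rank code or fix.
# Assumption: nvidia-smi is available on the GPU compute nodes.

# Print the number of GPUs on the current node, preferring a live count
# from nvidia-smi over an NGPUS value that qsub -V may have copied from
# a login node.
detect_ngpus() {
    local count=0
    if command -v nvidia-smi > /dev/null 2>&1; then
        # nvidia-smi -L prints one "GPU <index>: ..." line per GPU
        count=$(nvidia-smi -L 2> /dev/null | grep -c '^GPU ')
    fi
    if [[ $count -gt 0 ]]; then
        echo "$count"
    else
        # No GPUs detected (or no nvidia-smi): fall back to the
        # inherited value, treating an unset NGPUS as 0
        echo "${NGPUS:-0}"
    fi
}

# Mirror the original guard: fail when no GPUs can be found
check_gpus() {
    local ngpus
    ngpus=$(detect_ngpus)
    if [[ $ngpus -eq 0 ]]; then
        echo "cannot find GPUs in compute environment" >&2
        return 1
    fi
    echo "$ngpus"
}
```

On a GPU node, `NGPUS=$(detect_ngpus)` would replace an inherited 0 with the live device count. Note that nvidia-smi reports every GPU on the node regardless of CUDA_VISIBLE_DEVICES, so a real script might prefer a different source for the count.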

Also, it looks like there are some eecho commands in there. Is that a real thing or a typo?

vanderwb commented 1 year ago

Thanks for the FYI. Yes, eecho is a custom function, and its use is intentional.
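The thread does not show how eecho is defined. As a hypothetical illustration only, a helper with this kind of name is often an "error echo" that writes to stderr, so diagnostics stay out of any output a caller captures with `$(...)`:

```shell
#!/bin/bash
# Hypothetical stand-in for the custom eecho function; the real
# definition in set_gpu_rank is not shown in this thread.
# Write all arguments to stderr instead of stdout.
eecho() {
    echo "$@" >&2
}
```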

This may be fixable, but I'm not sure. The guy you know should be cautioned that -V is problematic and should be avoided for production workflows, but I'll see if I can get it working anyways. :)

vanderwb commented 1 year ago

This bug should now be fixed for jobs submitted from login nodes. There is no easy way to resolve this for jobs submitted from within another job while using -V, so using -V when submitting from inside a job should be avoided.
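Since there is no easy fix for -V inside a job, one option is to catch it before submitting. The sketch below is a hypothetical wrapper check, not part of spack-gust. It relies on PBS setting PBS_JOBID in every batch job's environment, and it only inspects command-line arguments, so a `#PBS -V` directive inside the job script itself would not be caught.

```shell
#!/bin/bash
# Hypothetical pre-submission check, not part of spack-gust.
# Returns 1 (with a warning) when the qsub arguments include -V and the
# caller is itself running inside a PBS job (PBS_JOBID is set there).
warn_nested_v() {
    local arg
    if [[ -z ${PBS_JOBID:-} ]]; then
        return 0    # not inside a job (e.g. a login node): nothing to check
    fi
    for arg in "$@"; do
        if [[ $arg == -V ]]; then
            echo "warning: qsub -V inside job $PBS_JOBID would export that job's environment" >&2
            return 1
        fi
    done
    return 0
}
```

A wrapper could then run `warn_nested_v "$@" && qsub "$@"`, submitting only when the check passes.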

roryck commented 1 year ago

Thanks, I'll let my guy know.