OpenMPI's smcuda btl selected on non-GPU nodes?

migueldiascosta commented 1 year ago

With the OpenMPI included in foss/2022a, I'm seeing the smcuda btl being selected on non-GPU nodes, with an adverse impact on performance compared to vader.

I initially saw this when running perf top, but it can also be checked by e.g. passing -mca btl_base_verbose 100 to mpirun (or setting the corresponding environment variable) and looking for "Using smcuda"

Does anyone else see this behaviour?

Edit: seen this on a few different non-GPU systems now - simplest way to check is probably to load HPL and, with the included HPL.dat file, run mpirun -np 4 --mca btl_base_verbose 100 xhpl 2>&1 | grep Using

Edit 2: I'm working around the issue by setting the environment variable OMPI_MCA_btl=^smcuda on non-GPU nodes, but if this happens to other people we need a better solution
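
As a rough sketch of how that workaround could be applied only where needed, the snippet below sets OMPI_MCA_btl=^smcuda when no NVIDIA device node is found and the variable is not already set. The function names are hypothetical, and treating the presence of /dev/nvidia0 as "this node has a GPU" is an assumption that may not hold on every site.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an NVIDIA device node is present.
# Assumption: GPU nodes expose <dev_dir>/nvidia0; adjust for your site.
has_nvidia_gpu() {
    dev_dir="${1:-/dev}"
    [ -e "$dev_dir/nvidia0" ]
}

# Exclude the smcuda btl only when no GPU is detected and the user
# has not already chosen a btl list via OMPI_MCA_btl.
set_btl_exclusion() {
    if ! has_nvidia_gpu "${1:-/dev}" && [ -z "${OMPI_MCA_btl:-}" ]; then
        OMPI_MCA_btl='^smcuda'
        export OMPI_MCA_btl
    fi
}
```

Sourcing this in a job prologue or module hook and calling set_btl_exclusion before mpirun would leave GPU nodes and any explicit user setting untouched.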

boegel commented 1 year ago

@Micket @bartoldeman thoughts on this?

Micket commented 1 year ago

@migueldiascosta there is no included HPL.dat here, I think?

migueldiascosta commented 1 year ago

@Micket I meant the HPL.dat included in the HPL installation, but it doesn't really matter, I just mentioned HPL as a ready to use MPI program that could be used to debug the btl selection with -mca btl_base_verbose 100 (and not as a benchmark)

Now, one reason this may not be so widespread is that the btls are only used with the ob1 pml; they are bypassed completely when using the ucx pml (see slide 33 onward of https://www.open-mpi.org/video/general/easybuild_tech_talks_01_OpenMPI_part2_20200708.pdf; these "easybuild tech talks" are really useful :) ). So, to check whether smcuda is being selected when ob1 is used, it may be necessary to also pass -mca pml ob1.

When the ob1 pml is used and btl_base_verbose is set and I use foss >= 2021a, I see

mca: bml: Using smcuda btl for send to ... on node ...

but if I use foss/2020b, I see

mca: bml: Using vader btl for send to ... on node ...

so it does seem this is a side-effect of building with --with-cuda=internal (?)
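
To make the comparison above easier to repeat, here is a minimal sketch that counts which btl names show up in the verbose output. The function name summarize_btls is hypothetical, and the pattern assumes the "mca: bml: Using <name> btl ..." line format quoted above; other Open MPI versions may word it differently.

```shell
#!/bin/sh
# Hypothetical helper: read btl_base_verbose output on stdin and count
# how often each btl appears in "mca: bml: Using <name> btl ..." lines.
# Assumption: the line format matches the messages quoted in this thread.
summarize_btls() {
    sed -n 's/.*mca: bml: Using \([A-Za-z0-9_]*\) btl .*/\1/p' | sort | uniq -c
}

# Example invocation (needs Open MPI and an MPI program such as xhpl):
#   mpirun -np 4 --mca pml ob1 --mca btl_base_verbose 100 xhpl 2>&1 | summarize_btls
```

Forcing the ob1 pml in the example matters because, as noted above, the btls are bypassed when the ucx pml is selected.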

boegel commented 5 months ago

@bartoldeman Any thoughts on this? Should we try to prevent smcuda from being used on non-GPU systems?

easybuilders / easybuild-easyconfigs

OpenMPI's smcuda btl selected on non-GPU nodes? #17854