LLNL / ATS

ATS - Automated Testing System - is an open-source, Python-based tool for automating the running of tests of an application across a broad range of high performance computers.
BSD 3-Clause "New" or "Revised" License

smpiargs breaks runs on blueos #76

Closed mdavis36 closed 2 years ago

mdavis36 commented 2 years ago

Our code is running into problems running GPU tests on blueos systems. This seems to be a result of ATS adding --smpiargs=-gpu to the jsrun command. Our tests pass when using --smpiargs=off or --smpiargs=disable_gpu_hooks; however, as far as I can see there is no way to change this in ATS.

dawson6 commented 2 years ago

Hi mdavis36, we can provide a disable option for this (or perhaps an enable option if we change the defaults). This was necessary to support codes which rely on CUDA-aware MPI, but it should be selectable by the project depending on its needs. I'll work this into the next release.

Can you tell me which version of ATS you are using (i.e. the older python 2 or the current python 3 version)?

mdavis36 commented 2 years ago

We are unfortunately still on Python 2.7 (hoping to transition to Python 3 in the near future) and are using ATS 7.0.5.

dawson6 commented 2 years ago

Thank you.

dawson6 commented 2 years ago

I will work on an update to the python2 branch with this option, perhaps by next week. I will also ensure it gets into the python3 branch.

mdavis36 commented 2 years ago

Hi @dawson6, what is the status of this? It looks like #81 is the fix for this?

dawson6 commented 2 years ago

Hi mdavis. I have been testing David's updates to the main branch (python 3), which would include this, and there are still issues with it, so I cannot make a release.

HOWEVER, as you are using the python2 branch, I'll get that fix into the python2 branch and get a release out for it. I will work on that this afternoon, update the public ATS versions, and make a tarfile available in case you are installing ATS yourself. I can get that released on the CZ and the RZ, but will need someone on site this week to get the public ATS updated on the SCF. My usual person for doing that is gone this week, I think, but I'll look around.

dawson6 commented 2 years ago

Update for @mdavis36: I am actively working on this.

dawson6 commented 2 years ago

The current python2 code is shown below; that is, the option is always on.

There are codes which do need this in order to run, but we should have a way to disable it.

I am hesitant to change this default, but will put in an option to disable it.

    # Always run with --smpiargs=-gpu. So many projects use cudaMallocManaged memory
    # with MPI that it should be enabled.
    str_smpi = "--smpiargs=\"-gpu\""

dawson6 commented 2 years ago

Hi @mdavis36, could you test the 7.0.9 install on rzansel and give it one of these two options? Either will disable the --smpiargs=-gpu option and replace it with --smpiargs=off or --smpiargs=show.

    --smpi_off    Blueos option: Add --smpiargs=off to the lrun/jsrun line. Disables --smpiargs=-gpu
    --smpi_show   Blueos option: Add --smpiargs=show to the lrun/jsrun
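
For anyone reading along, the selection between the default and the two new options could look roughly like the sketch below. This is only an illustration and not the actual ATS code: the function name `build_smpi_arg` and its keyword arguments are hypothetical, and letting `--smpi_off` win when both options are given is an assumption.

```python
def build_smpi_arg(smpi_off=False, smpi_show=False):
    """Return the --smpiargs string to place on the lrun/jsrun line.

    Hypothetical helper for illustration; not the actual ATS implementation.
    """
    if smpi_off:
        # --smpi_off: replace the default with --smpiargs=off.
        # Assumption: this takes precedence if both options are set.
        return "--smpiargs=off"
    if smpi_show:
        # --smpi_show: replace the default with --smpiargs=show.
        return "--smpiargs=show"
    # Default behaviour described in this thread: always pass -gpu.
    return "--smpiargs=\"-gpu\""


if __name__ == "__main__":
    print(build_smpi_arg())
    print(build_smpi_arg(smpi_off=True))
    print(build_smpi_arg(smpi_show=True))
```

With neither option set the default `-gpu` string is kept, so projects that rely on it would see no change.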

mdavis36 commented 2 years ago

@dawson6 --smpi_off seems to do the trick for what we need. Thank you for getting this pushed through for us!