AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR
https://amrex-codes.github.io/amrex

tests fail if only 1 GPU on compute node #3981

Closed BenWibking closed 3 weeks ago

BenWibking commented 3 weeks ago

When building with cmake .. -DAMReX_ENABLE_TESTS=ON -DAMReX_GPU_BACKEND=HIP and running ctest on a GPU development node with only 1 GPU, the following tests fail:

The following tests FAILED:
         24 - Particles_ParallelContext_3d (Failed)
         30 - Particles_Redistribute_3d (Failed)
         31 - Particles_RedistributeSOA_3d (Failed)
Errors while running CTest

This appears to be because these tests require 2 MPI ranks, and each MPI rank tries to use the GPU (the same one, since there is only 1 on this node), which fails:

    Start 24: Particles_ParallelContext_3d
1/1 Test #24: Particles_ParallelContext_3d .....***Failed    4.26 sec
Initializing AMReX (24.06-6-g0da4d8b7e657)...
MPI initialized with 2 MPI processes
MPI initialized with thread support level 0
Initializing HIP...
There are more MPI processes than the number of GPUs.!
HIP initialized with 1 device.
AMReX (24.06-6-g0da4d8b7e657) initialized
Running redistribute test
0::Assertion `do_tiling == false' failed, file "/home/bwibking/amrex/Src/Particle/AMReX_ParticleContainerI.H", line 1256 !!!
SIGABRT
1::Assertion `do_tiling == false' failed, file "/home/bwibking/amrex/Src/Particle/AMReX_ParticleContainerI.H", line 1256 !!!
SIGABRT
See Backtrace.0 file for details
See Backtrace.1 file for details

I'm not sure how best to avoid this problem. Maybe a warning could be printed if this is not expected to work in this scenario?

atmyers commented 3 weeks ago

I think the problem is actually that it's using the wrong inputs file, because some logic in the CMake tests was based on CUDA being on, rather than on any GPU backend being enabled. See https://github.com/AMReX-Codes/amrex/pull/3982
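
For readers following along, here is a rough sketch of the kind of check being described. It is not the actual code from the AMReX test CMake files or from the linked PR. The inputs file names and the `_input_file` variable are hypothetical, and the only name taken from this thread is the `AMReX_GPU_BACKEND` option, assumed here to be `NONE` when no GPU backend is enabled. The sketch is written as a standalone script so it can be tried with `cmake -P`.

```cmake
# Hypothetical sketch only; not the actual AMReX test logic.
# Try it with: cmake -DAMReX_GPU_BACKEND=HIP -P pick_inputs.cmake
cmake_minimum_required(VERSION 3.10)

# Assumption: the option is NONE when no GPU backend is enabled.
if (NOT DEFINED AMReX_GPU_BACKEND)
   set(AMReX_GPU_BACKEND "NONE")
endif ()

# A CUDA-only check such as
#    if (AMReX_GPU_BACKEND STREQUAL "CUDA")
# would send HIP (and any other non-CUDA GPU backend) down the CPU branch.

# Backend-agnostic check: any value other than NONE counts as a GPU build.
if (NOT AMReX_GPU_BACKEND STREQUAL "NONE")
   set(_input_file "inputs.gpu")   # hypothetical GPU inputs file name
else ()
   set(_input_file "inputs.cpu")   # hypothetical CPU inputs file name
endif ()

message(STATUS "Backend ${AMReX_GPU_BACKEND}: using inputs file ${_input_file}")
```

With a check like the commented-out CUDA-only one, a HIP build would pick up the CPU inputs file. That would be consistent with the `do_tiling == false` assertion in the log above, if the CPU inputs enable tiling.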