Closed JohnDN90 closed 4 years ago
I don't think that this is a general issue with Catch2 + MPI. The following seems to work fine:
File: catch_mpi_test_main.cpp
#define CATCH_CONFIG_RUNNER
#include "catch2/catch.hpp"
#include <mpi.h>
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int result = Catch::Session().run(argc, argv);
    MPI_Finalize();
    return result;
}
File: catch_mpi_test.cpp
#include "catch2/catch.hpp"
#include <mpi.h>
TEST_CASE("reduce")
{
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    rank++;
    // arithmetic series: the values 1..size sum to size*(size+1)/2
    int sum;
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    REQUIRE(sum == (size*(1 + size))/2 );
}
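For readers without an MPI installation at hand, the check in the test can be sanity-tested serially. The following is only a sketch: expected_sum and simulated_allreduce_sum are hypothetical helper names, not part of Catch2 or MPI, and the second function merely stands in for what MPI_Allreduce with MPI_SUM would return when each rank contributes rank + 1.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Closed form of 1 + 2 + ... + size, the right-hand side of the REQUIRE.
// (Hypothetical helper, introduced here for illustration only.)
int expected_sum(int size)
{
    return (size * (1 + size)) / 2;
}

// Serial stand-in for the MPI_Allreduce call (an assumption, not real MPI):
// each "rank" r in [0, size) contributes r + 1, and the values are summed.
int simulated_allreduce_sum(int size)
{
    assert(size >= 1); // a communicator always has at least one rank
    std::vector<int> contributions(size);
    std::iota(contributions.begin(), contributions.end(), 1); // 1, 2, ..., size
    return std::accumulate(contributions.begin(), contributions.end(), 0);
}
```

With mpiexec -np 2, for example, the two ranks contribute 1 and 2, so both sides of the comparison should come out as 3.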
compile:
> mpicxx --std=c++17 -c catch_mpi_test_main.cpp
> mpicxx --std=c++17 catch_mpi_test.cpp catch_mpi_test_main.o -o catch_mpi_test
This runs without any hanging or errors (fish shell):
for i in (seq 200)
    if not mpiexec -np 2 ./catch_mpi_test > out 2>&1
        echo $i failure
    end
end
pbrady, thanks for testing that. It may be some incompatibility between Catch2 and libMesh. I'll have to look into it more, as it appears it's likely not an issue with Catch2 itself. I'll close the issue.
Describe the bug: A Catch2 test of a libMesh-based program intermittently fails to complete when run with MPI.
Expected behavior: Test results should be consistent, passing (or failing) the same way each time when neither the test nor the program has changed.
Reproduction steps: A minimal "hello_world" example is attached. libMesh and MPI are required, and you'll need to replace some of the hard-coded paths in the small CMakeLists.txt file.
Platform information:
Note that in addition to the system above, this issue was also present in Travis CI with a macOS worker using OpenMPI and Clang compilers. This failing Travis CI test (with all build details) can be viewed here.
Additional context: I am using Catch2 to test a libMesh-based program which uses MPI. Catch2 is set up with a user-defined main() function. When I run mpiexec -np 2 ./catch2_libmesh_issue, it intermittently fails (never finishes). It doesn't print any error message; it just hangs and the processor goes idle. By "intermittently" I mean that the failure is essentially random. Sometimes it fails on the first run. Sometimes it runs successfully five times in a row and then fails on the sixth. This issue is not isolated to one test in the actual program.
With the attached example files in particular, in one case it ran successfully 18 times and then failed. In another case, it ran successfully 72 times and then failed. Using std::cout statements in the example, it appears that the issue occurs in the p_global_init = new libMesh::LibMeshInit(argc, (const char **)argv) part of the custom main() function for Catch2. Below are some commands I used to run the program to check for failures.
for i in {1..200}; do echo "$i"; ./catch2_libmesh_issue; sleep 0.25; done
for i in {1..200}; do echo "$i"; mpiexec -np 1 ./catch2_libmesh_issue; sleep 0.25; done
for i in {1..200}; do echo "$i"; mpiexec -np 2 ./catch2_libmesh_issue; sleep 0.25; done
catch2_libmesh_issue.zip