catchorg / Catch2

A modern, C++-native, test framework for unit-tests, TDD and BDD - using C++14, C++17 and later (C++11 support is in v2.x branch, and C++03 on the Catch1.x branch)
https://discord.gg/4CWS9zD
Boost Software License 1.0

Catch2 test never completes when using MPI in libMesh-based program #1867

Closed by JohnDN90 4 years ago

JohnDN90 commented 4 years ago

Describe the bug: A Catch2 test of a libMesh-based program intermittently fails to complete when run with MPI.

Expected behavior: Test results should be consistent. The test should pass (or fail) the same way on every run when neither the test nor the program has changed.

Reproduction steps: A minimal "hello_world" example is attached. libMesh and MPI are required, and you will need to replace some of the hard-coded paths in the small CMakeLists.txt file.

Platform information:

Note that in addition to the system above, this issue was also present in Travis CI with a macOS worker using OpenMPI and Clang compilers. This failing Travis CI test (with all build details) can be viewed here.

Additional context: I am using Catch2 to test a libMesh-based program that uses MPI. Catch2 is set up with a user-defined main() function. When I run mpiexec -np 2 ./catch2_libmesh_issue, it intermittently fails to finish. It gives no error message; it just hangs and the processor goes idle. By "intermittently" I mean that the failure is essentially random: sometimes it fails on the first run, and sometimes it runs successfully five times in a row and then fails on the sixth. The issue is not isolated to one test in the actual program.

With the attached example files specifically, in one case it ran 18 times successfully and then failed; in another, it ran successfully 72 times and then failed. Judging from std::cout statements added to the example, the hang appears to occur in the p_global_init = new libMesh::LibMeshInit(argc, (const char **)argv) part of the custom main() function for Catch2. Below are some commands I used to run the program to check for failures.

Command: for i in {1..200}; do echo "$i"; ./catch2_libmesh_issue; sleep 0.25; done
Result: Pass

Command: for i in {1..200}; do echo "$i"; mpiexec -np 1 ./catch2_libmesh_issue; sleep 0.25; done
Result: Never Completes

Command: for i in {1..200}; do echo "$i"; mpiexec -np 2 ./catch2_libmesh_issue; sleep 0.25; done
Result: Never Completes
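Because a hung run blocks loops like these indefinitely, it can help to bound each run with a time limit so that the failing iteration is reported and the loop stops. The following is a minimal bash sketch, not part of the original report. It assumes GNU coreutils `timeout` is available (it is not installed by default on macOS), and the function name `repeat_with_timeout` and the 60-second limit in the usage comment are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Repeat a command, bounding each run with a time limit so that a hang is
# reported instead of blocking the loop forever.
# Usage: repeat_with_timeout RUNS SECONDS COMMAND [ARGS...]
# (hypothetical helper; relies on GNU coreutils `timeout`)
repeat_with_timeout() {
    local runs=$1 limit=$2
    shift 2
    local i status
    for ((i = 1; i <= runs; i++)); do
        # `timeout` exits with status 124 when it had to stop the command.
        timeout "$limit" "$@" > /dev/null 2>&1
        status=$?
        if [ "$status" -eq 124 ]; then
            echo "run $i: timed out after ${limit}s (possible hang)"
            return 1
        elif [ "$status" -ne 0 ]; then
            echo "run $i: failed with exit status $status"
            return 1
        fi
    done
    echo "all $runs runs completed"
}

# Possible invocation for the binary in this issue (the 60 s limit is an assumption):
# repeat_with_timeout 200 60 mpiexec -np 2 ./catch2_libmesh_issue
```

This separates a genuine test failure (non-zero exit status) from a hang (status 124), which a plain loop cannot do. Whether every MPI rank is cleaned up after a timeout depends on how the MPI launcher handles the termination signal, so leftover processes may still need to be checked by hand.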

catch2_libmesh_issue.zip

pbrady commented 4 years ago

I don't think that this is a general issue with Catch2 + MPI. The following seems to work fine:

File: catch_mpi_test_main.cpp

#define CATCH_CONFIG_RUNNER
#include "catch2/catch.hpp"
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int result = Catch::Session().run(argc, argv);

    MPI_Finalize();

    return result;
}

File: catch_mpi_test.cpp

#include "catch2/catch.hpp"
#include <mpi.h>

TEST_CASE("reduce")
{
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    rank++;
    // arithmetic series: each rank contributes rank + 1, so the sum is 1 + 2 + ... + size
    int sum;
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    REQUIRE(sum == (size*(1 + size))/2 );
}
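
For reference, the expected value in the REQUIRE above can be derived as follows. Writing n for the communicator size (a symbol introduced here, not in the original code), each rank r in 0, ..., n-1 contributes r + 1 to the reduction:

```latex
\sum_{r=0}^{n-1} (r + 1) \;=\; \sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2}
```

With mpiexec -np 2, for example, the ranks contribute 1 and 2, so every rank should receive sum = 3 = (2 * 3) / 2.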

compile:

> mpicxx --std=c++17 -c catch_mpi_test_main.cpp
> mpicxx --std=c++17 catch_mpi_test.cpp catch_mpi_test_main.o -o catch_mpi_test

This runs without any hangs or errors (fish shell):

for i in (seq 200)
    if not mpiexec -np 2 ./catch_mpi_test > out 2>&1
        echo $i failure
    end
end

JohnDN90 commented 4 years ago

pbrady, thanks for testing that. It may be some incompatibility between Catch2 and libMesh. I'll have to look into it more, as it appears it's likely not an issue with Catch2 itself. I'll close the issue.