Open program-- opened 8 months ago
Something needs to say target_link_library(ngen MPIDetour)
, too
Ubuntu linux tests failing due to a timeout running the tests. Downloading the logs and poking around, I see a LOT of these DEADLYSIGNAL
messages comfing from running the test_bmi_c
target
2024-03-15T22:38:01.4674964Z [100%] Linking CXX executable test_bmi_c
2024-03-15T22:38:02.1738113Z AddressSanitizer:DEADLYSIGNAL
and eventually the test times out
2024-03-15T22:38:17.3139827Z AddressSanitizer:DEADLYSIGNAL
2024-03-15T22:38:17.3140454Z CMake Error at /usr/local/share/cmake-3.28/Modules/GoogleTestAddTests.cmake:112 (message):
2024-03-15T22:38:17.3140582Z Error running test executable.
2024-03-15T22:38:17.3140590Z
2024-03-15T22:38:17.3140878Z Path: '/home/runner/work/ngen/ngen/cmake_build/test/test_bmi_c'
2024-03-15T22:38:17.3141033Z Result: Process terminated due to timeout
2024-03-15T22:38:17.3141126Z Output:
2024-03-15T22:38:17.3141215Z
2024-03-15T22:38:17.3141230Z
2024-03-15T22:38:17.3141350Z Call Stack (most recent call first):
2024-03-15T22:38:17.3141791Z /usr/local/share/cmake-3.28/Modules/GoogleTestAddTests.cmake:225 (gtest_discover_tests_impl)
2024-03-15T22:38:17.3141803Z
2024-03-15T22:38:17.3141807Z
2024-03-15T22:38:17.3142098Z gmake[3]: *** [test/CMakeFiles/test_bmi_c.dir/build.make:133: test/test_bmi_c] Error 1
2024-03-15T22:38:17.3142289Z gmake[3]: *** Deleting file 'test/test_bmi_c'
2024-03-15T22:38:17.3142567Z gmake[2]: *** [CMakeFiles/Makefile2:2163: test/CMakeFiles/test_bmi_c.dir/all] Error 2
2024-03-15T22:38:17.3142850Z gmake[1]: *** [CMakeFiles/Makefile2:2170: test/CMakeFiles/test_bmi_c.dir/rule] Error 2
2024-03-15T22:38:17.3142986Z gmake: *** [Makefile:921: test_bmi_c] Error 2
2024-03-15T22:38:17.3154227Z ##[error]Process completed with exit code 2.
2024-03-15T22:38:17.3379867Z Post job cleanup.
Ubuntu linux tests failing due to a timeout running the tests. Downloading the logs and poking around, I see a LOT of these
DEDLYSIGNAL
messages comfing from running thetest_bmi_c
target ...
@hellkite500 Yeah this is happening across PR's and the master branch, see: https://github.com/NOAA-OWP/ngen/pull/750#issuecomment-1998828125. Not sure why this is happening, but it's not reproducible locally for neither Phil nor I
Ah, I'm still catching up on a few things. Just wasn't sure if this was related to these changes or not...I looked at a few recent PR's and didn't see the failures. Might try getting an Ubuntu image spun up to test on.
Converting back to draft, since we may not need this for the time being if ESMF PR https://github.com/esmf-org/esmf/pull/234 gets merged and released quickly. It could be useful for other uncooperative libraries that insist on acting like they own MPI, though.
Where is the PMPI_Finalize coming from? I can not find documentation on It, I am guessing that it is directly called by the MPI library provided MPI_Finalize. Also is using this compatible across different MPI implementations? Ok this is the from section 15 of the MPI standard, documentation on this is not something easy to find. But It looks like this should be fine on all MPI implementations that meet the standard.
This PR resolves #748 by implementing a shim library that overrides
MPI_Finalize
calls to become no-op. Additionally, calls toMPI_Finalize
from NGen code are modified toPMPI_Finalize
to call the MPI library's actual implementation.Adding everyone as reviewers for visibility on this change.
Additions
MPIDetour
inutilities/mpi/
with the source fileMPIDetour.c
.Changes
MPI_Finalize
insrc/NGen.cpp
are converted toPMPI_Finalize
.TODO
NGen::forcing
will link to this library to handle ESMF's (and any other external) calls toMPI_Finalize
so that NGen keeps responsibility.Checklist
Target Environment support