TRIQS / triqs

a Toolbox for Research on Interacting Quantum Systems
https://triqs.github.io
GNU General Public License v3.0

Hanging gf_empty test #795

Closed AlynJ closed 4 years ago

AlynJ commented 4 years ago

Hello TRIQS developers,

For TRIQS version 3.0, I’ve come across an issue with the gf_empty test when running "make test" on my cluster. The test hangs for a long time, far longer than it took when I installed TRIQS 3.0 on my local machine. It hangs both on the cluster login node and when run as a submitted job.

When I ran the gf_empty executable separately, I received the following message:

[WARNING] … /triqs-3.0/triqs.build/deps/GTest_src/googletest/src/gtest-death-test.cc:1122:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 3 threads. See https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#death-tests-and-threads for more explanation and suggested solutions, especially if this is the last message you see before your test times out.

A process has executed an operation involving a call to the "fork()" system call to create a child process.  Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your job may hang, crash, or produce silent data corruption.  The use of fork() (or system() or other calls that create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          [[43651,1],0] (PID 16453)

If you are absolutely sure that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0.

The test then hangs for so long that I had to force quit it every time. I was able to fix this on my cluster installation by adding the following line to gf_empty.cpp:

::testing::FLAGS_gtest_death_test_style="threadsafe";

(This line comes from the page linked in the warning message.) However, when I built TRIQS with the updated gf_empty.cpp on my local machine, the test hung there instead!
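For context on why that flag matters: according to the Google Test documentation linked in the warning, the "threadsafe" death test style re-executes the test binary in the child process instead of running the death test directly in the forked copy of a multi-threaded parent. The sketch below illustrates that fork-then-exec idea with plain POSIX calls. It is not Google Test's implementation, and run_in_fresh_process is a hypothetical helper name introduced only for illustration.

```cpp
// Sketch only: illustrates the fork-then-exec pattern, not Google Test's code.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>
#include <vector>

// Runs the program argv[0] with arguments argv in a fresh process image.
// Returns the child's exit code, or -1 if the child could not be started
// or did not exit normally. (Hypothetical helper, named for illustration.)
int run_in_fresh_process(const std::vector<std::string>& argv) {
  if (argv.empty()) return -1;

  // Build the char* array before fork() so the child never allocates memory.
  std::vector<char*> raw;
  for (const auto& arg : argv) raw.push_back(const_cast<char*>(arg.c_str()));
  raw.push_back(nullptr);

  const pid_t pid = fork();
  if (pid < 0) return -1;  // fork() itself failed

  if (pid == 0) {
    // Child: replace the inherited (possibly multi-threaded) image right away.
    execv(raw[0], raw.data());
    _exit(127);  // only reached if execv() failed
  }

  // Parent: wait for the child and decode its exit status.
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_in_fresh_process({"/bin/sh", "-c", "exit 3"}) returns 3 on a typical Linux system. Because the child calls execv() immediately, it never touches locks that other threads of the parent may have held at the time of fork(), which is the hazard the warning describes.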

This doesn’t seem to be an issue with the TRIQS code itself, but rather with gtest and how it interacts with threads. So the purpose of this GitHub issue is to hopefully get this test fixed.

My Cluster build specifications are given below:

• GNU compilers version: 7.2.0
• Open MPI version: 3.0.0
• HDF5 version: 1.10.1
• Boost version: 1.67.0
• Python version: 3.9.0
• NumPy version: 1.19.2
• FFTW version: 3.3.8
• CMake version: 3.13.2
• Git version: 1.8.3.1

Please let me know if you need any further information.

Kind regards,

Alyn

P.S. Below is the output from the cmake command in case it helps (I’ve redacted the paths):

-- Installation directory will be .../triqs-3.0/test/test
-- The C compiler identification is GNU 7.2.0
-- The CXX compiler identification is GNU 7.2.0
-- Check for working C compiler: .../GCCcore/7.2.0/bin/gcc
-- Check for working C compiler: .../GCCcore/7.2.0/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: .../GCCcore/7.2.0/bin/g++
-- Check for working CXX compiler: .../GCCcore/7.2.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- -------- Checking compiler C++ status-------------
-- Build type: Release
-- -------- triqs version and git hash detection -------------
-- Found Git: /usr/bin/git (found version "1.8.3.1")
-- Triqs version : 3.0.0
-- Git hash : 79a979f31a007b77bf26ada10712a1acccb9640a
-- Hostname : bc4login2.bc4.acrc.priv
-- Compiled by : aj12959
-- =============== Configuring Dependency Cpp2Py ===============
Cloning into '.../triqs-3.0/triqs.build/deps/Cpp2Py_src'...
-- -------- BUILD-TYPE: Release -------------
-- Installation directory will be .../triqs-3.0/test/test
-- -------- cpp2py version and git hash detection -------------
-- Cpp2py version : 2.0.0
-- Git hash : 591dc975c9acddb1ac2938bcd7b48fba46166e29
-- -------- LibClang detection -------------
-- Can not find the Clang compiler, hence can not find the option for libclang
-- Could NOT find LibClang (missing: LIBCLANG_LOCATION LIBCLANG_CXX_FLAGS)
-- LibClang location: LIBCLANG_LOCATION-NOTFOUND
-- LibClang additional flags:
-- -------- Python detection -------------
-- Python interpreter .../triqs-3-deps/bin/python3
-- Python interpreter and modules are ok : version 3.9.0
-- PYTHON_INCLUDE_DIRS = .../triqs-3-deps/include/python3.9
-- PYTHON_NUMPY_INCLUDE_DIR = .../triqs-3-deps/lib/python3.9/site-packages/numpy/core/include
-- PYTHON_NUMPY_VERSION = 1.19.2
-- PYTHON_SITE_PKG = .../triqs-3-deps/lib/python3.9/site-packages
-- PYTHON_LIBRARY = .../triqs-3-deps/lib/libpython3.9.so
-- PYTHON_EXTRA_LIBS =-lcrypt -lpthread -ldl -lutil -lm
-- Python modules will be installed in .../triqs-3.0/test/test/lib/python3.9/site-packages
-- **
-- ** WARNING ****
--
-- Can not find libclang
-- You can use cpp2py to compile a code, but c++2py, c++2rst, c++2cxx will not work
--
-- ** WARNING ****
-- **
-- =============== End Cpp2Py Configuration ===============
-- =============== Configuring Dependency GTest ===============
Cloning into '.../triqs-3.0/triqs.build/deps/GTest_src'...
-- Found PythonInterp: .../triqs-3-deps/bin/python3 (found version "3.9")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- =============== End GTest Configuration ===============
-- =============== Configuring Dependency itertools ===============
Cloning into '.../triqs-3.0/triqs.build/deps/itertools_src'...
-- itertools version : 1.0.0
-- itertools Git hash: 56b8d791c9817d09b9d9d1f3b26e982b1da30705
-- Dependency GTest was already resolved.
-- Performing Test HAS_CPP20
-- Performing Test HAS_CPP20 - Failed
-- =============== End itertools Configuration ===============
-- =============== Configuring Dependency mpi ===============
Cloning into '.../triqs-3.0/triqs.build/deps/mpi_src'...
-- mpi version : 1.0.0
-- mpi Git hash: 8572d7bed1ce76b5df95d2d661510f24be97ab43
-- Dependency GTest was already resolved.
-- Dependency itertools was already resolved.
-- -------- MPI detection -------------
-- Found MPI_CXX: .../OpenMPI/3.0.0-GCC-7.2.0-2.29/lib/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1") found components: CXX
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX
-- =============== End mpi Configuration ===============
-- =============== Configuring Dependency h5 ===============
Cloning into '.../triqs-3.0/triqs.build/deps/h5_src'...
-- h5 version : 1.0.0
-- h5 Git hash: af9681a67dcbc7121b363e43f60b5e2cbad73dcb
-- Dependency Cpp2Py was already resolved.
-- Dependency GTest was already resolved.
-- -------- HDF5 detection -------------
-- HDF5: Using hdf5 compiler wrapper to determine C configuration
-- Found HDF5: .../triqs-3-deps/lib/libhdf5.so;/usr/lib64/libz.so;/usr/lib64/libdl.so;/usr/lib64/libm.so (found version "1.10.1") found components: C HL
-- Adding cpp2py Python module storable
-- Adding cpp2py Python module h5py
-- =============== End h5 Configuration ===============
-- -------- Lapack detection -------------
-- Looking for sgemm
-- Looking for sgemm - not found
-- Looking for sgemm
-- Looking for sgemm - found
-- Found BLAS: /usr/lib64/libopenblas.so
-- Looking for cheev
-- Looking for cheev_ - found
-- A library with LAPACK API found.
-- -------- Boost detection -------------
-- Boost version: 1.67.0
-- Boost include dir: .../triqs-3-deps/include
-- -------- GMP detection -------------
-- Found GMP: /usr/lib64/libgmp.so
-- -------- FFTW detection -------------
-- Found FFTW: .../triqs-3-deps/lib/libfftw3.so
-- -------- Misc -------------
-- Checked max_align_t. No workaround needed
-- -------- Preparing python extension modules -------------
-- Adding cpp2py Python module block_matrix
-- Adding cpp2py Python module meshes
-- Adding cpp2py Python module gf_fnt
-- Adding cpp2py Python module gf_factories
-- Adding cpp2py Python module wrapped_aux
-- Adding cpp2py Python module lattice_tools
-- Adding cpp2py Python module operators
-- Adding cpp2py Python module extractors
-- Adding cpp2py Python module random_generator
-- Adding cpp2py Python module histograms
-- Adding cpp2py Python module atom_diag
-- -------- Preparing tests -------------
-- Adding cpp2py Python module my_module
-- Adding cpp2py Python module my_moduleB
-- Adding cpp2py Python module test_g
-- Adding cpp2py Python module test_bl
-- Adding cpp2py Python module test_multivar
-- -------- Making TRIQSConfig.cmake -------------
--
-- Use :
-- source .../triqs-3.0/test/test/share/triqsvars.sh
--
to set up the environment variables
--
--
Consider copying .../triqs-3.0/triqs.build/triqs.modulefile
-- into your environment module directories
--
-- Configuring done
-- Generating done
-- Build files have been written to: .../triqs-3.0/triqs.build

Wentzell commented 4 years ago

Dear @AlynJ,

Thank you for pointing this out! This issue should be resolved with commit c89155c4. Could you try to build and test the TRIQS unstable branch on the same cluster and let me know if the issue persists for you? If this solves the problem for you, I can also replay the commit on 3.0.x.

Best,

Nils

AlynJ commented 4 years ago

Hi Nils,

I've tested gf_empty on both my local machine and the cluster, and the test now passes on both!

I had to copy the gf_empty test into the stable version of 3.0, because building the unstable version failed on both of my setups. I've put the error message at the bottom, as I thought I should make you aware of this!

Best,

Alyn

unstable 3.0.x error message:

[ 71%] Built target gf_2times
Scanning dependencies of target operator_test
In file included from .../triqs-3.0/triqs-unstable.src/test/c++/mc_tools/different_moves_mc.cpp:32:0:
.../triqs-3.0/triqs-unstable.src/c++/triqs/mc_tools/mc_generic.hpp:31:10: fatal error: mpi/monitor.hpp: No such file or directory
 #include <mpi/monitor.hpp>
          ^~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [test/c++/mc_tools/CMakeFiles/different_moves_mc.dir/different_moves_mc.cpp.o] Error 1
make[1]: *** [test/c++/mc_tools/CMakeFiles/different_moves_mc.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Wentzell commented 4 years ago

Hi Alyn,

Good to hear that the first issue is resolved!

Regarding the second issue, could you retry with the latest unstable? I believe this should be resolved with https://github.com/TRIQS/triqs/commit/d680c8fcda82cb50969730d2fa5746fb2087238a

AlynJ commented 4 years ago

Hi Nils,

I'm able to compile the latest unstable branch with no issues.

Best,

Alyn

Wentzell commented 4 years ago

Hey Alyn,

Thank you for the update! I will close this issue, as everything has been resolved.