MRChemSoft / mrchem

MultiResolution Chemistry
GNU Lesser General Public License v3.0

Possible error in Solvent part when running in MPI #365

Closed: stigrj closed this issue 3 years ago

stigrj commented 3 years ago

The reaction_operator unit test occasionally triggers a segmentation fault, apparently in the ReactionOperator.setup(prec) function:

$ mpirun -np 4 bin/mrchem-tests

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mrchem-tests is a Catch v2.1.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
ReactionOperator
-------------------------------------------------------------------------------
/home/stig/src/mrchem/tests/solventeffect/reaction_operator.cpp:55
...............................................................................

/home/stig/src/mrchem/tests/solventeffect/reaction_operator.cpp:55: FAILED:
due to a fatal error condition:
  SIGSEGV - Segmentation violation signal

===============================================================================
test cases:  20 |  19 passed | 1 failed
assertions: 564 | 563 passed | 1 failed

[drogon:147826] *** Process received signal ***
[drogon:147826] Signal: Segmentation fault (11)
[drogon:147826] Signal code:  (-6)
[drogon:147826] Failing at address: 0x3e800024172
[drogon:147826] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fe1470363c0]
[drogon:147826] [ 1] /home/stig/src/mrchem/build-bank/lib/libmrcpp.so.1(_ZN5mrcpp12FunctionTreeILi3EE5clearEv+0x1a)[0x7fe14762d3ca]
[drogon:147826] [ 2] bin/mrchem-tests(+0x1db15d)[0x561081b5b15d]
[drogon:147826] [ 3] bin/mrchem-tests(+0x1dc406)[0x561081b5c406]
[drogon:147826] [ 4] bin/mrchem-tests(+0x18f13d)[0x561081b0f13d]
[drogon:147826] [ 5] bin/mrchem-tests(+0x16c290)[0x561081aec290]
[drogon:147826] [ 6] bin/mrchem-tests(+0x124656)[0x561081aa4656]
[drogon:147826] [ 7] bin/mrchem-tests(+0x57bc0)[0x5610819d7bc0]
[drogon:147826] [ 8] bin/mrchem-tests(+0x6c5a2)[0x5610819ec5a2]
[drogon:147826] [ 9] bin/mrchem-tests(+0x7797c)[0x5610819f797c]
[drogon:147826] [10] bin/mrchem-tests(+0x7d160)[0x5610819fd160]
[drogon:147826] [11] bin/mrchem-tests(+0x7d7b1)[0x5610819fd7b1]
[drogon:147826] [12] bin/mrchem-tests(+0x7d84f)[0x5610819fd84f]
[drogon:147826] [13] bin/mrchem-tests(+0x4bcb7)[0x5610819cbcb7]
[drogon:147826] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fe146e560b3]
[drogon:147826] [15] bin/mrchem-tests(+0x50c2e)[0x5610819d0c2e]
[drogon:147826] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
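
As a reading aid for the backtrace above: frame [1] is the only frame that carries a symbol name, and it is mangled. The sketch below shows one way to demangle it, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` from `<cxxabi.h>`. The helper name `demangle` is hypothetical and is not part of mrchem or mrcpp.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if demangling fails.
std::string demangle(const std::string &mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'd buffer, so it is released with std::free.
    std::unique_ptr<char, void (*)(void *)> text(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get()) : mangled;
}
```

Calling `demangle("_ZN5mrcpp12FunctionTreeILi3EE5clearEv")` yields `mrcpp::FunctionTree<3>::clear()`, so the crash occurs while clearing a 3D `FunctionTree`. The remaining `bin/mrchem-tests(+0x...)` frames have no symbol names and would need a debug build or an address-to-line tool to resolve.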

stigrj commented 3 years ago

This issue is hopefully resolved along with all the recent MPI fixes. If the problem persists, please re-open.