Closed mbanck closed 2 years ago
As far as I can see, this is an error on the OpenMPI side. The message:
10: A system call failed during shared memory initialization that should
10: not have. It is likely that your MPI job will now either abort or
10: experience performance degradation.
10:
10: Local host: curie
10: System call: open(2)
10: Error: No such file or directory (errno 2)
is coming from OpenMPI. DBCSR then stops when it makes this call to create the window. In the past we had several problems with RMA and OpenMPI, which is why we mask some versions (2.1 and 3.1), as you saw in the cmake file. We can definitely do some tricks to avoid the window allocations when a single node is used, but in principle it must work no matter how many nodes we are using. Note that we test for this as part of our CI. I also assume you already tested with a previous OpenMPI version and it worked... So, the question now is: could you test with two ranks? If it doesn't work, then it is something in OpenMPI or in the system you are using (according to the error message...)
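To make the version masking mentioned above concrete, here is a minimal shell sketch of that kind of check. It is not the actual logic in the cmake file: the function names `openmpi_major_minor` and `rma_is_masked`, the `MASKED_VERSIONS` variable, and the assumed input format (a library version string such as `Open MPI v4.1.0`) are all hypothetical and only illustrate the idea.

```shell
#!/bin/sh
# Hypothetical sketch: decide whether RMA tests should be masked for a given
# Open MPI version string. This is NOT the project's real cmake logic.

# Versions named in the discussion; adding e.g. 4.1 here is an assumption.
MASKED_VERSIONS="2.1 3.1"

# Extract "major.minor" from a string such as "Open MPI v4.1.0".
# Prints nothing if the string does not look like an Open MPI version.
openmpi_major_minor() {
    printf '%s\n' "$1" |
        sed -n 's/.*Open MPI v\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Return 0 (true) if the version is in the masked list, 1 otherwise.
rma_is_masked() {
    mm=$(openmpi_major_minor "$1")
    [ -n "$mm" ] || return 1
    for v in $MASKED_VERSIONS; do
        if [ "$v" = "$mm" ]; then
            return 0
        fi
    done
    return 1
}
```

With the list as written, `rma_is_masked "Open MPI v3.1.4"` succeeds while `rma_is_masked "Open MPI v4.1.0"` does not, which matches the situation reported here: a 4.1 install is not covered by the existing mask.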
After a bit of "asking Google", I found this post:
RMA will not be used anymore when OpenMPI is involved (due to the many problems on the OpenMPI-RMA side).
Describe the bug
On Debian unstable, running the testsuite fails in `test_square_sparse_rma`:
That test is the only one with `Use RMA algorithm T`, so that looks related? If that test needs two nodes (due to RMA?), I guess it should be skipped if only one is available?
Hrm, `tests/CMakeLists.txt` has (around line 30):
So (as Debian unstable now has OpenMPI v4.1) this should be extended to 4.1?
To Reproduce
Run the testsuite.
Expected behavior
Testsuite passes and/or skips over tests that cannot pass due to environment.
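One way to get the "skip instead of fail" behavior is a small wrapper that returns a dedicated exit code when a precondition is not met. The sketch below is only an illustration: `run_or_skip` and `SKIP_CODE` are hypothetical names, and the value 77 is an assumption that only works if the test harness is told to treat it as "skipped" (CTest, for example, has a `SKIP_RETURN_CODE` test property for this).

```shell
#!/bin/sh
# Hypothetical wrapper: skip a test when its environment precondition fails.

# Assumption: the harness is configured to treat this exit code as "skipped".
SKIP_CODE=77

# Usage: run_or_skip "<precondition command>" <test command> [args...]
# Runs the precondition in a subshell; if it fails, reports a skip and
# returns SKIP_CODE. Otherwise runs the test and returns its exit status.
run_or_skip() {
    precondition=$1
    shift
    if ! sh -c "$precondition" >/dev/null 2>&1; then
        echo "SKIP: precondition failed: $precondition" >&2
        return "$SKIP_CODE"
    fi
    "$@"
}
```

The precondition is deliberately left as an arbitrary command, since what counts as "the environment cannot run this test" (MPI version, number of ranks, shared memory setup) is exactly what is being discussed here.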
Environment:
`Makefile.inc`): cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=None -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON "-GUnix Makefiles" -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_INSTALL_LIBDIR=lib/x86_64-linux-gnu