Closed stephanmg closed 3 years ago
I cannot reproduce this on my CentOS 7 system. I've used the system gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) as you indicated and openmpi 4.0.4 built from source with default options. I also don't have to specify a bit transport.
@mlampe thanks for testing this.
I did not build openmpi 4.0.4 myself/from source; it was provided to me by the compute cluster as a module. I presume (as it is not the default module) that it might not be recommended for use anyway.
However, as suggested by @bsumirak, I'll try to provide a stack trace with symbols if my time allows. This might allow us to track down the problem.
I assume that by "bit transport" you refer to the Byte Transfer Layer (BTL), which I had to modify in my mpirun call?
The hardware might be of interest: 720x Intel Xeon E5-2690 v4 2.6GHz
Dear all,
compiling the ug4 HEAD revision (23ee853503b21d12836281b3b98e7640452780ee) and trying to run the *Laplace 3d* example in parallel yields for me this error (with `openmpi 4.0.4`):

Note that when using Open MPI in the `v3.1` series, the example runs perfectly fine, even for a high number of processors.

I noticed a discrepancy in how I have to invoke the `mpirun` command on `openmpi-4.0.4`. In the `v3.1` series I run ugshell via:

```
mpirun -np 2 ugshell
```

but for the `v4.0` series I need to add two additional components (`vader` and `self`):

```
mpirun --mca btl vader,self -np 2 ugshell
```

So maybe I am not using this as intended. Did anybody else experience this kind of problem?

The OS is CentOS 7.6 and ug4 revision 23ee853503b21d12836281b3b98e7640452780ee.
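For reference, Open MPI also reads MCA parameters from the environment, so the BTL list can be pinned without changing every `mpirun` invocation. A minimal sketch (assuming an Open MPI 4.x installation; the `ugshell` invocation is illustrative and commented out):

```shell
# Open MPI reads MCA parameters from OMPI_MCA_<param> environment variables,
# so exporting this is equivalent to passing "--mca btl vader,self" to mpirun:
export OMPI_MCA_btl=vader,self
echo "$OMPI_MCA_btl"
# mpirun -np 2 ugshell   # would now use only the vader and self BTL components
```

This can be handy on clusters where the job script sets up the environment once and several `mpirun` calls follow.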