bsc-pm / nanox

Nanos++ is a runtime designed to serve as runtime support in parallel environments. It is mainly used to support OmpSs, an extension to OpenMP developed at BSC.
https://pm.bsc.es/nanox
GNU Lesser General Public License v3.0

nanox cluster runtime error #13

Closed AminSahebi closed 3 years ago

AminSahebi commented 3 years ago

Hi there, I'm using OmpSs on an ARM-based cluster (UDOO boards), and everything went well there (using the AXIOM Project materials). When I tried to port the same setup to a different cluster of x86_64-based boards, I hit the following runtime error:

Executing matrix multiplication on 8 boards...
WARNING: [?]plugin error=/home/udoo/nanox-install/lib/performance/libnanox-pe-cluster-mpi.so: undefined symbol: ompi_mpi_op_sum
terminate called after throwing an instance of 'nanos::FatalError'
what(): FATAL ERROR: [-1] Couldn't load Cluster support
matmul/matmul.sh: line 7: 28627 Aborted (core dumped) ~/matmul/dgemm_onelevel.perf
Primary job terminated normally, but 1 process returned a non-zero exit code.. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:

Process name: [[62796,1],0]
Exit code: 134

I suspect this is related to the Open MPI configuration flags, but I'm not sure. Below are the flags I used to build each component and the details of the platform.

Platform: Ubuntu 16.04 (the same problem occurs on Ubuntu 18.04)

First I installed Open MPI. I tried both of the following procedures, without success.

First try:

sudo apt install openmpi-bin openmpi-bin openmpi-dev

Second try:

cd ~/openmpi-1.10.2/
./configure --enable-mpi-threads
make
make install

then:

./configure --prefix=/home/$USER/gasnet-install --disable-aligned-segments --disable-pshm --disable-seq --disable-parsync --with-mpi-cc="mpicc -fPIC -DPIC" --with-mpi-cxx="mpicxx -fPIC -DPIC" CC="gcc -fPIC -DPIC" CFLAGS="-fPIC -DPIC" CXX="g++ -fPIC -DPIC" CXXFLAGS="-fPIC -DPIC" CPPFLAGS="-DPIC" LDFLAGS="-fPIC" --enable-mpi --enable-udp --enable-smp --disable-ibv
make
make install

then:

./configure --prefix=/home/$USER/nanox-install --with-gasnet=/home/$USER/gasnet-install --disable-debug --with-mpi-include=/usr/include/mpi --with-mpi-lib=/usr/lib MPICXX=mpicxx
make
make install

then:

configure --prefix=/home/$USER/mcxx-install --enable-ompss --enable-tl-openmp-nanox --with-nanox=/home/$USER/nanox-install
make
make install

I exported PATH as follows:

PATH=/home/udoo/nanox-install/bin:/home/udoo/gasnet-install/bin:/home/udoo/mcxx-install/bin:$PATH

and also exported LD_LIBRARY_PATH. I then checked which libraries Open MPI links against:

$ mpicxx -show
g++ -I/usr/local/include -pthread -Wl,-rpath -Wl,/usr/local/lib -Wl,--enable-new-dtags -L/usr/local/lib -lmpi_cxx -lmpi

$ mpicc -show
gcc -I/usr/local/include -pthread -Wl,-rpath -Wl,/usr/local/lib -Wl,--enable-new-dtags -L/usr/local/lib -lmpi

$ ldd /usr/bin/mpicc.openmpi
linux-vdso.so.1 => (0x00007ffdf031f000)
libopen-pal.so.13 => /usr/local/lib/libopen-pal.so.13 (0x00007f4351616000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f43513f9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f435102f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4350e2b000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4350c23000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f4350a20000)
/lib64/ld-linux-x86-64.so.2 (0x00007f43518f9000)
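The nanox configure line points at /usr/lib while the compiler wrappers point at /usr/local/lib, so it may help to see where ompi_mpi_op_sum is actually defined. The following is only a rough sketch of such a check, not something that was run for this report. The helper name classify_symbol is made up for illustration, and it is an assumption that a libmpi.so exists at either of the two library paths.

```shell
#!/bin/sh
# Sketch: report whether a symbol is defined, undefined, or absent in a
# shared library, based on the text printed by `nm -D`.
# classify_symbol is a hypothetical helper, not part of nanox or Open MPI.

# Plugin path and symbol are taken from the error message; override via env.
PLUGIN="${PLUGIN:-/home/udoo/nanox-install/lib/performance/libnanox-pe-cluster-mpi.so}"
SYMBOL="${SYMBOL:-ompi_mpi_op_sum}"

# Reads `nm -D` output on stdin; prints defined, undefined, or absent.
classify_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name) }   # drop any @VERSION suffix
        name == sym { if ($(NF-1) == "U") u = 1; else d = 1 }
        END {
            if (d) print "defined"
            else if (u) print "undefined"
            else print "absent"
        }
    '
}

# Assumption: libmpi.so may live in /usr/lib (the nanox --with-mpi-lib value)
# or in /usr/local/lib (the path printed by mpicc -show). Adjust as needed.
for lib in "$PLUGIN" /usr/lib/libmpi.so /usr/local/lib/libmpi.so; do
    if [ -e "$lib" ]; then
        printf '%s: %s\n' "$lib" "$(nm -D "$lib" | classify_symbol "$SYMBOL")"
    else
        printf '%s: not found\n' "$lib"
    fi
done
```

If the plugin reports the symbol as undefined and only one of the two libmpi.so candidates reports it as defined, that would suggest the plugin needs to be linked against, and find at run time, that particular library.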

After all of this, I tried to run the matrix multiplication example provided by BSC and hit the runtime error shown above.

I would really appreciate any help.

Thanks