Sandia-OpenSHMEM / SOS

Sandia OpenSHMEM (SOS) is an implementation of the OpenSHMEM specification over multiple networking APIs, including Portals 4, the OpenFabrics Interfaces (OFI), and UCX. See the project Wiki for help with building and using SOS.

Unrecognized PMI command: abort | cleaning up processes #108

Closed: jeffhammond closed this issue 8 years ago

jeffhammond commented 8 years ago
[jrhammon@esgmonster sandia-shmem]$ git clean -dfx && ./autogen.sh && ./configure --with-ofi=/opt/libfabric --with-ofi-libdir=/opt/libfabric/lib --enable-remote-virtual-addressing --prefix=/opt/shmem/sandia/intel CC=icc FC=ifort --enable-pmi-simple --enable-wrapper-rpath && make -j32 check
...
[jrhammon@esgmonster sandia-shmem]$ cat test/unit/test-suite.log
====================================================
   Sandia OpenSHMEM 1.2: test/unit/test-suite.log
====================================================

# TOTAL: 51
# PASS:  48
# SKIP:  0
# XFAIL: 0
# FAIL:  3
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: global_exit
=================

[mpiexec@esgmonster] handle_pmi_cmd (../../pm/pmiserv/pmiserv_cb.c:77): Unrecognized PMI command: abort | cleaning up processes
[mpiexec@esgmonster] control_cb (../../pm/pmiserv/pmiserv_cb.c:958): unable to process PMI command
[mpiexec@esgmonster] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@esgmonster] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:500): error waiting for event
[mpiexec@esgmonster] main (../../ui/mpich/mpiexec.c:1125): process manager error waiting for completion
FAIL global_exit (exit status: 255)

FAIL: hello_f
=============

/home/jrhammon/Work/PGAS/SHMEM/sandia-shmem/test/unit/.libs/lt-hello_f: symbol lookup error: /home/jrhammon/Work/PGAS/SHMEM/sandia-shmem/test/unit/.libs/lt-hello_f: undefined symbol: start_pes_
/home/jrhammon/Work/PGAS/SHMEM/sandia-shmem/test/unit/.libs/lt-hello_f: symbol lookup error: /home/jrhammon/Work/PGAS/SHMEM/sandia-shmem/test/unit/.libs/lt-hello_f: undefined symbol: start_pes_
FAIL hello_f (exit status: 127)

FAIL: shmem_info_f
==================

/home/jrhammon/Work/PGAS/SHMEM/sandia-shmem/test/unit/.libs/lt-shmem_info_f: symbol lookup error: /home/jrhammon/Work/PGAS/SHMEM/sandia-shmem/test/unit/.libs/lt-shmem_info_f: undefined symbol: shmem_init_
/home/jrhammon/Work/PGAS/SHMEM/sandia-shmem/test/unit/.libs/lt-shmem_info_f: symbol lookup error: /home/jrhammon/Work/PGAS/SHMEM/sandia-shmem/test/unit/.libs/lt-shmem_info_f: undefined symbol: shmem_init_
FAIL shmem_info_f (exit status: 127)

jdinan commented 8 years ago

You need a more recent version of Hydra. PMI_Abort was not implemented correctly by Hydra until somewhat recently.

jeffhammond commented 8 years ago

Well, Intel MPI needs a more recent version of Hydra and I need to use the PM that SOS installs, rather than the one in my path. I was hoping to not have to change my environment just for SOS testing.

ARMCI-MPI supports an MPIEXEC env var. Does SOS have anything like that? I grepped for it but found nothing.

jdinan commented 8 years ago

The situation is fairly obnoxious right now. You can set OSHRUN_LAUNCHER to set the launcher manually. We should probably change the script so that it looks only for mpiexec.hydra, since it's unlikely that any other launcher will work.
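
For anyone landing here later, a minimal sketch of the OSHRUN_LAUNCHER workaround. Everything except the OSHRUN_LAUNCHER variable name is an assumption for illustration: `HYDRA_PREFIX` and `pick_launcher` are hypothetical names, and the default prefix is simply the `--prefix` from the configure line above, which may not be where your newer Hydra actually lives.

```shell
# Hypothetical: directory whose bin/ holds a recent mpiexec.hydra.
# The default below is only the --prefix used in this issue; adjust it.
HYDRA_PREFIX="${HYDRA_PREFIX:-/opt/shmem/sandia/intel}"

# Hypothetical helper: print the launcher to use.
# Prefers <prefix>/bin/mpiexec.hydra, else falls back to whatever is on PATH.
pick_launcher() {
    if [ -x "$1/bin/mpiexec.hydra" ]; then
        printf '%s\n' "$1/bin/mpiexec.hydra"
    else
        command -v mpiexec.hydra
    fi
}

if launcher="$(pick_launcher "$HYDRA_PREFIX")"; then
    # oshrun reads this variable to choose the launcher (per the comment above).
    export OSHRUN_LAUNCHER="$launcher"
    echo "OSHRUN_LAUNCHER=$OSHRUN_LAUNCHER"
else
    echo "no mpiexec.hydra found; OSHRUN_LAUNCHER left unset" >&2
fi
```

Setting the variable only in the shell that runs `make check` (or prefixing a single command with `OSHRUN_LAUNCHER=...`) leaves the rest of the environment, including the Intel MPI launcher on PATH, untouched.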