flux-framework / flux-pmix

flux shell plugin to bootstrap openmpi v5+
GNU Lesser General Public License v3.0

shell plugin hangs in px_destroy on TOSS3 / RHEL7 spack environment #44

Closed garlick closed 2 years ago

garlick commented 2 years ago

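@grondo reports that, while testing the flux-pmix spack package, a simple run of hostname with -o mpi=openmpi@5 hangs in px_destroy(). The shell log shows the task completing with status 0, and the backtrace shows the main thread blocked on a lock inside PMIx_Deregister_event_handler(), called from notify_destroy() during plugin teardown.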

0.487s: flux-shell[0]: DEBUG: pmix: jobid = 628407402496
0.487s: flux-shell[0]: DEBUG: pmix: shell_rank = 0
0.487s: flux-shell[0]: DEBUG: pmix: local_nprocs = 1
0.487s: flux-shell[0]: DEBUG: pmix: total_nprocs = 1
0.487s: flux-shell[0]: DEBUG: pmix: server outsourced to OpenPMIx 4.1.0 (PMIx Standard: 4.1)
0.538s: flux-shell[0]: DEBUG: pmix: local_peers = 0
0.538s: flux-shell[0]: DEBUG: pmix: node_map = quartz1922
0.538s: flux-shell[0]: DEBUG: pmix: proc_map = 0
quartz1922
0.550s: flux-shell[0]: TRACE: pmi: 0: C: pmi EOF
0.551s: flux-shell[0]: DEBUG: task 0 complete status=0
0.562s: flux-shell[0]: DEBUG: exit 0
Thread 1 (Thread 0x2aaaaacbcfc0 (LWP 39992)):
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00002aaaaacdb240 in pthread_cond_broadcast@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S:136
#2  0x00002aaab2064afc in PMIx_Deregister_event_handler ()
   from /g/g0/grondo/git/spack/opt/spack/linux-rhel7-broadwell/gcc-10.2.1/pmix-4.1.0-uqrdendfg3uumczkqiyjfefkozjfkyzj/lib/libpmix.so.2
#3  0x00002aaaaab757f9 in notify_destroy (notify=0x5c5180) at notify.c:153
#4  0x00002aaaaab71a47 in px_destroy (px=0x4c5a20) at main.c:54
#5  0x00002aaaaab13c9b in aux_item_destroy (aux=0x4c5b80)
    at /var/tmp/grondo/spack-stage/spack-stage-flux-core-0.30.0-itp55gm7yeaia3gkhavzfm5h2evzi3hp/spack-src/src/common/libutil/aux.c:35
#6  0x00002aaaaab13f64 in aux_destroy (head=0x487ae0)
    at /var/tmp/grondo/spack-stage/spack-stage-flux-core-0.30.0-itp55gm7yeaia3gkhavzfm5h2evzi3hp/spack-src/src/common/libutil/aux.c:193
#7  0x00002aaaaaafce37 in flux_plugin_destroy (p=0x487ac0)
    at /var/tmp/grondo/spack-stage/spack-stage-flux-core-0.30.0-itp55gm7yeaia3gk

Additional facts: OpenPMIx 4.1.0 was built with --enable-pmi-backwards-compatibility.
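
For triage, it can help to reduce the gdb backtrace above to a bare call chain, so it is obvious which teardown path leads to the blocked frame. The following is only a minimal sketch: the helper names `parse_frames` and `call_chain` are hypothetical, the regular expression assumes gdb's usual `#N  0xADDR in func (...)` frame layout, and the sample text is the set of frames from this report with the file paths trimmed.

```python
import re

# Frames copied from the backtrace in this report (source paths trimmed).
BACKTRACE = """\
#0  __lll_lock_wait ()
#1  0x00002aaaaacdb240 in pthread_cond_broadcast@@GLIBC_2.3.2 ()
#2  0x00002aaab2064afc in PMIx_Deregister_event_handler ()
#3  0x00002aaaaab757f9 in notify_destroy (notify=0x5c5180) at notify.c:153
#4  0x00002aaaaab71a47 in px_destroy (px=0x4c5a20) at main.c:54
#5  0x00002aaaaab13c9b in aux_item_destroy (aux=0x4c5b80)
#6  0x00002aaaaab13f64 in aux_destroy (head=0x487ae0)
#7  0x00002aaaaaafce37 in flux_plugin_destroy (p=0x487ac0)
"""

# Assumed gdb frame layout: "#N", an optional "0xADDR in", then the function name.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def parse_frames(text):
    """Return (frame_number, function_name) pairs, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            # Drop glibc symbol-version suffixes such as "@@GLIBC_2.3.2".
            frames.append((int(m.group(1)), m.group(2).split("@")[0]))
    return frames


def call_chain(text):
    """Return function names ordered from outermost caller to innermost callee."""
    return [name for _, name in reversed(parse_frames(text))]


if __name__ == "__main__":
    print(" -> ".join(call_chain(BACKTRACE)))
```

Read outermost to innermost, the chain runs from flux_plugin_destroy through px_destroy and notify_destroy into PMIx_Deregister_event_handler, and ends in __lll_lock_wait. This only summarizes where the thread is blocked; it says nothing about why the lock is never released.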