flux-framework / flux-pmix

flux shell plugin to bootstrap openmpi v5+
GNU Lesser General Public License v3.0

docker-run-checks fails on current main #54

Closed: grondo closed this issue 2 years ago

grondo commented 2 years ago

This is likely due to upstream changes in dependencies, but I haven't looked into it beyond pasting the errors here:

The default (focal) docker image fails to build at the ompi step, while linking ompi_info:

make[2]: Entering directory '/tmp/ompi/ompi/tools/ompi_info'
  CC       ompi_info.o
  CC       param.o
  GENERATE ompi_info.1
  CCLD     ompi_info
/usr/bin/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `PMIx_Data_unload'
/usr/bin/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `PMIx_Data_load'
/usr/bin/ld: ../../../ompi/.libs/libmpi.so: undefined reference to `PMIx_Get_attribute_string'
collect2: error: ld returned 1 exit status
make[2]: Leaving directory '/tmp/ompi/ompi/tools/ompi_info'
make[2]: *** [Makefile:1487: ompi_info] Error 1
make[1]: *** [Makefile:2713: all-recursive] Error 1
make[1]: Leaving directory '/tmp/ompi/ompi'
make: *** [Makefile:1479: all-recursive] Error 1
The command '/bin/sh -c cd /tmp  && git clone -b ${OMPI_BRANCH}       --recursive --depth=1 https://github.com/open-mpi/ompi  && cd ompi  && git branch  && ./autogen.pl  && ./configure --prefix=/usr       --disable-man-pages --enable-debug --enable-mem-debug       --with-pmix=external --with-libevent  && make -j $(nproc)  && sudo make install  && cd ..  && rm -rf ompi' returned a non-zero code: 2

docker-run-checks.sh: docker build failed
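All three undefined references are PMIx client functions (PMIx_Data_load, PMIx_Data_unload, PMIx_Get_attribute_string). That suggests, though the log alone does not prove it, that libmpi.so was compiled against newer PMIx headers than the libpmix the linker actually picked up. When comparing logs across different ompi/openpmix tag combinations, it can help to pull just the missing symbol names out of a saved build log. The function below is a minimal sketch: undefined_symbols is a hypothetical name, and it assumes GNU ld's "undefined reference to `SYM'" message format as seen in the log above.

```shell
#!/bin/sh
# undefined_symbols (hypothetical helper): read GNU ld output on stdin and
# print each symbol from an "undefined reference to `SYM'" line once,
# in order of first appearance.
undefined_symbols() {
    # sed keeps only the captured symbol name from matching lines;
    # awk drops repeats (e.g. the same symbol missing from several objects).
    sed -n "s/.*undefined reference to \`\([^']*\)'.*/\1/p" | awk '!seen[$0]++'
}

# Usage sketch (build.log is a hypothetical saved log file):
#   make -j "$(nproc)" 2>&1 | tee build.log
#   undefined_symbols < build.log
```

Against the log above, this prints the three PMIx_* names and nothing else, which makes it easy to see whether a given combination of tags changes the set of missing symbols.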

BTW, the following change to the Dockerfile is required to even get this far:

diff --git a/src/test/docker/focal/Dockerfile b/src/test/docker/focal/Dockerfile
index 924760b..221b207 100644
--- a/src/test/docker/focal/Dockerfile
+++ b/src/test/docker/focal/Dockerfile
@@ -14,7 +14,7 @@ RUN \
  fi

 # ompi can't coexist with mpich
-RUN sudo apt remove -yy mpich libmpich-dev libmpich12 \
+RUN sudo apt remove -yy mpich libmpich-dev \
  && sudo apt clean

 # install ompi prereqs

garlick commented 2 years ago

Yeah, I can reproduce this. Pinning ompi to v5.0.0rc1 or v5.0.0rc2 doesn't fix it, and upgrading both openpmix and ompi to their latest tags (v4.1.1rc6 and v5.0.0rc2, respectively) doesn't seem to help either.

Possibly the problem is building ompi with --with-pmix=external, which bypasses its bundled openpmix submodule. I'll try building against the submodule next.

garlick commented 2 years ago

It looks like the problem is that the flux-core focal image now includes openmpi v4 packages (and their companion pmix packages), which interfere with the ompi v5 build. I guess this is due to flux-framework/flux-core@856ac8e6e107db97e2a788c97e87b849529f83a9.
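To check which MPI/PMIx packages a given image actually carries before the ompi v5 build starts, one option is to filter the dpkg database. The function below is a sketch under stated assumptions: mpi_conflicts is a hypothetical name, the openmpi|pmix name pattern is a guess at what matters rather than a list taken from the flux-core image, and it assumes a Debian/Ubuntu base such as focal where dpkg-query is available.

```shell
#!/bin/sh
# mpi_conflicts (hypothetical helper): read "STATUS PACKAGE" lines on stdin,
# as produced by:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n'
# and print the names of fully installed packages ("ii") that mention
# openmpi or pmix. The name pattern is an assumption, not an exact list.
mpi_conflicts() {
    # Skip removed-but-configured ("rc") and other non-installed states.
    awk '$1 == "ii" && $2 ~ /openmpi|pmix/ { print $2 }'
}

# Usage sketch, run inside the container:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | mpi_conflicts
```

Any names it prints are candidates for an apt remove step ahead of the ompi build, alongside the existing mpich removal; whether removing them is safe for the rest of the flux-core image is a separate question.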