LLNL / scr

SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
http://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

bootstrap.sh fails to compile with clang/6.0.0 on quartz #436

Open mcfadden8 opened 3 years ago

mcfadden8 commented 3 years ago

The build fails with clang/6.0.0, while switching to gcc works fine.

Steps to reproduce:

  1. ssh quartz
  2. ml clang/6.0.0
  3. git clone git@github.com:LLNL/scr.git && cd scr
  4. ./bootstrap.sh

This results in the following build failure:

[ 77%] Building C object src/CMakeFiles/redset_o.dir/redset_reedsolomon_common.c.o
/usr/workspace/martymcf/src/scr/deps/redset/src/redset_reedsolomon_common.c:13:24: fatal error: kvtree_mpi.h: No such file or directory
#include "kvtree_mpi.h"
^
/usr/workspace/martymcf/src/scr/deps/redset/src/redset_reedsolomon.c:13:24: fatal error: kvtree_mpi.h: No such file or directory
#include "kvtree_mpi.h"
^
compilation terminated.

Looking through the output, it appears that MPI_C is found, but MPI_CXX is not.

-- Found MPI_C: /usr/tce/packages/mvapich2/mvapich2-2.3-clang-6.0.0/lib/libmpi.so (found version "3.1")
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS)
-- Could NOT find MPI (missing: MPI_CXX_FOUND) (found version "3.1")
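The output above suggests CMake's `FindMPI` module probed for both the C and C++ MPI bindings and treated the failed C++ probe as fatal. As a rough illustration (not SCR's actual CMake code), a project that only needs the C bindings can restrict the search so a broken `mpicxx` wrapper does not sink the whole configure step:

```cmake
# Sketch only: restrict FindMPI to the C component (CMake >= 3.10).
# Without COMPONENTS, FindMPI may also probe MPI_CXX for projects
# that enable the CXX language, and a failed MPI_CXX_WORKS check
# then reports MPI as not found overall.
find_package(MPI REQUIRED COMPONENTS C)
target_link_libraries(myapp PRIVATE MPI::MPI_C)
```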
gonsie commented 3 years ago

Added context: although MPI_C is found, MPI_CXX is not … which means that MPI is turned off entirely for the build of the kvtree component, which later impacts the redset component.
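The failure mode gonsie describes can be sketched with a hypothetical optional-MPI pattern (names here are illustrative, not the real kvtree CMake logic): when `find_package(MPI)` reports MPI as missing, the MPI-dependent headers are never generated or installed, and a downstream component that includes them breaks later.

```cmake
# Hypothetical sketch of an optional-MPI configure step.
find_package(MPI)            # fails overall if the MPI_CXX probe fails
if(MPI_FOUND)
  set(KVTREE_ENABLE_MPI TRUE)   # kvtree_mpi.h gets built/installed
else()
  set(KVTREE_ENABLE_MPI FALSE)  # kvtree_mpi.h never exists, so a
                                # dependent like redset fails with
                                # "kvtree_mpi.h: No such file or directory"
endif()
```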

gonsie commented 3 years ago

With the fix in ecp-veloc/redset@409b347eb5697fbed395026e0fb1ac605c4ec971 by @adammoody, this now passes. Is there any more work we want to do here?

adammoody commented 3 years ago

We have two separate issues related to this.

The first issue was a request to change things to require MPI to be found during the configure step. We have now merged a fix for that, so I just closed that issue: https://github.com/LLNL/scr/issues/437.

This issue was created to figure out why the clang/6.0.0 build was failing to find MPI in the first place. At the moment we can't reproduce that, and I don't think the above PR would have fixed it. I was keeping this open in case it comes back. We could close it soon if we still can't reproduce it. Future work to add test cases that build against more compilers should give us confidence that the problem is actually gone.
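The multi-compiler test coverage mentioned above could take the form of a CI build matrix. A minimal sketch, assuming a GitHub Actions workflow (this is not an existing SCR workflow, and the compiler list and bootstrap invocation are assumptions):

```yaml
# Hypothetical CI matrix: run bootstrap.sh under several compilers
# to catch compiler-specific configure failures like this one.
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        compiler: [gcc, clang]
    steps:
      - uses: actions/checkout@v3
      - name: Bootstrap with ${{ matrix.compiler }}
        run: CC=${{ matrix.compiler }} ./bootstrap.sh
```

A matrix like this would have surfaced the clang MPI-detection failure on every push rather than only on a manually driven build.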