GEOS-DEV / GEOS

GEOS Simulation Framework
GNU Lesser General Public License v2.1

[Bug] TPL: invalid C++ (std::numeric_limits); CMake problems #1531

Closed michaelkarlcoleman closed 2 years ago

michaelkarlcoleman commented 3 years ago

Describe the bug

While compiling the TPLs according to the Quick Start guide, the build fails on source file thirdPartyLibs/build-try2-release/chai/src/chai/src/tpl/umpire/src/umpire/util/allocation_statistics.cpp.

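Looking inside, the code uses std::numeric_limits without including the <limits> header. I'm using g++ 11.1.0, but the code is invalid with respect to the C++ standard regardless of compiler: it presumably compiled before only because other standard library headers happened to pull in <limits> transitively, which GCC 11's libstdc++ no longer does.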

More info here: https://www.gnu.org/software/gcc/gcc-11/porting_to.html#header-dep-changes
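A rough way to find other files that would trip the same GCC 11 header change is to scan the source tree for files that use std::numeric_limits without including <limits> directly. This is a heuristic sketch of my own, not part of GEOSX, CHAI, or Umpire: a file can legitimately get <limits> through one of its project's own headers, and commented-out code is not excluded, so every hit needs a manual look. For a real hit, the fix is simply adding #include <limits> near the top of the file.

```python
import re
import sys
from pathlib import Path

# A use of std::numeric_limits anywhere in the file (comments are not excluded).
USES_LIMITS = re.compile(r"\bstd::numeric_limits\b")
# A direct #include <limits>, tolerating spaces after '#' and before '<'.
INCLUDES_LIMITS = re.compile(r"^\s*#\s*include\s*<limits>", re.MULTILINE)

SOURCE_SUFFIXES = (".cpp", ".cc", ".cxx", ".hpp", ".hh", ".h")


def missing_limits_include(source: str) -> bool:
    """True if the text uses std::numeric_limits but never includes <limits> itself."""
    return bool(USES_LIMITS.search(source)) and not INCLUDES_LIMITS.search(source)


def scan_tree(root: str) -> list:
    """Paths of C++ sources under root that trip missing_limits_include()."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            if missing_limits_include(path.read_text(errors="replace")):
                hits.append(str(path))
    return hits


# Usage: python3 <script> <directory>, e.g. the chai source directory in the TPL build.
if __name__ == "__main__" and len(sys.argv) > 1:
    for hit in scan_tree(sys.argv[1]):
        print(hit)
```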

More generally, I've been struggling for days to find a combination of versions and configs that might compile GEOSX (outside of the national labs). If anyone has ever seen this compile on a vanilla Linux distro, a list of dependencies and versions thought to work would be very useful. Even better would be a scripted build from start to tested, perhaps in the form of a Docker or Singularity recipe.

My final goal is to get it compiled with MPI, CUDA, and perhaps OpenMP. So far, I'm still batting zero. (The uberenv route looked promising, but unfortunately fails with a link error.)

rrsettgast commented 3 years ago

Hi Michael, What you are trying to do shouldn't be too much of a problem. We can check if the error has been corrected in the tpl, or apply a patch.

@corbett5 do you have gcc11 setup in your spack builds?

michaelkarlcoleman commented 3 years ago

The GCC is indeed coming from Spack. Here's what's loaded:

$ spack find --loaded
==> 45 installed packages
-- linux-rhel7-broadwell / gcc@8.2.0 ----------------------------
berkeley-db@18.1.40  gettext@0.21          libiconv@1.16        mpfr@4.1.0      readline@8.1
bzip2@1.0.8          git@2.31.1            libidn2@2.3.0        ncurses@6.2     tar@1.34
curl@7.76.1          git-lfs@2.11.0        libmd@1.0.3          openssh@8.5p1   xz@5.2.5
expat@2.3.0          gmp@6.2.1             libunistring@0.9.10  openssl@1.1.1k  zlib@1.2.11
gcc@11.1.0           libbsd@0.11.3         libxml2@2.9.10       pcre2@10.36     zstd@1.5.0
gdbm@1.19            libedit@3.1-20210216  mpc@1.1.0            perl@5.32.1

-- linux-rhel7-broadwell / gcc@11.1.0 ---------------------------
cmake@3.20.5          libevent@2.1.12    ncurses@6.2      openssh@8.5p1
cuda@11.2.2           libiconv@1.16      numactl@2.0.14   openssl@1.1.1k
hwloc@2.5.0           libpciaccess@0.16  openblas@0.3.15  xz@5.2.5
libedit@3.1-20210216  libxml2@2.9.10     openmpi@4.1.1    zlib@1.2.11

and here's the host config I'm using:

# detect host and name the configuration file
site_name(HOST_NAME)
set(CONFIG_NAME "your-platform" CACHE PATH "")
message( "CONFIG_NAME = ${CONFIG_NAME}" )

# set paths to C, C++, and Fortran compilers. Note that while GEOSX does not contain any Fortran code,
# some of the third-party libraries do contain Fortran code. Thus a Fortran compiler must be specified.
set(CMAKE_C_COMPILER "/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-11.1.0-b5ctlwh3lnwoystx6r6yet3ue4nikxcg/bin/gcc" CACHE PATH "")
set(CMAKE_CXX_COMPILER "/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-11.1.0-b5ctlwh3lnwoystx6r6yet3ue4nikxcg/bin/g++" CACHE PATH "")
set(CMAKE_Fortran_COMPILER "/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-11.1.0-b5ctlwh3lnwoystx6r6yet3ue4nikxcg/bin/gfortran" CACHE PATH "")
set(ENABLE_FORTRAN OFF CACHE BOOL "" FORCE)

# enable MPI and set paths to compilers and executable.
# Note that the MPI compilers are wrappers around standard serial compilers.
# Therefore, the MPI compilers must wrap the appropriate serial compilers specified
# in CMAKE_C_COMPILER, CMAKE_CXX_COMPILER, and CMAKE_Fortran_COMPILER.
set(ENABLE_MPI ON CACHE PATH "")
set(MPI_C_COMPILER "/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-11.1.0/openmpi-4.1.1-dvuecndxwkrbbi5p2rtkhfnakf7ydgnm/bin/mpicc" CACHE PATH "")
set(MPI_CXX_COMPILER "/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-11.1.0/openmpi-4.1.1-dvuecndxwkrbbi5p2rtkhfnakf7ydgnm/bin/mpic++" CACHE PATH "")
set(MPI_Fortran_COMPILER "/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-11.1.0/openmpi-4.1.1-dvuecndxwkrbbi5p2rtkhfnakf7ydgnm/bin/mpifort" CACHE PATH "")
set(MPIEXEC "/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-11.1.0/openmpi-4.1.1-dvuecndxwkrbbi5p2rtkhfnakf7ydgnm/bin/mpirun" CACHE PATH "")

# disable CUDA and OpenMP
set(CUDA_ENABLED "OFF" CACHE PATH "" FORCE)
set(ENABLE_OPENMP "OFF" CACHE PATH "" FORCE)

# enable PAMELA and PVTPackage
set(ENABLE_PAMELA ON CACHE BOOL "" FORCE)
set(ENABLE_PVTPackage ON CACHE BOOL "" FORCE)

# enable tests
set(ENABLE_GTEST_DEATH_TESTS ON CACHE BOOL "" FORCE )

set(ENABLE_DOCS OFF CACHE BOOL "" FORCE )

It's not entirely obvious that this approach will work, but it seems possible. At the moment, I'm just hunting for "proof of life"--any recipe whatsoever to get a working compile from source. That would be a base from which to build, for options, optimizations, etc.

The code apparently compiles at one or more of the national labs, but I have no idea what those environments look like, and I suspect they are rather exotic.

I can patch the bug reported here locally, but the fact that I'm hitting it at all suggests that I'm "doing it wrong" (compiling in a way that no one has ever tried). I'm looking for a recipe that someone has seen work, and that I can reasonably reproduce.

klevzoff commented 3 years ago

Hi @michaelkarlcoleman, I'm building and developing GEOSX locally on vanilla Ubuntu 20.04 using gcc 8, 9, and 10, as well as clang 8, all installed via the system package manager (apt). The only caveat was that I had to build OpenMPI from source for each of these compilers so as not to rely on ABI compatibility (the default system OpenMPI package also works fine and can be made to use a non-default compiler via OMPI_CXX and similar environment variables). The problem seems to be with gcc 11 only, and we should definitely report and fix it in upstream CHAI/Umpire. Edit: it is fixed in Umpire trunk, but CHAI has not yet updated its Umpire submodule to that version.
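For the system-OpenMPI route, Open MPI's wrapper compilers honor the OMPI_CC, OMPI_CXX, and OMPI_FC environment variables, which replace the underlying compiler the wrapper invokes. A minimal sketch of setting them for a child process; the versioned compiler names (gcc-9 and friends) are Ubuntu-style placeholders, not a recommendation, and mpicxx --showme prints the resulting command line without compiling anything. This still relies on the ABI compatibility caveat mentioned above.

```python
import os
import shutil
import subprocess


def ompi_env_for(cc: str, cxx: str, fc: str, base=None) -> dict:
    """Return a copy of the environment with Open MPI's compiler overrides set.

    OMPI_CC / OMPI_CXX / OMPI_FC make mpicc / mpicxx / mpifort invoke the
    named compiler instead of the one Open MPI was configured with.
    The base mapping is copied, never modified.
    """
    env = dict(os.environ if base is None else base)
    env.update({"OMPI_CC": cc, "OMPI_CXX": cxx, "OMPI_FC": fc})
    return env


if __name__ == "__main__":
    # Placeholder compiler names; substitute the compilers you actually want.
    env = ompi_env_for("gcc-9", "g++-9", "gfortran-9")
    if shutil.which("mpicxx"):
        # --showme (an Open MPI wrapper flag) prints the command line it would run.
        result = subprocess.run(["mpicxx", "--showme"], env=env,
                                capture_output=True, text=True)
        print(result.stdout or result.stderr)
```

The same variables can of course be exported in the shell before running CMake; the function form just makes the override explicit and scoped to one child process.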

corbett5 commented 3 years ago

@rrsettgast I don't think I have gcc 11 installed on LC, but it would be easy enough to add.

@michaelkarlcoleman we have 9 prebuilt Docker images, though I think they contain only the TPLs. Here's a link: https://hub.docker.com/repository/docker/geosx/ubuntu18.04-clang8.0.0-cuda10.1.243. I think they're public, and if not, they should be.

michaelkarlcoleman commented 3 years ago

@corbett5 I did spend quite a bit of time looking at the Docker images, but it doesn't appear that they help in our case. As you say, they seem to contain only the TPLs. Exasperating, as it seems like it would be easy enough to include a basic working version of GEOSX. But even then, it's not clear how MPI users could benefit from GEOSX in Docker images. Copying the compiled contents out seems fraught with problems, and trying to use the containers as-is has its own set of difficulties when running on more than one host. My impression is that the Docker images are mostly useful only for developers and conceivably for someone who wants to use GEOSX on a toy scale.

michaelkarlcoleman commented 3 years ago

@klevzoff Thanks. Based on this, I switched to GCC 8.2.0 (chosen arbitrarily, since I already had the parts built in Spack). This gets me past the problem above, but unfortunately runs into trouble when trying to build doxygen. The issue seems to be that it can't find libiconv, even though it's there. Even pointing cmake directly at it via LIBRARY_PATH or LD_LIBRARY_PATH fails.

As an alternative route, I tried defeating the doxygen build in various ways (since I neither need nor want it), but so far without success. This block in my host config seems not to have any effect:

set(ENABLE_DOCS OFF CACHE BOOL "" FORCE )

set(ENABLE_BENCHMARKS OFF CACHE BOOL "" FORCE)
set(ENABLE_DOXYGEN OFF CACHE BOOL "" FORCE)
set(ENABLE_MATHPRESSO OFF CACHE BOOL "" FORCE)
set(ENABLE_SPHINX_EXECUTABLE OFF CACHE BOOL "" FORCE)
set(ENABLE_UNCRUSTIFY OFF CACHE BOOL "" FORCE)
set(ENABLE_XML_UPDATES OFF CACHE BOOL "" FORCE)

Similarly, commenting out the lines in tpls.cmake has no apparent effect. Still tries to build doxygen. It's like kudzu.
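When a host-config switch seems to be ignored, it can help to list the variable names the TPL CMake files actually test for a given feature instead of guessing. A heuristic sketch (regex-based, not a real CMake parser, and not a GEOSX tool) that collects names used as ${VAR} or as the first argument of option(), set(), if(), or elseif(), keeping those that contain a keyword such as doxygen:

```python
import re
from pathlib import Path

# Names referenced as ${VAR}, or as the first argument of option()/set()/if()/elseif(),
# optionally after NOT. Heuristic only: it does not understand CMake syntax.
VAR_PATTERNS = [
    re.compile(r"\$\{([A-Za-z0-9_]+)\}"),
    re.compile(r"\b(?:option|set|if|elseif)\s*\(\s*(?:NOT\s+)?([A-Za-z0-9_]+)", re.IGNORECASE),
]


def switches_in(text: str, keyword: str) -> set:
    """Variable-like names in CMake text that contain keyword (case-insensitive)."""
    key = keyword.lower()
    names = set()
    for pattern in VAR_PATTERNS:
        names.update(name for name in pattern.findall(text) if key in name.lower())
    return names


def scan_cmake(root: str, keyword: str) -> dict:
    """Map each CMakeLists.txt / *.cmake file under root to the matching names it uses."""
    found = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and (path.name == "CMakeLists.txt" or path.suffix == ".cmake"):
            names = switches_in(path.read_text(errors="replace"), keyword)
            if names:
                found[str(path)] = names
    return found
```

Running scan_cmake("thirdPartyLibs", "doxygen") would list every candidate switch per file; which one actually gates the build still has to be read off the matching if() blocks.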

I'm thinking about giving up on Spack and trying to piece together something with the environment modules we have as a next approach. Not very optimistic, though.

michaelkarlcoleman commented 3 years ago

Hi @klevzoff , one more thing. Are your build recipes available somewhere? Hopefully in the form of a script or set of scripts that build everything starting from (say) a newly minted, minimal Ubuntu image? Thanks.

michaelkarlcoleman commented 3 years ago

For the record, looking through the TPL cmake files, it appeared that perhaps

set(GEOSXTPL_ENABLE_DOXYGEN OFF CACHE BOOL "" FORCE)

would help. It may actually have disabled the doxygen build. Unfortunately, the build still fails later, like so:

-- Installing: /gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/thirdPartyLibs/install-talapas-spack-release/axom/examples/axom/using-with-make/Makefile
-- Installing: /gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/thirdPartyLibs/install-talapas-spack-release/axom/examples/axom/using-with-make/example.cpp
[ 93%] Completed 'axom'
[ 93%] Built target axom
[ 94%] Building CXX object CMakeFiles/tpl.dir/tpl.cpp.o
[ 95%] Linking CXX executable bin/tpl
[ 95%] Built target tpl
[ 95%] Building CXX object blt/thirdparty_builtin/googletest-master-2020-01-07/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 96%] Linking CXX static library ../../../../lib/libgtest.a
[ 96%] Built target gtest
[ 96%] Building CXX object blt/thirdparty_builtin/googletest-master-2020-01-07/googletest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
[ 97%] Linking CXX static library ../../../../lib/libgtest_main.a
[ 97%] Built target gtest_main
[ 97%] Building CXX object blt/tests/smoke/CMakeFiles/blt_mpi_smoke.dir/blt_mpi_smoke.cpp.o
[ 98%] Linking CXX executable ../../../tests/blt_mpi_smoke
/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/libevent-2.1.12-esh72id7jta52ghvl6egurb4bwgp5gre/lib: file not recognized: Is a directory
collect2: error: ld returned 1 exit status
make[2]: *** [tests/blt_mpi_smoke] Error 1
make[1]: *** [blt/tests/smoke/CMakeFiles/blt_mpi_smoke.dir/all] Error 2
make: *** [all] Error 2

There's no log, but this appears to be due to

# in thirdPartyLibs/build-talapas-spack-release/blt/tests/smoke/CMakeFiles/blt_mpi_smoke.dir
$ cat link.txt
/packages/spack/spack/opt/spack/linux-rhel7-x86_64/gcc-7.3.0/gcc-8.2.0-kot2sql3i2pckkfopvmxdmbdopuwy42t/bin/g++ -Wall -Wextra  -O3 -DNDEBUG -Wl,-rpath,/gpfs/packages/spack/spack/opt/spack/linux-rhel7-x86_64/gcc-7.3.0/gcc-8.2.0-kot2sql3i2pckkfopvmxdmbdopuwy42t/lib/gcc/x86_64-pc-linux-gnu/8.2.0 -Wl,-rpath,/gpfs/packages/spack/spack/opt/spack/linux-rhel7-x86_64/gcc-7.3.0/gcc-8.2.0-kot2sql3i2pckkfopvmxdmbdopuwy42t/lib64 -Wl,-rpath -Wl,/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/hwloc-2.4.1-hkbkx5nnzw3ubtzxcpirhv4uzkexso52/lib -Wl,/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/libevent-2.1.12-esh72id7jta52ghvl6egurb4bwgp5gre/lib -Wl,/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/zlib-1.2.11-r2llgrncecwh3hlaqtn7e6x7nwdzap3m/lib -Wl,/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/openmpi-4.0.5-efzld6vninonhliblgn2ci52aqnroile/lib -L/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/hwloc-2.4.1-hkbkx5nnzw3ubtzxcpirhv4uzkexso52/lib -L/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/libevent-2.1.12-esh72id7jta52ghvl6egurb4bwgp5gre/lib -L/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/zlib-1.2.11-r2llgrncecwh3hlaqtn7e6x7nwdzap3m/lib -pthread CMakeFiles/blt_mpi_smoke.dir/blt_mpi_smoke.cpp.o -o ../../../tests/blt_mpi_smoke  -Wl,-rpath,/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/openmpi-4.0.5-efzld6vninonhliblgn2ci52aqnroile/lib /gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/openmpi-4.0.5-efzld6vninonhliblgn2ci52aqnroile/lib/libmpi.so

which, as you can clearly see :-) is bogus. The key bit is the -Wl,-rpath -Wl,/gpfs/..., which looks like something somewhere mangled one of these directories into an empty string (or inserted an empty string).
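Why a directory ends up as an input file follows from how the gcc driver handles -Wl: each -Wl,a,b token is split at the commas and the pieces are handed to the linker as separate arguments. So -Wl,-rpath -Wl,dirA -Wl,dirB reaches ld as -rpath dirA dirB: dirA becomes a run path, and dirB is opened as an input file, hence "file not recognized: Is a directory". A small simulation with made-up paths; it models only the -Wl, tokens, not everything else the driver adds:

```python
import shlex

# ld options (ld's own spelling) that consume the next argument as their value.
TAKES_VALUE = {"-rpath", "-soname", "-o", "-L"}


def ld_args_from_wl(cmdline: str) -> list:
    """Mimic the gcc driver's handling of -Wl,: split each -Wl,a,b token at the
    commas and pass the pieces to ld as separate arguments. All other tokens
    (and everything the driver adds on its own) are ignored here."""
    out = []
    for tok in shlex.split(cmdline):
        if tok.startswith("-Wl,"):
            out.extend(tok[len("-Wl,"):].split(","))
    return out


def stray_paths(ld_args: list) -> list:
    """Absolute paths not preceded by an option that takes a value; ld would
    treat these as input files."""
    stray = []
    prev = None
    for arg in ld_args:
        if arg.startswith("/") and prev not in TAKES_VALUE:
            stray.append(arg)
        prev = arg
    return stray


# Made-up paths with the same shape as the failing link.txt.
cmd = ("g++ -shared -Wl,-rpath,/opt/gcc/lib64 "
       "-Wl,-rpath -Wl,/opt/hwloc/lib -Wl,/opt/zlib -Wl,/opt/openmpi/lib "
       "-o libx.so x.o")
print(ld_args_from_wl(cmd))
print(stray_paths(ld_args_from_wl(cmd)))  # ['/opt/zlib', '/opt/openmpi/lib']
```

The well-formed spelling is one -Wl,-rpath,<dir> per directory, as in the first two rpath entries of the real link line.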

I believe I saw this also during an early attempt to use the uberenv approach. My guess is that this is an issue with a CMake file somewhere, but after running through the strace logs for quite a while during that early attempt, I never was able to spot what was going on.

corbett5 commented 3 years ago

What version of CMake are you using? Where did you get to when using uberenv? If you're starting from a minimal Ubuntu distribution that would be the easiest way forward, especially if you don't mind letting Spack build the world.

michaelkarlcoleman commented 3 years ago

For the above, the CMake version was 3.20.5 (see the spack find output in my earlier comment).

I didn't take very good notes for my first uberenv try, but it seemed to get a fair way in. Retrying it just now, with a very vanilla environment (no modules loaded, distro is RHEL 7.9), it blows up almost immediately because the system gcc is 4.8.5, and it tries to build suite-sparse first, which requires a later gcc.

[exe: spack/bin/spack dev-build --quiet -d /gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/scripts/uberenv/../.. -u hostconfig geosx@develop%gcc]
==> Warning: gcc@4.8.5 cannot build optimized binaries for "broadwell". Using best target possible: "haswell"
==> Error: Conflicts in concretized spec "geosx@develop%gcc@4.8.5+caliper~cuda~essl+hypre~hypre-cuda~mkl+petsc~pygeosx+shared+suite-sparse+trilinos build_type=RelWithDebInfo cuda_arch=none lai=trilinos arch=linux-rhel7-haswell/tihmllq"

List of matching conflicts for spec:

    suite-sparse@5.8.1%gcc@4.8.5+amd~blas-no-underscore~btf+camd+ccolamd+cholmod+colamd~csparse~cuda~cxsparse~klu+openmp+pic~rbio~spqr~tbb+umfpack arch=linux-rhel7-haswell
        ^cmake@3.18.2%gcc@4.8.5~doc+ncurses+openssl+ownlibs~qt arch=linux-rhel7-haswell
            ^ncurses@6.2%gcc@4.8.5~symlinks+termlib arch=linux-rhel7-haswell
                ^pkgconf@1.7.3%gcc@4.8.5 arch=linux-rhel7-haswell
            ^openssl@1.1.1g%gcc@4.8.5+systemcerts arch=linux-rhel7-haswell
                ^perl@5.30.3%gcc@4.8.5+cpanm+shared+threads arch=linux-rhel7-haswell
                    ^berkeley-db@18.1.40%gcc@4.8.5 arch=linux-rhel7-haswell
                    ^gdbm@1.18.1%gcc@4.8.5 arch=linux-rhel7-haswell
                        ^readline@8.0%gcc@4.8.5 arch=linux-rhel7-haswell
                ^zlib@1.2.11%gcc@4.8.5+optimize+pic+shared arch=linux-rhel7-haswell
        ^m4@1.4.18%gcc@4.8.5+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8 arch=linux-rhel7-haswell
            ^libsigsegv@2.12%gcc@4.8.5 arch=linux-rhel7-haswell
        ^metis@5.1.0%gcc@4.8.5~gdb+int64~real64+shared build_type=Release patches=4991da938c1d3a1d3dea78e49bbebecba00273f98df2a656e38b83d55b281da1 arch=linux-rhel7-haswell
        ^openblas@0.3.10%gcc@4.8.5~consistent_fpcsr~ilp64+pic+shared threads=none arch=linux-rhel7-haswell

1. "%gcc@:4.8" conflicts with "suite-sparse@5.2.0:" [gcc version must be at least 4.9 for suite-sparse@5.2.0:]

[ERROR: failure of spack install/dev-build]

One way to fix this would be to build the compilers first in Spack, as part of building the world.

As another experiment, I used a GCC 10.2.0. This is actually from within our existing Spack tree, but without initializing Spack itself (to avoid any confusion with the uberenv use of Spack). I think this is valid, though not absolutely certain.

One notable point is that a recent 'git' is required, but uberenv apparently does not build and use one itself. So it has to exist outside of the uberenv tree, just like the compiler(s). This is a bit harder to spot, as an older 'git' does seem to "work" without an error status, but it's not clear whether it's doing the right thing. I do see a warning (?) like "Fetching tags only, you probably meant: git fetch --tags".

export CC=/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/bin/gcc
export CXX=/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/bin/g++
export F77=/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/bin/gfortran
export FC=/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/bin/gfortran
export LD_LIBRARY_PATH=/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib64:/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib:$LD_LIBRARY_PATH
export PATH=/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/bin:$PATH

export PATH=/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/git-2.31.1-m7r7sioduzbfg33hc6wlbvyoiktl3x7i/bin:$PATH

Beyond that, I did this

sed -i -e 's|git@github.com:corbett5/spack.git|https://github.com/corbett5/spack.git|'  scripts/uberenv/project.json
scripts/uberenv/uberenv.py

This gets a long way in, but ultimately fails with a link error similar to the one mentioned above:

==> Installing conduit
==> No binary for conduit found: installing from source
==> Fetching https://spack-llnl-mirror.s3-us-west-2.amazonaws.com/_source-cache/archive/7e/7efac668763d02bd0a2c0c1b134d9f5ee27e99008183905bb0512e5502b8b4fe.tar.gz
######################################################################## 100.0%
==> conduit: Executing phase: 'configure'
==> conduit: Executing phase: 'build'
==> Error: ProcessError: Command exited with status 2:
    'make' '-j28'

7 errors found in build log:
     816      352 |     DataType(const DataType& type);
     817          |     ^~~~~~~~
     818    [ 65%] Linking CXX shared library ../../lib/libconduit_relay_mpi.so
     819    cd /gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/builds/spack-stage-conduit-0.5.0-eetp764l2x3hhs2ce3g7jf65nxxvymyh/spack-src/spack-build/libs/relay && /gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/cmake-3.18.2-glaktlnxnd6nnqvv7fsfgg7nfrvuqbxq/bin/cmake -E cmake_link_script CMakeFiles/conduit_relay_mpi.dir/link.txt --verbose=1
     820    /gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/bin/g++ -fPIC -O2 -g -DNDEBUG -Wl,-rpath,/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib/gcc/x86_64-pc-linux-gnu/10.2.0 -Wl,-rpath,/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib64 -Wl,-rpath -Wl,/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/hwloc-1.11.11-ijt37r6wh2swaj4is5zbigao7b6n2yzs/lib -Wl,/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/zlib-1.2.11-t37xblst5onbiz2tn6d4hkrg2a5wic2o -Wl,/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/openmpi-3.1.6-e5b3hxgf5p3a2dgu63rwctcphurbcqpv/lib -L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/hwloc-1.11.11-ijt37r6wh2swaj4is5zbigao7b6n2yzs/lib -L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/zlib-1.2.11-t37xblst5onbiz2tn6d4hkrg2a5wic2o -pthread -shared -Wl,-soname,libconduit_relay_mpi.so -o ../../lib/libconduit_relay_mpi.so CMakeFiles/conduit_relay_mpi.dir/conduit_relay_mpi.cpp.o  -Wl,-rpath,/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/builds/spack-stage-conduit-0.5.0-eetp764l2x3hhs2ce3g7jf65nxxvymyh/spack-src/spack-build/lib:/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/openmpi-3.1.6-e5b3hxgf5p3a2dgu63rwctcphurbcqpv/lib: ../../lib/libconduit.so /gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/openmpi-3.1.6-e5b3hxgf5p3a2dgu63rwctcphurbcqpv/lib/libmpi.so
     821    /gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/zlib-1.2.11-t37xblst5onbiz2tn6d4hkrg2a5wic2o: file not recognized: Is a directory
  >> 822    collect2: error: ld returned 1 exit status
  >> 823    make[2]: *** [lib/libconduit_relay_mpi.so] Error 1
     824    make[2]: Leaving directory `/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/builds/spack-stage-conduit-0.5.0-eetp764l2x3hhs2ce3g7jf65nxxvymyh/spack-src/spack-build'
  >> 825    make[1]: *** [libs/relay/CMakeFiles/conduit_relay_mpi.dir/all] Error 2

Again, we see mangled arguments as the proximate cause: -Wl,-rpath -Wl,/gpfs/

And again, my best guess would be that some string manipulation somewhere, probably in a CMake config file of some sort, is mangling these arguments.

If that can't be diagnosed, I suppose another thing to try would be to start with a container having either RHEL 7.9 or the CentOS equivalent and uberenv from there. If it succeeds, perhaps all of that would be close enough (in terms of things like libc) to be copied out and run on our hosts. Maybe. If it only builds on Ubuntu, this could still work, though experience suggests a libc clash is more likely in that case.

corbett5 commented 3 years ago

GEOSX doesn't support GCC 4 either, so you'll need something newer. Seeing as the Umpire we're currently using doesn't support GCC 11, I'd say anything between 8 and 10.

> As another experiment, I used a GCC 10.2.0. This is actually from within our existing Spack tree, but without initializing Spack itself (to avoid any confusion with the uberenv use of Spack). I think this is valid, though not absolutely certain.

Yes, if you put the absolute path to GCC in the compilers.yaml file, then it doesn't matter where it comes from.
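For reference, a compilers.yaml entry of roughly the following shape is what points Spack at an existing GCC by absolute path. The field layout is my understanding of what spack compiler find wrote in Spack releases from around this time and may differ in other versions, so comparing against the output of spack compiler find <prefix>/bin is the safer route. The sketch only renders such an entry; the operating-system and target values are parameters you would fill in for your site.

```python
from pathlib import PurePosixPath


def compilers_yaml_entry(spec, prefix, operating_system, target):
    """Render a compilers.yaml snippet pointing Spack at an existing GCC.

    Field names are an assumption based on what `spack compiler find`
    produced in Spack releases of this era; check your own Spack's output.
    """
    bin_dir = PurePosixPath(prefix) / "bin"
    lines = [
        "compilers:",
        "- compiler:",
        f"    spec: {spec}",
        "    paths:",
        f"      cc: {bin_dir / 'gcc'}",
        f"      cxx: {bin_dir / 'g++'}",
        f"      f77: {bin_dir / 'gfortran'}",
        f"      fc: {bin_dir / 'gfortran'}",
        "    flags: {}",
        f"    operating_system: {operating_system}",
        f"    target: {target}",
        "    modules: []",
        "    environment: {}",
        "    extra_rpaths: []",
    ]
    return "\n".join(lines) + "\n"


# The GCC 10.2.0 prefix used earlier in this thread.
print(compilers_yaml_entry(
    "gcc@10.2.0",
    "/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/"
    "gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4",
    "rhel7",
    "x86_64",
))
```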

I don't remember ever seeing that specific error you're getting, and it's weird that you get it both in the uberenv build and the plain CMake build. With different versions of CMake as well.

michaelkarlcoleman commented 3 years ago

It would be handy to document these version requirements somewhere obvious. Perhaps even better would be to have the build scripts detect this up front with a diagnostic. The ultimate would be to have 'uberenv' simply build the compilers it needs, though I imagine this feature might be less useful in your environment.
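An up-front compiler check along those lines could be as simple as the sketch below. The 8 to 10 range is taken from corbett5's comment above and would need updating as Umpire/CHAI move on; the function names and messages are made up for illustration.

```python
import re
import shutil
import subprocess

# Range from this thread: GCC 4.x is too old, GCC 11 hits the Umpire <limits> bug.
MIN_MAJOR, MAX_MAJOR = 8, 10


def gcc_major(version: str) -> int:
    """Leading integer of a version string such as '10.2.0' or '8'."""
    m = re.match(r"\s*(\d+)", version)
    if not m:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(m.group(1))


def check_gcc(version: str) -> str:
    """Empty string if the major version is in range, else a diagnostic."""
    major = gcc_major(version)
    if MIN_MAJOR <= major <= MAX_MAJOR:
        return ""
    return (f"gcc {version.strip()} is outside the range {MIN_MAJOR}-{MAX_MAJOR} "
            "suggested for the TPLs; pick another compiler first")


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc:
        # -dumpversion may print only the major number on some builds; that is all we need.
        out = subprocess.run([gcc, "-dumpversion"], capture_output=True, text=True).stdout
        print(check_gcc(out) or f"gcc {out.strip()} is in range")
```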

It does seem that this is not a Spack issue, since (I think) we see it both with and without. I'm inclined to blame CMake (probably a config file), but that's partly because I loathe CMake.

I'm trying to figure out how I might enable voluminous tracing/debugging output for the cmake runs. If someone is familiar with an easy way to do that, I'd love to hear it.

The proximal CMake bogus output seems to be CMakeFiles/conduit_relay_mpi_io.dir/link.txt, relative to directory GEOSX/uberenv_libs/builds/spack-stage-conduit-0.5.0-eetp764l2x3hhs2ce3g7jf65nxxvymyh/spack-src/spack-build/libs/relay. Is there a straightforward way to see all of the inputs that go into that file?
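On the tracing question: CMake's --trace-expand prints every command it executes, with variables already expanded and prefixed by file(line), and --trace-redirect=<file> (CMake 3.16 and later, so both 3.18.2 and 3.20.5 qualify) writes that to a file instead of stderr. Searching the trace for the suspicious -Wl,-rpath fragment should usually point at the file and line that assemble it. A sketch of both steps for a standalone configure of the failing package; the function names are my own:

```python
import re
import subprocess

# Default ("human") trace lines look like:  /path/CMakeLists.txt(123):  set(VAR value )
TRACE_LINE = re.compile(r"^(?P<file>.+?)\((?P<line>\d+)\):\s*(?P<cmd>.*)$")


def run_traced_configure(src_dir, build_dir, log_path, extra_args=()):
    """Configure with every command traced, variables expanded, into log_path.
    --trace-redirect needs CMake 3.16 or newer; -S/-B need 3.13 or newer."""
    subprocess.run(["cmake", "-S", src_dir, "-B", build_dir, "--trace-expand",
                    f"--trace-redirect={log_path}", *extra_args], check=True)


def grep_trace(lines, needle):
    """(file, line, command) for each trace line whose command contains needle."""
    hits = []
    for raw in lines:
        m = TRACE_LINE.match(raw.rstrip("\n"))
        if m and needle in m.group("cmd"):
            hits.append((m.group("file"), int(m.group("line")), m.group("cmd").strip()))
    return hits
```

Usage would be run_traced_configure(src, build, "trace.log") followed by grep_trace(open("trace.log"), "-Wl,-rpath"); flags that arrive through target properties show up as the set_target_properties() or target_link_libraries() call that set them.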

michaelkarlcoleman commented 3 years ago

I'm giving up on this, unless someone else can spot a way forward. It's just soaking up too many hours.

To aid anyone who might later push on, I'll leave some breadcrumbs that might be useful.

To clarify the above-mentioned mangling of the ld command line, here's one specific example that exhibits several problems:

  1. In some cases, an -rpath flag is followed by multiple directories that are clearly intended to have -rpath in front of each. As soon as ld encounters a directory missing its -rpath flag, it assumes that that is a file to be loaded and fails immediately, since it's a directory.
  2. Some -rpath arguments contain colons. This does not match the ld documentation, but sort of accidentally works, since ld is itself internally accumulating a colon-delimited path. One limitation is that ld uniquifies this list if it's given properly, but this won't work if the colons appear in the arguments. This is probably just a minor performance issue, but might also be a problem for correctness.
  3. Similarly, some arguments end in colons. Not sure how the whole toolchain would react to such an "empty" directory in the path.
  4. In some cases, the -rpath argument is simply wrong. In particular, the final "/lib" or "/lib64" is being clipped off. That won't work, obviously. (This happens with -L as well.) This might sometimes "work" by then finding the library in question in one of the standard locations.
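These four problems can be checked mechanically against a captured ld argument list. A heuristic sketch of my own; the directory test is injectable so it can run without the original filesystem, and problem 4 is approximated as "a search directory that has its own lib/ or lib64/ subdirectory", which is a guess at a clipped install prefix rather than a certainty:

```python
import os

# ld options whose next argument is a value rather than an input file.
TAKES_VALUE = {"-rpath", "-o", "-soname", "-plugin", "-m"}


def audit_ld_args(args, is_dir=os.path.isdir):
    """Return (problem_number, argument) pairs for problems 1-4 listed above.

    is_dir is injectable so a captured argv can be checked without the
    original filesystem. All checks are heuristics.
    """
    findings = []
    prev = None
    for arg in args:
        dirs = []
        if prev == "-rpath":
            dirs = [entry for entry in arg.split(":") if entry]
            if len(dirs) > 1:
                findings.append((2, arg))  # several directories packed into one -rpath
            if arg.endswith(":"):
                findings.append((3, arg))  # trailing colon, i.e. an empty entry
        elif arg.startswith("-L"):
            dirs = [arg[2:]]
        elif arg.startswith("/") and prev not in TAKES_VALUE and is_dir(arg):
            findings.append((1, arg))  # bare directory: ld treats it as an input file
            dirs = [arg]
        for d in dirs:
            # A directory that itself contains lib/ or lib64/ looks like an
            # install prefix whose trailing /lib was clipped off.
            if any(is_dir(os.path.join(d, sub)) for sub in ("lib", "lib64")):
                findings.append((4, d))
        prev = arg
    return findings
```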

Broadly, this appears to be string mangling happening either due to issues in CMake itself, or the CMake config files in this project or its dependencies.

The compilation with uberenv.py works fine for the "build the world" part, but does eventually fail for several packages that (I think) are from LLNL.

Although uberenv builds openmpi, I saw an error indicating that its mpicc and friends might not actually be used by the rest of the build. I'm not sure about this. It certainly would be handy if uberenv would build and then use all of the compilers it needs.

Here's one example of the above. (The ld-hack/ld is just a script to capture the command lines.)

['/projects/hpcrcf/mcolema5/akubo-geosx/ld-hack/ld'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/hwloc-1.11.11-ijt37r6wh2swaj4is5zbigao7b6n2yzs/lib'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/zlib-1.2.11-t37xblst5onbiz2tn6d4hkrg2a5wic2o'
'-L/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib/gcc/x86_64-pc-linux-gnu/10.2.0'
'-L/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64'
'-L/lib/../lib64'
'-L/usr/lib/../lib64'
'-L/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../..'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/hdf5-1.10.7-i7grfg57xn6rudck5j2og3r365z2zhg7/lib'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/hwloc-1.11.11-ijt37r6wh2swaj4is5zbigao7b6n2yzs/lib'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/openmpi-3.1.6-e5b3hxgf5p3a2dgu63rwctcphurbcqpv/lib'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/libiconv-1.16-7l4c7ej4wezb6ntj6b6lwkeu2j5ie5dc/lib'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/xz-5.2.5-yuqslbejlz5uvtdx2h5mkemwyasyfpm2/lib'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/libxml2-2.9.10-i5d5u6jieh7xeoxupduyjnbk2apgk75y/lib'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/libpciaccess-0.16-7qzhwni6mqgbmuj3bmib4hlu6k2r4aie/lib'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/zlib-1.2.11-t37xblst5onbiz2tn6d4hkrg2a5wic2o/lib'
'-L/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/numactl-2.0.12-hn3dumkpdwo4pk6wzuhdz5hlwmm6gz2b/lib'
'--enable-new-dtags'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/conduit-0.5.0-eetp764l2x3hhs2ce3g7jf65nxxvymyh/lib'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/conduit-0.5.0-eetp764l2x3hhs2ce3g7jf65nxxvymyh/lib64'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/hdf5-1.10.7-i7grfg57xn6rudck5j2og3r365z2zhg7/lib'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/hwloc-1.11.11-ijt37r6wh2swaj4is5zbigao7b6n2yzs/lib'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/openmpi-3.1.6-e5b3hxgf5p3a2dgu63rwctcphurbcqpv/lib'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/libiconv-1.16-7l4c7ej4wezb6ntj6b6lwkeu2j5ie5dc/lib'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/xz-5.2.5-yuqslbejlz5uvtdx2h5mkemwyasyfpm2/lib'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/libxml2-2.9.10-i5d5u6jieh7xeoxupduyjnbk2apgk75y/lib'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/libpciaccess-0.16-7qzhwni6mqgbmuj3bmib4hlu6k2r4aie/lib'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/zlib-1.2.11-t37xblst5onbiz2tn6d4hkrg2a5wic2o/lib'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/numactl-2.0.12-hn3dumkpdwo4pk6wzuhdz5hlwmm6gz2b/lib'
'-rpath'
'/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib/gcc/x86_64-pc-linux-gnu/10.2.0'
'-rpath'
'/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib64'
'-plugin'
'/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/liblto_plugin.so'
'-plugin-opt=/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/libexec/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper'
'-plugin-opt=-fresolution=/tmp/ccS6undX.res'
'-plugin-opt=-pass-through=-lgcc_s'
'-plugin-opt=-pass-through=-lpthread'
'-plugin-opt=-pass-through=-lc'
'-plugin-opt=-pass-through=-lgcc_s'
'-rpath'
'/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib:/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib64'
'--eh-frame-hdr'
'-m'
'elf_x86_64'
'-shared'
'-o'
'../../lib/libconduit_relay_mpi.so'
'/lib/../lib64/crti.o'
'/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtbeginS.o'
'-rpath'
'/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib/gcc/x86_64-pc-linux-gnu/10.2.0'
'-rpath'
'/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib64'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/hwloc-1.11.11-ijt37r6wh2swaj4is5zbigao7b6n2yzs/lib'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/zlib-1.2.11-t37xblst5onbiz2tn6d4hkrg2a5wic2o'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/openmpi-3.1.6-e5b3hxgf5p3a2dgu63rwctcphurbcqpv/lib'
'-soname'
'libconduit_relay_mpi.so'
'CMakeFiles/conduit_relay_mpi.dir/conduit_relay_mpi.cpp.o'
'-rpath'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/builds/spack-stage-conduit-0.5.0-eetp764l2x3hhs2ce3g7jf65nxxvymyh/spack-src/spack-build/lib:/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/openmpi-3.1.6-e5b3hxgf5p3a2dgu63rwctcphurbcqpv/lib:'
'../../lib/libconduit.so'
'/gpfs/projects/hpcrcf/mcolema5/akubo-geosx/try2/GEOSX/uberenv_libs/linux-rhel7-broadwell/gcc-10.2.0/openmpi-3.1.6-e5b3hxgf5p3a2dgu63rwctcphurbcqpv/lib/libmpi.so'
'-lstdc++'
'-lm'
'-lgcc_s'
'-lpthread'
'-lc'
'-lgcc_s'
'/gpfs/packages/spack/spack/opt/spack/linux-rhel7-broadwell/gcc-8.2.0/gcc-10.2.0-55xl7vwtoqeyu3gnbkhit5m3qnahf4f4/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtendS.o'
'/lib/../lib64/crtn.o']
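The ld-hack/ld script itself isn't included in the thread. A hypothetical stand-in with the same purpose: record each argument vector, then hand off to the real linker unchanged. REAL_LD and LD_HACK_LOG are environment variable names made up for this sketch, and it only takes over when invoked under the name ld. One way to get gcc to pick it up is to put its directory on the driver's -B search path; how the original was wired in isn't stated.

```python
#!/usr/bin/env python3
"""Hypothetical ld wrapper: record the argument vector, then run the real ld."""
import json
import os
import sys

# Made-up environment variables; REAL_LD must point at the genuine linker,
# never back at this wrapper, or it will re-exec itself forever.
REAL_LD = os.environ.get("REAL_LD", "/usr/bin/ld")
LOG_FILE = os.environ.get("LD_HACK_LOG", "/tmp/ld-hack.log")


def log_invocation(argv, log_path):
    """Append argv as one JSON line so each link can be reloaded exactly."""
    with open(log_path, "a") as fh:
        fh.write(json.dumps(list(argv)) + "\n")


def main():
    log_invocation(sys.argv, LOG_FILE)
    # Replace this process with the real linker, arguments unchanged.
    os.execv(REAL_LD, [REAL_LD] + sys.argv[1:])


# Only take over when installed and invoked under the name "ld".
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "ld":
    main()
```

Logging JSON rather than a printed list means each recorded link line can be read back with json.loads and inspected programmatically.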

Finally, a few links that might be relevant, or at least hint at similar issues.

https://github.com/LLNL/blt/issues/266

https://github.com/LLNL/blt/issues/363

https://gitlab.kitware.com/cmake/cmake/-/issues/17025