geodynamics / aspect

A parallel, extensible finite element code to simulate convection in both 2D and 3D models.
https://aspect.geodynamics.org/

Installation using precompiled Trilinos #6109

Closed Yidali26 closed 1 week ago

Yidali26 commented 4 weeks ago

Hello, I'm a new ASPECT user, and I've been trying to install ASPECT on the French supercomputer JOLIOT-CURIE. This machine has strict usage restrictions (no internet access, and git is not allowed), so I have to follow the guidance in "Installation with no internet access". Everything works except for installing Trilinos, but I noticed that a Trilinos module is already available on the supercomputer. I modified the local.cfg file following the instructions in "Compiling a static ASPECT executable". Below are my loaded modules:

$ module list
Currently Loaded Modulefiles:
 1) ccc/1.0                           18) cuda/11.6                       
 2) datadir/unipsud                   19) flavor/libccc_user/hwloc2       
 3) datadir/own                       20) hwloc/2.5.0                     
 4) dfldatadir/own                    21) pmix/4.2.2                      
 5) flavor/buildtarget/x86_64         22) feature/hcoll/multicast/enable  
 6) flavor/buildcompiler/gcc/8        23) sharp/2.4.2                     
 7) c++/gnu/8.3.0(default:gcc)        24) hcoll/4.7.3191                  
 8) c/gnu/8.3.0(default:gcc)          25) ucx/1.12.1                      
 9) fortran/gnu/8.3.0(default:gcc)    26) .tuning/openmpi/4.1.4           
10) gnu/8.3.0(default:gcc)            27) mpi/openmpi/4.1.4(default:mpi)  
11) flavor/buildmpi/openmpi/4         28) lapack/netlib/3.9.0             
12) feature/openmpi/mpi_compiler/gcc  29) flavor/hdf5/serial              
13) feature/mkl/single_node           30) hdf5/1.12.0                     
14) feature/openmpi/io/standard       31) netcdf-c/4.7.4                  
15) feature/openmpi/net/auto          32) trilinos/14.4.0                 
16) flavor/ucx/standard               33) cmake/3.26.4                    
17) flavor/cuda/nvhpc-222             

Key:
(symbolic-version)  auto-loaded  default-version  

And below is my local.cfg file:

PACKAGES="load:dealii-prepare once:cmake once:astyle once:hdf5 once:netcdf once:sundials once:p4est once:dealii"

DEAL_II_VERSION=v9.5.2
NATIVE_OPTIMIZATIONS=ON
BUILD_EXAMPLES=OFF
MKL=ON
MKL_DIR=${MKLROOT}/lib/intel64
TRILINOS_CONFOPTS="-DBUILD_SHARED_LIBS=OFF -DTPL_FIND_SHARED_LIBS=OFF"
DEAL_CONFOPTS="-DDEAL_II_STATIC_EXECUTABLE=ON"
USE_DEAL_II_CMAKE_MPI_COMPILER=ON
DEAL_II_CONFOPTS="-DDEAL_II_WITH_COMPLEX_VALUES=OFF"

The compilation with ./candi.sh ... works, but the subsequent ASPECT configuration fails because Trilinos is missing:

$ cmake ../
-- 
-- ====================================================
-- ===== Configuring ASPECT 2.6.0-pre
-- ====================================================
-- 
-- Setting up ASPECT for DebugRelease mode.
-- 
-- ===== Configuring external libraries ===============
-- Found deal.II version 9.5.2 at '/ccc/work/cont003/gen15329/liyida/aspect/build/deal.II-v9.5.2/lib/cmake/deal.II'
CMake Error at CMakeLists.txt:136 (message):

  -- deal.II was built without support for Trilinos!

CMake Error at CMakeLists.txt:150 (message):

  ASPECT requires a deal.II installation built with certain features enabled
  that seem to be missing (see above)!

-- Configuring incomplete, errors occurred!

Could anyone help me with this? Thanks, Yida
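One way to double-check what the CMake error reports is to inspect the config.h header of the deal.II installation that ASPECT found. The sketch below assumes the usual deal.II install layout (include/deal.II/base/config.h under the install root) and that enabled features appear there as `#define DEAL_II_WITH_<NAME>`. The helper name `dealii_has_feature` and the `DEAL_II_ROOT` variable are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a deal.II install tree was configured
# with a given feature, by looking for "#define DEAL_II_WITH_<FEATURE>".
# Assumes the standard layout <root>/include/deal.II/base/config.h.
dealii_has_feature() {
    config_h="$1/include/deal.II/base/config.h"
    feature="$2"
    if [ ! -f "${config_h}" ]; then
        echo "no config.h under $1" >&2
        return 2
    fi
    # Disabled features are written as a commented-out #undef, so they
    # do not match this pattern.
    grep -Eq "^[[:space:]]*#[[:space:]]*define[[:space:]]+DEAL_II_WITH_${feature}([[:space:]]|\$)" "${config_h}"
}

# DEAL_II_ROOT is a placeholder; point it at the deal.II install root.
if dealii_has_feature "${DEAL_II_ROOT:-/path/to/deal.II}" TRILINOS; then
    echo "deal.II was configured with Trilinos"
else
    echo "Trilinos is not enabled (or config.h was not found)"
fi
```

If the check reports that Trilinos is not enabled, the problem lies in how deal.II was configured, not in ASPECT's own CMake step.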

naliboff commented 4 weeks ago

@Yidali26 - thanks for posting and welcome to the community. As an aside in case you have not seen it, feel free to also post compiler questions on the ASPECT user forum (but here is fine as well).

Here is my initial suggestion: can you try explicitly specifying the Trilinos directory location in candi.cfg with -DTRILINOS_DIR=/path/to/trilinos?

For reference, here is a link to the deal.II CMake documentation page, which is a very useful reference if you need to specify more options than normal in the candi.cfg file.

Yidali26 commented 4 weeks ago

Hi @naliboff, thanks for your reply. Do I need to add trilinos back to the PACKAGES line in the .cfg file? Previously I removed trilinos from local.cfg because installing it leads to errors: candi unpacks the uploaded archive and tries to build it, but eventually fails with a CMake or MPI error. Is there a way to override this and force candi to use the existing module instead of installing Trilinos itself? I tried your suggestion of adding -DTRILINOS_DIR=/path/to/trilinos; here is the local.cfg:

PACKAGES="load:dealii-prepare once:cmake once:astyle once:hdf5 once:netcdf once:sundials once:p4est once:trilinos dealii"

DEAL_II_VERSION=v9.5.2
NATIVE_OPTIMIZATIONS=ON
BUILD_EXAMPLES=OFF
MKL=ON
MKL_DIR=${MKLROOT}/lib/intel64
TRILINOS_CONFOPTS="-DBUILD_SHARED_LIBS=OFF -DTPL_FIND_SHARED_LIBS=OFF -DTRILINOS_DIR=/ccc/products/trilinos-14.4.0/gcc--8.3.0__openmpi--4.0.1/default/"
DEAL_CONFOPTS="-DDEAL_II_STATIC_EXECUTABLE=ON"
USE_DEAL_II_CMAKE_MPI_COMPILER=ON
DEAL_II_CONFOPTS="-DDEAL_II_WITH_COMPLEX_VALUES=OFF"

But with this, it still attempts to install Trilinos from the .tar.gz file. Thanks, Yida

naliboff commented 4 weeks ago

> But with this it still attempt to install trilinos from the .tar.gz file.

@Yidali26 - I think that makes sense, as once:trilinos is still included in PACKAGES="..."

Can you remove once:trilinos from that line and try specifying the following?

-DDEAL_II_WITH_TRILINOS=ON
-DTRILINOS_DIR=/path/to/trilinos

Yidali26 commented 4 weeks ago

Hi @naliboff, thanks. I tried that, but it still doesn't work. After removing once:trilinos, candi doesn't install Trilinos at all, and the subsequent cmake .. for ASPECT fails for the same reason (no Trilinos):

Build finished in 2 seconds.

Summary of timings:

dealii-prepare: 0 s
cmake: 0 s
astyle: 0 s
hdf5: 0 s
netcdf: 0 s
sundials: 0 s
p4est: 0 s
dealii: 0 s

Here is the local.cfg file:

PACKAGES="load:dealii-prepare once:cmake once:astyle once:hdf5 once:netcdf once:sundials once:p4est once:dealii"

DEAL_II_VERSION=v9.5.2
NATIVE_OPTIMIZATIONS=ON
BUILD_EXAMPLES=OFF
MKL=ON
MKL_DIR=${MKLROOT}/lib/intel64
TRILINOS_CONFOPTS="-DBUILD_SHARED_LIBS=OFF -DTPL_FIND_SHARED_LIBS=OFF -DDEAL_II_WITH_TRILINOS=ON -DTRILINOS_DIR=/ccc/products/trilinos-14.4.0/gcc--8.3.0__openmpi--4.0.1/default/"
DEAL_CONFOPTS="-DDEAL_II_STATIC_EXECUTABLE=ON"
USE_DEAL_II_CMAKE_MPI_COMPILER=ON
DEAL_II_CONFOPTS="-DDEAL_II_WITH_COMPLEX_VALUES=OFF"

Thanks, Yida
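One possible reading of the timing summary above: every package, including dealii, finished in 0 s, which suggests candi skipped them instead of reconfiguring deal.II with the new options. candi's configuration comments describe the `once:` prefix as building a package only once, so (assuming that is what happened here) dropping the prefix from dealii should make candi configure deal.II again. The helper `drop_once_prefix` below is a hypothetical convenience for editing the list, not part of candi.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of candi): print a PACKAGES list in which
# the named package has lost its "once:" prefix, so that it is rebuilt.
drop_once_prefix() {
    packages="$1"
    target="$2"
    out=""
    # Intentionally unquoted: split the list on whitespace.
    for entry in ${packages}; do
        if [ "${entry}" = "once:${target}" ]; then
            entry="${target}"
        fi
        out="${out:+${out} }${entry}"
    done
    printf '%s\n' "${out}"
}

PACKAGES="load:dealii-prepare once:cmake once:astyle once:hdf5 once:netcdf once:sundials once:p4est once:dealii"
drop_once_prefix "${PACKAGES}" dealii
```

The printed list can be pasted back into local.cfg as the new PACKAGES value; all other entries keep their prefixes.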

naliboff commented 3 weeks ago

@Yidali26 - were you able to find a solution for compiling Trilinos? If not, one possibility would be to manually specify TRILINOS_DIR in dealii.package?

Yidali26 commented 3 weeks ago

> @Yidali26 - were you able to find a solution for compiling Trilinos? If not, one possibility would be to manually specify TRILINOS_DIR in dealii.package?

@naliboff Not yet. Could you give more detail on how to manually specify TRILINOS_DIR in dealii.package? Right now I am stuck at the step of installing ASPECT: every time I run cmake .., it reports that Trilinos is missing. Thanks, Yida

naliboff commented 3 weeks ago

@Yidali26 - sure thing, starting on line 181 of that file (contained within candi) is the following code snippet:

if [ ! -z "${TRILINOS_DIR}" ]; then
    cecho ${INFO} "deal.II: configuration with TRILINOS"
    CONFOPTS="${CONFOPTS} \
      -D DEAL_II_WITH_TRILINOS:BOOL=ON \
      -D TRILINOS_DIR=${TRILINOS_DIR}"
fi

My thought was to replace ${TRILINOS_DIR} with the path of the existing version of Trilinos on the system. I don't know why specifying -DTRILINOS_DIR=/path/to/trilinos in candi.cfg would not have the same result, but maybe the above is worth a shot?
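A possible explanation, based only on the snippet quoted above: the check looks at the shell variable TRILINOS_DIR, so a -DTRILINOS_DIR=... flag embedded in the TRILINOS_CONFOPTS string never sets that variable. The sketch below reproduces just that check in isolation. The function name `dealii_trilinos_opts` is made up, and the assumption that TRILINOS_CONFOPTS is only used for candi's own Trilinos build has not been verified against candi's source.

```shell
#!/usr/bin/env bash
# Mimics the quoted check from dealii.package: only the shell variable
# TRILINOS_DIR is inspected, never the contents of TRILINOS_CONFOPTS.
dealii_trilinos_opts() {
    if [ ! -z "${TRILINOS_DIR}" ]; then
        echo "-D DEAL_II_WITH_TRILINOS:BOOL=ON -D TRILINOS_DIR=${TRILINOS_DIR}"
    fi
}

# Case 1: the path only appears inside TRILINOS_CONFOPTS.
unset TRILINOS_DIR
TRILINOS_CONFOPTS="-DTRILINOS_DIR=/path/to/trilinos"
echo "case 1: [$(dealii_trilinos_opts)]"

# Case 2: the path is assigned to the TRILINOS_DIR variable itself.
TRILINOS_DIR="/path/to/trilinos"
echo "case 2: [$(dealii_trilinos_opts)]"
```

Case 1 prints empty brackets, while case 2 prints the two deal.II options. Under these assumptions, setting TRILINOS_DIR as a plain variable in local.cfg would have the same effect as editing dealii.package by hand.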

Yidali26 commented 3 weeks ago

Hi @naliboff, I started the installation over and, oddly enough, got a different error with the same modules and installation files. This time it got stuck at configuring hdf5:

checking whether we are cross compiling... configure: error: in `/ccc/work/cont003/gen15329/liyida/aspect/build/tmp/build/hdf5-1.12.2':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
Failure with exit status: 1
Exit message: There was a problem configuring hdf5 1.12.2.

Considering how complicated the native build is on this machine, I'm going to try containerization. Hopefully this will work without affecting the performance of the computation. I'll keep you updated on this issue. Thanks a lot! Yida

gassmoeller commented 3 weeks ago

Hi @Yidali26, it sounds like you are following the steps of every section in https://github.com/geodynamics/aspect/wiki/Installation-FAQ. However, those sections are meant as a list of separate things to do or not do, depending on your cluster and situation. I would suggest you start from the following local.cfg and modify it only if you need to; in particular, the section on static executables is rarely needed:

# I HAVE REMOVED THE FOLLOWING FROM PACKAGES BECAUSE THEY ARE OPTIONAL. REENABLE IF YOU NEED SUPPORT FOR HDF5 or NETCDF FILE FORMATS
# once:hdf5 once:netcdf

PACKAGES="load:dealii-prepare once:cmake once:astyle once:sundials once:p4est once:dealii" 

DEAL_II_VERSION=v9.6.0
NATIVE_OPTIMIZATIONS=ON
BUILD_EXAMPLES=OFF

# !!!! ONLY KEEP THESE LINES IF YOUR CLUSTER USES MKL
MKL=ON
MKL_DIR=${MKLROOT}/lib/intel64

# THIS SHOULD ENABLE DEAL.II TO FIND TRILINOS
TRILINOS_DIR="/ccc/products/trilinos-14.4.0/gcc--8.3.0__openmpi--4.0.1/default/"

# YOU CAN EXPERIMENT WITH COMMENTING THIS LINE IF YOU ENCOUNTER COMPILER ERRORS
USE_DEAL_II_CMAKE_MPI_COMPILER=ON

DEAL_II_CONFOPTS="-DDEAL_II_WITH_COMPLEX_VALUES=OFF"

Yidali26 commented 3 weeks ago

Hi @gassmoeller, thanks. I tried your local.cfg, and it fails again, this time at p4est:

Report bugs to <which-bugs@gnu.org>.
/ccc/work/cont003/gen15329/liyida/aspect/build/tmp/unpack/p4est-2.3.2/configure: line 4056: test: argument expected
configure: error: in `/ccc/work/cont003/gen15329/liyida/aspect/build/tmp/build/p4est-2.3.2/FAST':
configure: error: cannot run Fortran 77 compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
Error: Error in configure

It's odd, since it worked previously. By the way, the cluster does use MKL. Thanks, Yida

gassmoeller commented 3 weeks ago

If something worked previously and doesn't anymore, it is worth removing the whole build directory /ccc/work/cont003/gen15329/liyida/aspect/build/ and starting again. Based on the error message (configure: error: cannot run Fortran 77 compiled programs.), it seems the Fortran compiler is not working. Have you checked that you are using the same compiler suite for the C, C++, and Fortran compilers?
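The configure failures above ("cannot run C compiled programs", "cannot run Fortran 77 compiled programs") can be approximated outside of candi by compiling and running a trivial program with each compiler. The following is a minimal sketch: the function name `try_compiler` is hypothetical, and the compiler names gcc, mpicc, gfortran, and mpif77 are assumptions about what the loaded modules provide.

```shell
#!/usr/bin/env bash
# Compile and run a trivial program with one compiler, roughly mirroring
# the "cannot run ... compiled programs" check that configure performs.
try_compiler() {
    compiler="$1"; source_name="$2"; source_text="$3"
    if ! command -v "${compiler}" >/dev/null 2>&1; then
        echo "${compiler}: not found in PATH"
        return 1
    fi
    workdir="$(mktemp -d)" || return 1
    printf '%s\n' "${source_text}" > "${workdir}/${source_name}"
    if "${compiler}" "${workdir}/${source_name}" -o "${workdir}/a.out" >/dev/null 2>&1 \
        && "${workdir}/a.out"; then
        echo "${compiler}: ok, resolves to $(command -v "${compiler}")"
        rc=0
    else
        echo "${compiler}: could not build or run a test program"
        rc=1
    fi
    rm -rf "${workdir}"
    return "${rc}"
}

C_SRC='int main(void) { return 0; }'
# Fixed-form Fortran 77: statements start in column 7.
F77_SRC='      program main
      end'

# Compiler names below are assumptions; adjust to the loaded modules.
try_compiler gcc      t.c "${C_SRC}"   || true
try_compiler mpicc    t.c "${C_SRC}"   || true
try_compiler gfortran t.f "${F77_SRC}" || true
try_compiler mpif77   t.f "${F77_SRC}" || true
```

The reported paths also show at a glance whether the C and Fortran compilers come from the same compiler suite. Running the script on both the login node and a compute node may reveal whether the failure depends on where the build happens.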

tjhei commented 3 weeks ago

A few comments:
1) Generally, I have had little success using any pre-installed packages during the configuration/compilation of deal.II (for example Trilinos or p4est) and I always compile things from scratch with candi.
2) Like Rene said, avoid static executables unless you are 100% sure you need them.
3) Are you compiling on a compute node or the login node? I see the system has different partitions/architectures, so you need to be careful about where you configure/compile and run. That's why I always configure/compile from scratch on the compute nodes I actually want to run on.

Yidali26 commented 3 weeks ago

@gassmoeller I removed the whole build directory and redid the build, and the same issue persists.

@tjhei Yes, I started by compiling from scratch, but various errors stopped me from installing Trilinos, such as the one below:

-- Performing Test FINITE_VALUE_HAVE_STD_ISINF - Success
-- Found Doxygen: /usr/bin/doxygen (found version "1.8.14") found components: doxygen dot

Getting information for all enabled external packages/TPLs ...

Processing enabled external package/TPL: MPI (enabled explicitly, disable with -DTPL_ENABLE_MPI=OFF)
-- MPI_LIBRARY_NAMES=''
Processing enabled external package/TPL: BLAS (enabled by MueLu, disable with -DTPL_ENABLE_BLAS=OFF)
-- BLAS_LIBRARY_NAMES='blas blas_win32'
-- Searching for libs in BLAS_LIBRARY_DIRS=''
-- Searching for a lib in the set "blas blas_win32":
--   Searching for lib 'blas' ...
--   Searching for lib 'blas_win32' ...
-- NOTE: Did not find a lib in the lib set "blas blas_win32" for the TPL 'BLAS'!
-- ERROR: Could not find the libraries for the TPL 'BLAS'!
-- TIP: If the TPL 'BLAS' is on your system then you can set:
     -DBLAS_LIBRARY_DIRS='<dir0>;<dir1>;...'
   to point to the directories where these libraries may be found.
   Or, just set:
     -DTPL_BLAS_LIBRARIES='<path-to-libs0>;<path-to-libs1>;...'
   to point to the full paths for the libraries which will
   bypass any search for libraries and these libraries will be used without
   question in the build.  (But this will result in a build-time error
   if not all of the necessary symbols are found.)
-- ERROR: Failed finding all of the parts of TPL 'BLAS' (see above), Aborting!

-- NOTE: The find module file for this failed TPL 'BLAS' is:
     /ccc/work/cont003/gen15329/liyida/aspect/build/tmp/unpack/Trilinos-trilinos-release-14-4-0/cmake/TPLs/FindTPLBLAS.cmake
   which is pointed to in the file:
     /ccc/work/cont003/gen15329/liyida/aspect/build/tmp/unpack/Trilinos-trilinos-release-14-4-0/TPLsList.cmake

TIP: One way to get past the configure failure for the
TPL 'BLAS' is to simply disable it with:
   -DTPL_ENABLE_BLAS=OFF

I mostly configure and compile on the compute node, though I might sometimes forget to switch over to the compute node.

Anyway, I'll try my luck with Docker containerization, since the native installation does not seem to be easy on this machine. Thanks for your responses! Yida

tjhei commented 3 weeks ago

> though I might forget to move over to computer node sometime.

Then that might break your build, and you would need to start over.

> ERROR: Failed finding all of the parts of TPL 'BLAS' (see above),

You are missing BLAS. You need to enable a BLAS package on the system, point candi to it correctly, or install "openblas" through candi.
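To "point candi to it correctly", one option is to locate a BLAS library on the system and pass its full path through the -DTPL_BLAS_LIBRARIES flag that the Trilinos log itself suggests. The sketch below searches a colon-separated directory list for a few common library names. The function name `find_blas_lib`, the candidate names (libopenblas, libblas, libmkl_rt), and the fallback directories are assumptions for illustration; on this cluster the MKL or lapack/netlib modules may install their libraries elsewhere, and LAPACK may need the same treatment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first BLAS-like library found in a
# colon-separated list of directories, or return 1 if none is found.
find_blas_lib() {
    search_path="$1"
    old_ifs="${IFS}"
    IFS=':'
    # Split the search path into positional parameters (function-local).
    set -- ${search_path}
    IFS="${old_ifs}"
    for dir in "$@"; do
        [ -d "${dir}" ] || continue
        for name in libopenblas libblas libmkl_rt; do
            for lib in "${dir}/${name}.so" "${dir}/${name}.a"; do
                if [ -e "${lib}" ]; then
                    echo "${lib}"
                    return 0
                fi
            done
        done
    done
    return 1
}

# Search the runtime library path plus two common system directories.
if blas_lib="$(find_blas_lib "${LD_LIBRARY_PATH:-}:/usr/lib64:/usr/lib")"; then
    echo "candidate for local.cfg:"
    echo "TRILINOS_CONFOPTS=\"-DTPL_BLAS_LIBRARIES=${blas_lib}\""
else
    echo "no BLAS library found in the searched directories"
fi
```

The printed line is only a starting point: any options already present in TRILINOS_CONFOPTS would need to be kept alongside the new flag.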

Yidali26 commented 1 week ago

Hello, I now have ASPECT installed on the supercomputer using a container. Thanks, everyone, for helping. By the way, I found that the results from the container with 2.6.0 differ significantly from the results on my workstation (native installation of ASPECT 2.6.0). However, the container with 2.5.0 produces the same results as my workstation. It seems some change may be affecting the calculation. Yida