enrico-dev / enrico

ENRICO: Exascale Nuclear Reactor Investigative COde
https://enrico-docs.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Enrico-Nekrs] compile with gcc-9 and gcc-10 #117

Closed AI-Pranto closed 3 years ago

AI-Pranto commented 3 years ago

Got the same error

[100%] Linking CXX executable enrico
/usr/bin/ld: libenrico.a(nekrs_driver.cpp.o): undefined reference to symbol 'dlclose@@GLIBC_2.2.5'
/usr/bin/ld: /lib/x86_64-linux-gnu/libdl.so.2: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/enrico.dir/build.make:116: enrico] Error 1
make[2]: *** [CMakeFiles/Makefile2:913: CMakeFiles/enrico.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:920: CMakeFiles/enrico.dir/rule] Error 2
make: *** [Makefile:235: enrico] Error 2

This happens with both gcc-9.3 and gcc-10.2. The following error comes from Travis CI with gcc-9 on distro ubuntu-focal:

[ 98%] Building CXX object vendor/nekRS/CMakeFiles/nekrs-bin.dir/src/main.cpp.o
/home/travis/build/AI-Pranto/enrico/vendor/nekRS/src/main.cpp: In function ‘cmdOptions* processCmdLineOptions(int, char**)’:
/home/travis/build/AI-Pranto/enrico/vendor/nekRS/src/main.cpp:252:36: warning: ignoring return value of ‘int chdir(const char*)’, declared with attribute warn_unused_result [-Wunused-result]
  252 |     if(casepath.length() > 0) chdir(casepath.c_str());
      |                               ~~~~~^~~~~~~~~~~~~~~~~~
[ 98%] Linking CXX executable nekrs
/usr/bin/mpicxx   -fopenmp -O3 -DNDEBUG    CMakeFiles/nekrs-bin.dir/src/main.cpp.o  -o nekrs  -Wl,-rpath,/home/travis/build/AI-Pranto/enrico/tests/singlerod/short/build/vendor/nekRS:/home/travis/build/AI-Pranto/enrico/tests/singlerod/short/build/lib:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: libnekrs.so ../../lib/libocca.so 
[ 98%] Built target nekrs-bin
Scanning dependencies of target nek5000_deps
[ 98%] Creating directories for 'nek5000_deps'
[ 99%] No download step for 'nek5000_deps'
[ 99%] No patch step for 'nek5000_deps'
[ 99%] No update step for 'nek5000_deps'
[ 99%] No configure step for 'nek5000_deps'
[ 99%] Performing build step for 'nek5000_deps'

ERROR: Cannot find a supported compiler!

travis CI

RonRahaman commented 3 years ago

Good morning! Re: the second error, it's likely that Nek5000 (part of the NekRS backend) is misidentifying your Fortran compiler. I've seen this issue with MPICH on recent versions of Ubuntu, and it could be a pretty easy fix. Try defining the environment variable MPICH_FC=gfortran in your .travis.yml:

env:
  global:
    - HDF5_ROOT=/usr
    - OMP_NUM_THREADS=2
    - OPENMC_CROSS_SECTIONS=$HOME/endfb71_hdf5/cross_sections.xml  
    - MPICH_FC=gfortran

RonRahaman commented 3 years ago

Re: the first error, it looks like the linker isn't locating libdl properly. On the system that you're using, it might be necessary to be more explicit about it. Try using CMAKE_DL_LIBS in the CMakeLists.txt. Right now, you should see:

https://github.com/enrico-dev/enrico/blob/0974711107b5a4ad794d5e042f9f4c03685434a5/CMakeLists.txt#L195

Change it so it says:

target_link_libraries(libenrico PUBLIC ${LIBRARIES} ${CMAKE_DL_LIBS})
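
For context, GNU ld prints "DSO missing from command line" when a needed symbol is found only in a shared library that is not itself named on the link line. A minimal sketch for confirming that an archive really needs libdl is below. The helper name `needs_libdl` and the archive path in the usage comment are assumptions for illustration; they are not part of ENRICO.

```shell
# Hypothetical helper: reads `nm -u` output on stdin and succeeds if any
# undefined symbol belongs to the dlopen family.
needs_libdl() {
  grep -Eq 'U[[:space:]]+(dlopen|dlsym|dlclose|dlerror)'
}

# Example usage (the archive path is an assumption; adjust to your build tree):
#   nm -u libenrico.a | needs_libdl && echo "libdl must be on the link line"
```

If the check succeeds, adding `${CMAKE_DL_LIBS}` as shown above should put libdl on the link line explicitly.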

AI-Pranto commented 3 years ago

Thanks, @RonRahaman.

Last Friday, I didn't see the first error while compiling with gcc-9 on ubuntu-20.04.

RonRahaman commented 3 years ago

Yeah, the first error appeared pretty suddenly for us, too. The issue is that Nek5000's legacy build system uses the -show flag to infer the identity of the Fortran compiler. MPICH on Ubuntu now invokes f95, which Nek5000 doesn't recognize as gfortran, even though f95 is an alias for gfortran on our systems.

$ mpifort -show
f95 -Wl,-Bsymbolic-functions -Wl,-z,relro -I/usr/include/x86_64-linux-gnu/mpich -I/usr/include/x86_64-linux-gnu/mpich -L/usr/lib/x86_64-linux-gnu -lmpichfort -lmpich
$ f95 --version
GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This might have been a recent change with the MPICH package from Ubuntu's APT. Before, it probably used gfortran.
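
To make the name-versus-identity point concrete, here is a rough sketch of classifying the compiler by its `--version` banner instead of by the executable name that `mpifort -show` reports. This is not how Nek5000's build system works; the helper names `fc_from_show` and `fc_family` are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: the underlying compiler is the first word of the
# line printed by `mpifort -show`.
fc_from_show() {
  echo "${1%% *}"
}

# Hypothetical helper: classify a compiler by its --version banner rather
# than by its executable name, so an alias such as f95 is still recognized.
fc_family() {
  case "$1" in
    *"GNU Fortran"*) echo gnu ;;
    *) echo unknown ;;
  esac
}

# Example usage (requires an MPI Fortran wrapper on PATH):
#   fc=$(fc_from_show "$(mpifort -show)")
#   fc_family "$("$fc" --version | head -n 1)"
```

With the output shown above, the first helper would report f95 and the second would still classify it as GNU, which is the distinction a name-based check misses.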

AI-Pranto commented 3 years ago

$ mpirun -np 2 ../build/install/bin/enrico
Minimum neutron data temperature: 294.0 K
 Maximum neutron data temperature: 1200.0 K
 Preparing distributed cell instances...
 Writing summary.h5 file...
[pranto-Lenovo-Z40-70:49208] *** Process received signal ***
[pranto-Lenovo-Z40-70:49208] Signal: Segmentation fault (11)
[pranto-Lenovo-Z40-70:49208] Signal code: Address not mapped (1)
[pranto-Lenovo-Z40-70:49208] Failing at address: (nil)
[pranto-Lenovo-Z40-70:49209] *** Process received signal ***
[pranto-Lenovo-Z40-70:49209] Signal: Segmentation fault (11)
[pranto-Lenovo-Z40-70:49209] Signal code: Address not mapped (1)
[pranto-Lenovo-Z40-70:49209] Failing at address: (nil)
[pranto-Lenovo-Z40-70:49208] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f812354f210]
[pranto-Lenovo-Z40-70:49208] [ 1] [pranto-Lenovo-Z40-70:49209] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x18b675)[0x7f8123694675]
[pranto-Lenovo-Z40-70:49208] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7ff283adf210]
[pranto-Lenovo-Z40-70:49209] [ 1] /home/pranto/github/enrico/tests/singlerod/short/build/install/lib/libnekrs.so(_Z10configReadP19ompi_communicator_t+0x88)[0x7f8123c81578]
[pranto-Lenovo-Z40-70:49208] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x18b675)[0x7ff283c24675]
[pranto-Lenovo-Z40-70:49209] [ 2] /home/pranto/github/enrico/tests/singlerod/short/build/install/lib/libnekrs.so(_ZN5nekrs5setupEP19ompi_communicator_tiiiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_S7_S7_+0x9e)[0x7f8123c23e4e]
[pranto-Lenovo-Z40-70:49208] [ 4] /home/pranto/github/enrico/tests/singlerod/short/build/install/lib/libnekrs.so(_Z10configReadP19ompi_communicator_t+0x88)[0x7ff284211578]
[pranto-Lenovo-Z40-70:49209] [ 3] /home/pranto/github/enrico/tests/singlerod/short/build/install/lib/libnekrs.so(_ZN5nekrs5setupEP19ompi_communicator_tiiiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_S7_S7_+0x9e)[0x7ff2841b3e4e]
[pranto-Lenovo-Z40-70:49209] [ 4] ../build/install/bin/enrico(_ZN6enrico11NekRSDriverC1EP19ompi_communicator_tN4pugi8xml_nodeE+0x26d)[0x563c4738b0ed]
[pranto-Lenovo-Z40-70:49208] [ 5] ../build/install/bin/enrico(_ZN6enrico11NekRSDriverC1EP19ompi_communicator_tN4pugi8xml_nodeE+0x26d)[0x561900ec00ed]
[pranto-Lenovo-Z40-70:49209] [ 5] ../build/install/bin/enrico(_ZN6enrico13CoupledDriverC1EP19ompi_communicator_tN4pugi8xml_nodeE+0xafa)[0x563c4734f87a]
[pranto-Lenovo-Z40-70:49208] [ 6] ../build/install/bin/enrico(main+0x13e)[0x563c4734232e]
[pranto-Lenovo-Z40-70:49208] [ 7] ../build/install/bin/enrico(_ZN6enrico13CoupledDriverC1EP19ompi_communicator_tN4pugi8xml_nodeE+0xafa)[0x561900e8487a]
[pranto-Lenovo-Z40-70:49209] [ 6] ../build/install/bin/enrico(main+0x13e)[0x561900e7732e]
[pranto-Lenovo-Z40-70:49209] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f81235300b3]
[pranto-Lenovo-Z40-70:49208] [ 8] ../build/install/bin/enrico(_start+0x2e)[0x563c4734272e]
[pranto-Lenovo-Z40-70:49208] *** End of error message ***
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ff283ac00b3]
[pranto-Lenovo-Z40-70:49209] [ 8] ../build/install/bin/enrico(_start+0x2e)[0x561900e7772e]
[pranto-Lenovo-Z40-70:49209] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node pranto-Lenovo-Z40-70 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

➜ ../build/install/bin/nrspre rod_short 2
                 __    ____  _____
   ____   ___   / /__ / __ \/ ___/
  / __ \ / _ \ / //_// /_/ /\__ \ 
 / / / //  __// ,<  / _, _/___/ / 
/_/ /_/ \___//_/|_|/_/ |_|/____/  v20.1

COPYRIGHT (c) 2019-2020 UCHICAGO ARGONNE, LLC

MPI tasks: 1

using OCCA_CACHE_DIR: /home/pranto/github/enrico/tests/singlerod/short/openmc_nekrs/.cache/occa/

Initializing device
active occa mode: Serial

performing dry-run for 2 MPI ranks ...

building udf ... cp: cannot stat '../build/install/bin/..//udf/CMakeLists.txt': No such file or directory

An ERROR occured, see /home/pranto/github/enrico/tests/singlerod/short/openmc_nekrs/.cache/udf/build.log for details!
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

@RonRahaman I am using OpenMPI-4.0.3.

RonRahaman commented 3 years ago

Thanks a lot for that backtrace! Do you have the environment variable NEKRS_HOME defined? If not, define it as:

  export NEKRS_HOME=$(realpath install)

or whatever your install directory is. It's in the README.md, in case you forget.

When nekRS gets NEKRS_HOME from the environment, it doesn't check that the pointer is non-NULL before using it to instantiate a string, which is why an unset variable leads to the segfault in configRead. There's a fix in the nekRS upstream, but it looks like we haven't merged it into our fork yet.
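
Until that fix lands, a small pre-flight check in the launch script can catch the unset variable before nekRS dereferences it. The sketch below is only an illustration; the function name `check_nekrs_home` is hypothetical, and it checks only that the variable points at an existing directory, not that the directory is a valid nekRS install.

```shell
# Hypothetical pre-flight check: succeed only if NEKRS_HOME is set to an
# existing directory; otherwise print a hint to stderr and return non-zero.
check_nekrs_home() {
  if [ -z "${NEKRS_HOME:-}" ]; then
    echo "NEKRS_HOME is not set; export it to your install directory" >&2
    return 1
  fi
  if [ ! -d "$NEKRS_HOME" ]; then
    echo "NEKRS_HOME=$NEKRS_HOME is not a directory" >&2
    return 1
  fi
}

# Example usage:
#   check_nekrs_home && mpirun -np 2 ../build/install/bin/enrico
```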

AI-Pranto commented 3 years ago

@RonRahaman, I didn't take the NEKRS_HOME environment variable seriously because I could call the executable explicitly.

Thanks @RonRahaman again :)

RonRahaman commented 3 years ago

No problem! I can understand that reaction :)

AI-Pranto commented 3 years ago

Sorry for taking so much time.

It's now running successfully on my old laptop. @RonRahaman, you saved me a lot of time. Now I can focus on my exam.

I'm going to close this issue now.