Closed AI-Pranto closed 3 years ago
Good morning! Re: the second error, it's likely that Nek5000 (part of the NekRS backend) is misidentifying your Fortran compiler. I've seen this issue with MPICH on recent versions of Ubuntu, but it could be a pretty easy fix. Try defining the env variable MPICH_FC=gfortran in your .travis.yml:
env:
  global:
    - HDF5_ROOT=/usr
    - OMP_NUM_THREADS=2
    - OPENMC_CROSS_SECTIONS=$HOME/endfb71_hdf5/cross_sections.xml
    - MPICH_FC=gfortran
Re: the first error, it looks like the linker isn't locating libdl properly. On the system that you're using, it might be necessary to be more explicit about it. Try using CMAKE_DL_LIBS in the CMakeLists. Right now, you should see:
Change it so it says:
target_link_libraries(libenrico PUBLIC ${LIBRARIES} ${CMAKE_DL_LIBS})
Thanks, @RonRahaman.
Last Friday, I didn't see the first error while compiling with gcc-9 on ubuntu-20.04.
Yeah, the first error appeared pretty suddenly for us, too. The issue is that Nek5000's legacy build system uses the output of the MPI wrapper's -show flag to infer the identity of the Fortran compiler. MPICH on Ubuntu now uses f95, which Nek5000 doesn't identify as gfortran, even though it is an alias for gfortran on our systems.
$ mpifort -show
f95 -Wl,-Bsymbolic-functions -Wl,-z,relro -I/usr/include/x86_64-linux-gnu/mpich -I/usr/include/x86_64-linux-gnu/mpich -L/usr/lib/x86_64-linux-gnu -lmpichfort -lmpich
$ f95 --version
GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This might have been a recent change with the MPICH package from Ubuntu's APT. Before, it probably used gfortran.
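The detection problem described above can be sketched in a few lines of shell. This is not Nek5000's actual build logic; the function name `guess_fc` and the patterns it matches are assumptions made purely for illustration. It looks only at the first word of the wrapper's `-show` output, so `f95` falls through to `unknown` even when it is gfortran underneath.

```shell
# Hypothetical sketch of name-based compiler detection.
# NOT Nek5000's real script: guess_fc and its patterns are illustrative.
guess_fc() {
  show_output="$1"
  first_word="${show_output%% *}"   # text before the first space
  name="${first_word##*/}"          # strip any leading directory
  case "$name" in
    gfortran*) echo "gnu" ;;
    ifort*)    echo "intel" ;;
    *)         echo "unknown" ;;
  esac
}

# Feed it strings shaped like the `mpifort -show` output above.
guess_fc "f95 -Wl,-Bsymbolic-functions -lmpichfort -lmpich"
guess_fc "gfortran -Wl,-Bsymbolic-functions -lmpichfort -lmpich"
```

With MPICH_FC=gfortran exported, the MPICH wrapper should report gfortran as its first word instead of f95, which is why a name-based check like this one would then succeed.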
$ mpirun -np 2 ../build/install/bin/enrico
Minimum neutron data temperature: 294.0 K
Maximum neutron data temperature: 1200.0 K
Preparing distributed cell instances...
Writing summary.h5 file...
[pranto-Lenovo-Z40-70:49208] *** Process received signal ***
[pranto-Lenovo-Z40-70:49208] Signal: Segmentation fault (11)
[pranto-Lenovo-Z40-70:49208] Signal code: Address not mapped (1)
[pranto-Lenovo-Z40-70:49208] Failing at address: (nil)
[pranto-Lenovo-Z40-70:49209] *** Process received signal ***
[pranto-Lenovo-Z40-70:49209] Signal: Segmentation fault (11)
[pranto-Lenovo-Z40-70:49209] Signal code: Address not mapped (1)
[pranto-Lenovo-Z40-70:49209] Failing at address: (nil)
[pranto-Lenovo-Z40-70:49208] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f812354f210]
[pranto-Lenovo-Z40-70:49208] [ 1] [pranto-Lenovo-Z40-70:49209] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x18b675)[0x7f8123694675]
[pranto-Lenovo-Z40-70:49208] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7ff283adf210]
[pranto-Lenovo-Z40-70:49209] [ 1] /home/pranto/github/enrico/tests/singlerod/short/build/install/lib/libnekrs.so(_Z10configReadP19ompi_communicator_t+0x88)[0x7f8123c81578]
[pranto-Lenovo-Z40-70:49208] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x18b675)[0x7ff283c24675]
[pranto-Lenovo-Z40-70:49209] [ 2] /home/pranto/github/enrico/tests/singlerod/short/build/install/lib/libnekrs.so(_ZN5nekrs5setupEP19ompi_communicator_tiiiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_S7_S7_+0x9e)[0x7f8123c23e4e]
[pranto-Lenovo-Z40-70:49208] [ 4] /home/pranto/github/enrico/tests/singlerod/short/build/install/lib/libnekrs.so(_Z10configReadP19ompi_communicator_t+0x88)[0x7ff284211578]
[pranto-Lenovo-Z40-70:49209] [ 3] /home/pranto/github/enrico/tests/singlerod/short/build/install/lib/libnekrs.so(_ZN5nekrs5setupEP19ompi_communicator_tiiiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_S7_S7_+0x9e)[0x7ff2841b3e4e]
[pranto-Lenovo-Z40-70:49209] [ 4] ../build/install/bin/enrico(_ZN6enrico11NekRSDriverC1EP19ompi_communicator_tN4pugi8xml_nodeE+0x26d)[0x563c4738b0ed]
[pranto-Lenovo-Z40-70:49208] [ 5] ../build/install/bin/enrico(_ZN6enrico11NekRSDriverC1EP19ompi_communicator_tN4pugi8xml_nodeE+0x26d)[0x561900ec00ed]
[pranto-Lenovo-Z40-70:49209] [ 5] ../build/install/bin/enrico(_ZN6enrico13CoupledDriverC1EP19ompi_communicator_tN4pugi8xml_nodeE+0xafa)[0x563c4734f87a]
[pranto-Lenovo-Z40-70:49208] [ 6] ../build/install/bin/enrico(main+0x13e)[0x563c4734232e]
[pranto-Lenovo-Z40-70:49208] [ 7] ../build/install/bin/enrico(_ZN6enrico13CoupledDriverC1EP19ompi_communicator_tN4pugi8xml_nodeE+0xafa)[0x561900e8487a]
[pranto-Lenovo-Z40-70:49209] [ 6] ../build/install/bin/enrico(main+0x13e)[0x561900e7732e]
[pranto-Lenovo-Z40-70:49209] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f81235300b3]
[pranto-Lenovo-Z40-70:49208] [ 8] ../build/install/bin/enrico(_start+0x2e)[0x563c4734272e]
[pranto-Lenovo-Z40-70:49208] *** End of error message ***
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ff283ac00b3]
[pranto-Lenovo-Z40-70:49209] [ 8] ../build/install/bin/enrico(_start+0x2e)[0x561900e7772e]
[pranto-Lenovo-Z40-70:49209] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node pranto-Lenovo-Z40-70 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
➜ ../build/install/bin/nrspre rod_short 2
__ ____ _____
____ ___ / /__ / __ \/ ___/
/ __ \ / _ \ / //_// /_/ /\__ \
/ / / // __// ,< / _, _/___/ /
/_/ /_/ \___//_/|_|/_/ |_|/____/ v20.1
COPYRIGHT (c) 2019-2020 UCHICAGO ARGONNE, LLC
MPI tasks: 1
using OCCA_CACHE_DIR: /home/pranto/github/enrico/tests/singlerod/short/openmc_nekrs/.cache/occa/
Initializing device
active occa mode: Serial
performing dry-run for 2 MPI ranks ...
building udf ... cp: cannot stat '../build/install/bin/..//udf/CMakeLists.txt': No such file or directory
An ERROR occured, see /home/pranto/github/enrico/tests/singlerod/short/openmc_nekrs/.cache/udf/build.log for details!
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
@RonRahaman I am using OpenMPI-4.0.3.
Thanks a lot for that backtrace! Do you have the environment variable NEKRS_HOME defined? If not, define it as:
export NEKRS_HOME=$(realpath install)
or whatever your install directory is. It's in the README.md, in case you forget.
When nekRS gets NEKRS_HOME from the env, it doesn't check that the pointer is non-NULL before using it to instantiate a string, so an unset variable leads to the segfault in your backtrace. There's a fix in the nekRS upstream, but it looks like we haven't merged it into our fork yet.
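Until that fix is merged, a launch script can guard against the unset variable before starting the run. Here is a minimal sketch; the function name `require_nekrs_home` is hypothetical, and all it checks is that the variable is non-empty and names an existing directory.

```shell
# Hypothetical pre-flight check: refuse to launch if NEKRS_HOME is unset,
# empty, or not a directory. Returns 0 when the variable looks usable.
require_nekrs_home() {
  if [ -z "${NEKRS_HOME:-}" ]; then
    echo "NEKRS_HOME is not set; export it to your nekRS install directory" >&2
    return 1
  fi
  if [ ! -d "$NEKRS_HOME" ]; then
    echo "NEKRS_HOME=$NEKRS_HOME is not a directory" >&2
    return 1
  fi
  return 0
}

# Usage: only start the run when the check passes.
# require_nekrs_home && mpirun -np 2 ../build/install/bin/enrico
```

This turns the segfault into a readable message, but it does not verify that the directory actually contains a nekRS install.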
@RonRahaman, I didn't take the NEKRS_HOME path variable seriously because I can call the executable explicitly.
Thanks @RonRahaman again :)
No problem! I can understand that reaction :)
Sorry for taking so much time.
It's running successfully on my old laptop. @RonRahaman, you saved me a lot of time. Now I can focus on my exam.
I'm going to close this issue now.
Got the same error with gcc-9.3 and gcc-10.2. The following error is from Travis CI with gcc-9 and distro: ubuntu-focal