boutproject / BOUT-dev

BOUT++: Plasma fluid finite-difference simulation code in curvilinear coordinate systems
http://boutproject.github.io/
GNU Lesser General Public License v3.0

Fedora build on Travis failing with PETSc error #2073

Closed ZedThree closed 4 years ago

ZedThree commented 4 years ago

Lots of PRs are currently failing because of the Fedora build on Travis. Weirdly, the failure is in the parseCommandLineArgs death test, which should have nothing to do with PETSc. @dschwoerer, any ideas?

[ RUN      ] ParseCommandLineArgsDeathTest.HelpShortOption
src/test_bout++.cxx:32: Failure
Death test: bout::experimental::parseCommandLineArgs(c_args.size(), argv)
    Result: died but not with expected exit code:
            Exited with exit status 15
Actual msg:
[  DEATH   ] Usage: test [-d <data directory>] [-f <options filename>] [restart [append]] [VAR=VALUE]
[  DEATH   ] 
[  DEATH   ]   -d <data directory>      Look in <data directory> for input/output files
[  DEATH   ]   -f <options filename>        Use OPTIONS given in <options filename>
[  DEATH   ]   -o <settings filename>   Save used OPTIONS given to <options filename>
[  DEATH   ]   -l, --log <log filename> Print log to <log filename>
[  DEATH   ]   -v, --verbose            Increase verbosity
[  DEATH   ]   -q, --quiet          Decrease verbosity
[  DEATH   ]   -c, --color          Color output using bout-log-color
[  DEATH   ]   --print-config       Print the compile-time configuration
[  DEATH   ]   --list-solvers       List the available time solvers
[  DEATH   ]   --list-laplacians        List the available Laplacian inversion solvers
[  DEATH   ]   --list-laplacexz     List the available LaplaceXZ inversion solvers
[  DEATH   ]   --list-invertpars        List the available InvertPar solvers
[  DEATH   ]   --list-rkschemes     List the available Runge-Kutta schemes
[  DEATH   ]   --list-meshes            List the available Meshes
[  DEATH   ]   --list-xzinterpolations  List the available XZInterpolations
[  DEATH   ]   --list-zinterpolations   List the available ZInterpolations
[  DEATH   ]   -h, --help           This message
[  DEATH   ]   restart [append]     Restart the simulation. If append is specified, append to the existing output files, otherwise overwrite them
[  DEATH   ]   VAR=VALUE            Specify a VALUE for input parameter VAR
[  DEATH   ] 
[  DEATH   ] For all possible input parameters, see the user manual and/or the physics model source (e.g. test.cxx)
[  DEATH   ] [0]PETSC ERROR: ------------------------------------------------------------------------
[  DEATH   ] [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[  DEATH   ] [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[  DEATH   ] [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[  DEATH   ] [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[  DEATH   ] [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[  DEATH   ] [0]PETSC ERROR: to get more information on the crash.
[  DEATH   ] [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[  DEATH   ] [0]PETSC ERROR: Signal received
[  DEATH   ] [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[  DEATH   ] [0]PETSC ERROR: Petsc Release Version 3.13.4, Aug 01, 2020 
[  DEATH   ] [0]PETSC ERROR: Unknown Name on a  named 9d87ee057f91 by test Fri Aug  7 17:28:39 2020
[  DEATH   ] [0]PETSC ERROR: Configure options --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --with-dependency-tracking=0 --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --FC_LINKER_FLAGS="-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -lgfortran -lmpifort" --LIBS=" -lmpifort" CFLAGS="-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -O3 -fPIC" CXXFLAGS="-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -O3 -fPIC" FFLAGS="-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules -O3 -fPIC" LDFLAGS="-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -fPIC" COPTFLAGS="-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection" CXXOPTFLAGS="-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection" FOPTFLAGS="-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules" FCFLAGS="-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules -O3 -fPIC" --CC_LINKER_FLAGS="-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld" --with-default-arch=0 --with-make=1 --with-cmake-exec=/usr/bin/cmake3 --with-ctest-exec=/usr/bin/ctest3 --with-single-library=1 --with-precision=double --with-petsc-arch=x86_64 --with-clanguage=C --with-shared-libraries=1 --with-fortran-interfaces=1 --with-windows-graphics=0 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-shared-ld=ld --with-pic=1 --with-clib-autodetect=0 --with-fortranlib-autodetect=0 --with-cxxlib-autodetect=0 --with-threadsafety=0 --with-log=1 --with-debugging=0 --with-scalapack=1 --with-scalapack-lib="-L/usr/lib64/mpich/lib -lscalapack" --with-scalapck-include= --with-mpi=1 --with-cgns=1 --with-cgns-include= --with-cgns-lib=-lcgns --with-hdf5=1 --with-hdf5-include= --with-hdf5-lib="-L/usr/lib64/mpich/lib -lhdf5 -lhdf5_hl" --with-ptscotch=1 --with-ptscotch-include= --with-ptscotch-lib="-L/usr/lib64/mpich/lib -lptscotch -lscotch -lptscotcherr -lscotcherr" --with-mumps=1 --with-metis=1 --with-superlu_dist=1 --with-superlu_dist-include=/usr/include/mpich-x86_64/superlu_dist --with-superlu_dist-lib=-lsuperlu_dist --with-x=1 --with-openmp=0 --with-hwloc=0 --with-ssl=0 --with-hypre=1 --with-hypre-include=/usr/include/mpich-x86_64/hypre --with-hypre-lib="-L/usr/lib64/mpich/lib -lHYPRE" --with-pthread=1 --with-valgrind=1 --with-64-bit-indices=0 --with-blaslapack-lib=-lopenblasp --known-64-bit-blas-indices=0
[  DEATH   ] [0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
[  DEATH   ] application called MPI_Abort(comm=0x84000000, 50162059) - process 0
[  DEATH   ] [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=50162059
[  DEATH   ] :
[  DEATH   ] system msg for write_line failure : Bad file descriptor
[  DEATH   ] 
[  FAILED  ] ParseCommandLineArgsDeathTest.HelpShortOption (11 ms)

ZedThree commented 4 years ago

I can reproduce this locally with the .travis_fedora.sh script, so it's not a Travis thing. I've not worked out how to get a debugger on it yet, as the script creates a container.

Also, I can't reproduce this on a Fedora 32 machine (as opposed to 33) using a local build of PETSc 3.13.0 (as opposed to the 3.13.4 system build).

Looks like Fedora updated to PETSc 3.13.4 a couple of weeks ago, which might be it. I'll try building that version of PETSc locally.

dschwoerer commented 4 years ago

For running gdb within a container, I normally use:

podman run --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it registry.fedoraproject.org/fedora:rawhide

I'll have a look. There was also a mass rebuild recently, in which LTO got enabled, but I only had issues on the s390x architecture, not on x86_64 ...

ZedThree commented 4 years ago

Thanks! I was literally just struggling with that!

I spotted a commit message about a mass rebuild, but didn't know what that was about. I might try LTO natively, not in a container.

dschwoerer commented 4 years ago

No worries, I cannot remember it either; I have an alias for that :-)
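
For illustration, an alias like the one mentioned above could be written as a small shell function around the `podman run` command from earlier in the thread. This is only a sketch, not the actual alias: the name `fedora_debug` and the `DRY_RUN` switch are assumptions made up for this example.

```shell
# Hypothetical helper: the name fedora_debug and the DRY_RUN variable are
# illustrative and do not come from the thread.
fedora_debug() {
    # Default to the Fedora Rawhide image used in the command above;
    # an optional first argument selects a different image.
    image="${1:-registry.fedoraproject.org/fedora:rawhide}"

    # SYS_PTRACE and an unconfined seccomp profile let gdb attach to
    # processes inside the container.
    set -- podman run --rm --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined -it "$image"

    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}
```

Calling `DRY_RUN=1 fedora_debug` prints the full command without starting a container, which is a cheap way to check the flags before running it for real.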

A mass rebuild is where every package is rebuilt to ensure it still builds from source; it also ensures that all packages are built with the most recent flags/compilers.

The issue might also be related to FlexiBLAS, a wrapper for BLAS that allows switching BLAS implementations at runtime ...

dschwoerer commented 4 years ago

Could be caused by https://github.com/mpimd-csc/flexiblas/issues/1

ZedThree commented 4 years ago

Thanks @dschwoerer! Glad it isn't us. I don't think there's really an easy/nice way to disable the Fedora job just for a few PRs. I'll try to keep an eye on that issue and then rerun the Fedora job once the fix is in.

ZedThree commented 4 years ago

Looks like the fix in #2079 failed when it was merged into next: https://travis-ci.org/github/boutproject/BOUT-dev/jobs/718552571#L587

warning: /var/cache/dnf/rawhide-2d95c80a1fa0a67d/packages/MUMPS-common-5.3.3-1.fc33.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 45719a39: NOKEY
Fedora - Rawhide - Developmental packages for t 1.6 MB/s | 1.6 kB     00:00    
GPG key at file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-33-x86_64 (0x9570FF31) is already installed
The GPG keys listed for the "Fedora - Rawhide - Developmental packages for the next Fedora release" repository are already installed but they are not correct for this package.
Check that the correct key URLs are configured for this repository.. Failing package is: MUMPS-common-5.3.3-1.fc33.noarch
 GPG Keys are configured as: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-33-x86_64
Public key for MUMPS-mpich-5.3.3-1.fc33.x86_64.rpm is not installed. Failing package is: MUMPS-mpich-5.3.3-1.fc33.x86_64
 GPG Keys are configured as: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-33-x86_64
...
Public key for libxcrypt-4.4.16-7.fc34.x86_64.rpm is not installed. Failing package is: libxcrypt-4.4.16-7.fc34.x86_64
 GPG Keys are configured as: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-33-x86_64
Public key for zlib-1.2.11-22.fc33.x86_64.rpm is not installed. Failing package is: zlib-1.2.11-22.fc33.x86_64
 GPG Keys are configured as: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-33-x86_64
Error: GPG check FAILED

Might be a Travis cache problem? Is there a way to force install those keys?

dschwoerer commented 4 years ago

This seems to be fixed in next. Do we want backports for 4.x and 4.3.x?

ZedThree commented 4 years ago

It's probably worth backporting the Fedora job to 4.4, probably not to 4.3, just because I want to release 4.4 soon.

ZedThree commented 4 years ago

Also, forgot to say, thanks for sorting this out! :tada: