Hi Kenneth, first of all, thanks for the detailed error description! I had hoped that the CMake and Ninja build information would shed some light on the issue, but I am still unable to reproduce it on either an Intel Xeon Gold 6230 (Cascade Lake) or an AMD EPYC 7742. I don't see any difference in the build commands output by Ninja, so this is starting to sound more like tiny differences in the compiler ecosystem. Would it be possible for you to check whether the issue still occurs if you build static libraries
cmake -GNinja -DBUILD_SHARED_LIBS=OFF ...
ninja omp/test/preconditioner/jacobi_kernels
and send me the generated binary (maybe even with debug symbols -g enabled)? Then hopefully I can try to narrow down where the differences come from, and see whether I can at least reproduce it with the binary.
@upsj I tried doing a static build, but I'm running into some trouble there, linking errors like cannot find -lgcc_s because of missing static libraries in the GCC installation (which I'm not sure is easy to fix).
I do have a little bit of perhaps useful information though.
All 39 tests when running omp/test/preconditioner/jacobi_kernels pass after setting $OMP_NUM_THREADS to 1.
At one point when using OMP_NUM_THREADS=2, I got this:
...
[ RUN ] Jacobi.OmpApplyEquivalentToRefWithDifferentBlockSize
[ OK ] Jacobi.OmpApplyEquivalentToRefWithDifferentBlockSize (7 ms)
[ RUN ] Jacobi.OmpApplyEquivalentToRef
corrupted double-linked list
Aborted (core dumped)
That's not reproducible though, it often just crashes with a segfault instead.
When running under Valgrind, the problem does not occur.
When using valgrind --leak-check=full, I get this (not sure if that's useful at all):
==3508653== 304 bytes in 1 blocks are possibly lost in loss record 4 of 5
==3508653== at 0x4036B35: calloc (vg_replace_malloc.c:760)
==3508653== by 0x4012341: allocate_dtv (in /usr/lib64/ld-2.28.so)
==3508653== by 0x4012CD1: _dl_allocate_tls (in /usr/lib64/ld-2.28.so)
==3508653== by 0x62F7F32: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.28.so)
==3508653== by 0x62C964A: gomp_team_start (team.c:839)
==3508653== by 0x62C16FC: GOMP_parallel (parallel.c:169)
==3508653== by 0x417BDC7: void gko::kernels::omp::jacobi::generate<double, int>(std::shared_ptr<gko::OmpExecutor const>, gko::matrix::Csr<double, int> const*, unsigned long, unsigned int, gko::detail::remove_complex_impl<double>::type, gko::preconditioner::block_interleaved_storage_scheme<int> const&, gko::Array<gko::detail::remove_complex_impl<double>::type>&, gko::Array<gko::precision_reduction>&, gko::Array<int> const&, gko::Array<double>&) (jacobi_kernels.cpp:374)
==3508653== by 0x53E5801: call<0, 1, 2, 3, 4, 5, 6, 7, 8> (jacobi.cpp:57)
==3508653== by 0x53E5801: gko::preconditioner::jacobi::generate_operation<gko::matrix::Csr<double, int> const*, unsigned long&, unsigned int&, double&, gko::preconditioner::block_interleaved_storage_scheme<int>&, gko::Array<double>&, gko::Array<gko::precision_reduction>&, gko::Array<int>&, gko::Array<double>&>::run(std::shared_ptr<gko::OmpExecutor const>) const (jacobi.cpp:57)
==3508653== by 0x4089386: gko::detail::ExecutorBase<gko::OmpExecutor>::run(gko::Operation const&) const (executor.hpp:762)
==3508653== by 0x5406C47: gko::preconditioner::Jacobi<double, int>::generate(gko::LinOp const*) (jacobi.cpp:231)
==3508653== by 0x43E80D: gko::preconditioner::Jacobi<double, int>::Jacobi(gko::preconditioner::Jacobi<double, int>::Factory const*, std::shared_ptr<gko::LinOp const>) (jacobi.hpp:512)
==3508653== by 0x43ED8A: generate_impl (abstract_factory.hpp:246)
==3508653== by 0x43ED8A: std::unique_ptr<gko::preconditioner::Jacobi<double, int>, std::default_delete<gko::preconditioner::Jacobi<double, int> > > gko::EnableDefaultFactory<gko::preconditioner::Jacobi<double, int>::Factory, gko::preconditioner::Jacobi<double, int>, gko::preconditioner::Jacobi<double, int>::parameters_type, gko::LinOpFactory>::generate<std::shared_ptr<gko::matrix::Csr<double, int> >&>(std::shared_ptr<gko::matrix::Csr<double, int> >&) const (abstract_factory.hpp:165)
Same problem occurs when building with -O0 -g. Here's the GDB backtrace I get with that build:
Thread 5 "jacobi_kernels" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x155551b20700 (LWP 3517893)]
0x000000000049a27d in gko::log::Logger::on<2ul, gko::Executor const*, unsigned long> (this=0x4000100010007)
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/include/ginkgo/core/log/logger.hpp:168
168 GKO_LOGGER_REGISTER_EVENT(2, free_started, const Executor *exec,
(gdb) bt
#0 0x000000000049a27d in gko::log::Logger::on<2ul, gko::Executor const*, unsigned long> (this=0x4000100010007)
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/include/ginkgo/core/log/logger.hpp:168
#1 0x0000000000494bde in gko::log::EnableLogging<gko::Executor, gko::log::Loggable>::log<2ul, gko::Executor const*, unsigned long> (this=0x56c6a0)
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/include/ginkgo/core/log/logger.hpp:547
#2 0x0000000000491ca1 in gko::Executor::free (this=0x56c6a0, ptr=0x15551c001420) at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/include/ginkgo/core/base/executor.hpp:507
#3 0x00000000004bcc20 in gko::executor_deleter<double []>::operator() (this=0x15551c000ea0, ptr=0x15551c001420)
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/include/ginkgo/core/base/executor.hpp:744
#4 0x00000000004bb1b0 in std::__invoke_impl<void, gko::executor_deleter<double []>&, double*> (__f=...)
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/invoke.h:60
#5 0x00000000004b8036 in std::__invoke_r<void, gko::executor_deleter<double []>&, double*> (__fn=...)
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/invoke.h:153
#6 0x00000000004b44bc in std::_Function_handler<void (double*), gko::executor_deleter<double []> >::_M_invoke(std::_Any_data const&, double*&&) (__functor=..., __args#0=@0x155551b1f950: 0x15551c001420)
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/std_function.h:291
#7 0x00000000004acda9 in std::function<void (double*)>::operator()(double*) const (this=0x15551c000cf8, __args#0=0x15551c001420)
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/std_function.h:622
#8 0x00000000004a45bf in std::unique_ptr<double [], std::function<void (double*)> >::~unique_ptr() (this=0x15551c000cf8, __in_chrg=<optimized out>)
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/unique_ptr.h:612
#9 0x000000000049c4e8 in gko::Array<double>::~Array (this=0x15551c000cf0, __in_chrg=<optimized out>)
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/include/ginkgo/core/base/array.hpp:84
#10 0x000015555337e1ce in std::_Destroy<gko::Array<double> > (__pointer=0x15551c000cf0)
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/stl_construct.h:140
#11 0x000015555337ce7c in std::allocator_traits<gko::ExecutorAllocator<gko::Array<double> > >::_S_destroy<gko::ExecutorAllocator<gko::Array<double> >, gko::Array<double> > (__p=0x15551c000cf0)
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/alloc_traits.h:274
#12 0x000015555337afe8 in std::allocator_traits<gko::ExecutorAllocator<gko::Array<double> > >::destroy<gko::Array<double> > (__a=..., __p=0x15551c000cf0)
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/alloc_traits.h:374
#13 0x0000155553378d04 in std::_Destroy<gko::Array<double>*, gko::ExecutorAllocator<gko::Array<double> > > (warning: (Internal error: pc 0x15555338145b in read in psymtab, but not in symtab.)
first=0x15551c000cf0, last=0x15551c000df0, __alloc=...) at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/alloc_traits.h:728
warning: (Internal error: pc 0x155553381102 in read in psymtab, but not in symtab.)
warning: (Internal error: pc 0x15555338145b in read in psymtab, but not in symtab.)
warning: (Internal error: pc 0x15555338145b in read in psymtab, but not in symtab.)
warning: (Internal error: pc 0x15555338145b in read in psymtab, but not in symtab.)
this=0x155551b1fb40, __in_chrg=
warning: (Internal error: pc 0x15555338145b in read in psymtab, but not in symtab.)
) at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/omp/preconditioner/jacobi_kernels.cpp:376
And with `OMP_NUM_THREADS=2`, here's the GDB backtrace with the `-O0 -g` build:
...
[ RUN ] Jacobi.OmpConjTransposedPreconditionerEquivalentToRefWithMPW
[ OK ] Jacobi.OmpConjTransposedPreconditionerEquivalentToRefWithMPW (31 ms)
[ RUN ] Jacobi.OmpApplyEquivalentToRefWithBlockSize32
corrupted double-linked list
Thread 1 "jacobi_kernels" received signal SIGABRT, Aborted.
0x00001555525b170f in raise () from /lib64/libc.so.6
(gdb) bt
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/ext/new_allocator.h:115
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/alloc_traits.h:460
at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/stl_vector.h:346
(this=0x7fffffff1dc0, __position=...) at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/vector.tcc:440
this=0x7fffffff1dc0) at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/vector.tcc:121
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/core/test/utils/matrix_generator.hpp:118
__f=...) at /arcanine/scratch/gent/vo/000/gvo00002/vsc40023/easybuild_tests/RHEL8/zen2-ib/software/GCCcore/10.2.0/include/c++/10.2.0/bits/stl_algo.h:3839
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/core/test/utils/matrix_generator.hpp:116
num_rhs=1, accuracy=0.10000000000000001) at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/omp/test/preconditioner/jacobi_kernels.cpp:84
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/ginkgo-1.3.0/omp/test/preconditioner/jacobi_kernels.cpp:357
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/easybuild_obj/third_party/gtest/src/googletest/src/gtest.cc:2437
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/easybuild_obj/third_party/gtest/src/googletest/src/gtest.cc:2473
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/easybuild_obj/third_party/gtest/src/googletest/src/gtest.cc:5131
method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x4de5ce <testing::internal::UnitTestImpl::RunAllTests()>,
location=0x513b58 "auxiliary test code (environments or event listeners)") at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/easybuild_obj/third_party/gtest/src/googletest/src/gtest.cc:2437
method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x4de5ce <testing::internal::UnitTestImpl::RunAllTests()>,
location=0x513b58 "auxiliary test code (environments or event listeners)") at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/easybuild_obj/third_party/gtest/src/googletest/src/gtest.cc:2473
at /tmp/vsc40023/easybuild_build/Ginkgo/1.3.0/GCC-10.2.0/easybuild_obj/third_party/gtest/src/googletest/src/gtest.cc:4740
That is great information! The valgrind warnings should be false positives, but it looks like we are having issues with concurrent allocations, though I am not 100% sure why that would lead to a crash. Are your gcc and glibc built from source or packaged? Any special configuration options? I think with that information, I might hopefully finally be able to reproduce your build environment completely.
Oh, and could you maybe check if the same issue pops up with our develop branch? I somehow suspect this issue might still be present in our current code base.
Now that I think about it some more, as long as we use the same version of glibc, pthreads, libgomp, etc., it should be enough if you send me the shared libraries and test executable.
@upsj See attached tarball, compiled for CentOS 7 and Intel Cascade Lake:
$ tar xfvz ginkgo_issue732_cascadelake_jacobi_kernels.tar.gz
$ export LD_LIBRARY_PATH=$PWD/ginkgo_issue732_cascadelake_jacobi_kernels
$ ginkgo_issue732_cascadelake_jacobi_kernels/jacobi_kernels
[==========] Running 39 tests from 1 test case.
...
[ RUN ] Jacobi.OmpPreconditionerEquivalentToRefWithMPW
Segmentation fault
@boegel Thanks, I am able to reproduce the issue now. Would it be possible to send me Debug binaries without -march=native? It looks like something is randomly breaking internal data structures, and -march=native doesn't play nice with older valgrind.
Are your gcc and glibc built from source or packaged? Any special configuration options?
GCC 10.2 is built from source, and not exactly a standard build. The full build procedure is defined by the Python "script" in https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/g/gcc.py; it also involves building with ISL, CLooG, and OpenMP offload support.
Oh, and could you maybe check if the same issue pops up with our develop branch? I somehow suspect this issue might still be present in our current code base.
Yes, will do.
Would it be possible to send me Debug binaries without -march=native?
Yes, no problem, I'll look into that.
Oh, in that case, maybe it is easier if I just use EasyBuild to compile gcc and see if I run into the same issue.
@upsj That should be... easy, yes. ;)
You should be able to reproduce the problem using this easyconfig file:
easyblock = 'CMakeNinja'
name = 'Ginkgo'
version = '1.3.0'
homepage = 'https://ginkgo-project.github.io'
description = """Ginkgo is a high-performance linear algebra library for manycore systems, with a focus on
sparse solution of linear systems."""
toolchain = {'name': 'GCC', 'version': '10.2.0'}
toolchainopts = {'debug': True, 'noopt': True}
source_urls = ['https://github.com/ginkgo-project/ginkgo/archive/refs/tags']
sources = ['v%(version)s.tar.gz']
checksums = ['1b0e907b4046cdf7cef16d1730c12ba812b38f2764f49f74f454239a27f63596']
builddependencies = [
    ('CMake', '3.18.4'),
    ('Ninja', '1.10.1'),
]
buildopts = " && ninja test"
sanity_check_paths = {
    'files': [],
    'dirs': [],
}
moduleclass = 'lib'
The sanity_check_paths is empty, which is not correct, but since the tests fail it'll never get to the point where it'll complain about it being empty.
Short instructions:
# install EasyBuild (feel free to adjust as needed with `--user`, `--prefix` or installing in a virtualenv)
pip3 install easybuild
# install Ginkgo and all dependencies (incl. GCC 10.2.0)
eb Ginkgo.eb --robot
(where Ginkgo.eb is a local easyconfig file with the contents shown above)
If you need help, you know where to find me (EasyBuild Slack).
The tarball with the debug binary/libraries (built with -g -O0) is too big for GitHub, so I've uploaded it here: https://users.ugent.be/~kehoste/ginkgo_issue732_cascadelake_jacobi_kernels_debug.tar.gz .
Same issue with current develop branch (commit 94e2361), and actually a couple more failing tests:
The following tests FAILED:
63 - omp/test/matrix/csr_kernels (SEGFAULT)
70 - omp/test/preconditioner/jacobi_kernels (SEGFAULT)
72 - omp/test/reorder/rcm_kernels (SEGFAULT)
102 - core/test/base/executor (Failed)
I am zeroing in closer and closer on this bug, but it seems to be something really fundamental and weird: The reference counters in std::shared_ptr become zero too early, leading to a use-after-free in OmpExecutor or std::shared_ptr's _M_refcount.
A minimal reproducer is
int main()
{
    auto omp = gko::OmpExecutor::create();
    gko::kernels::omp::jacobi::generate(omp);
}
omp/preconditioner/jacobi_kernels.cpp
void generate(std::shared_ptr<const OmpExecutor> exec)
{
#pragma omp parallel for
    for (size_type g = 0; g < 100; g++) {
        auto other = exec;
    }
}
Note that this behavior only occurs with OMP_NUM_THREADS larger than 1 and using GCC 10.2 built from source with EasyBuild.
I will try to reduce the input some more, but it is starting to sound more and more like a compiler bug.
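To illustrate how a reference counter could in principle reach zero too early, here is a deterministic replay of one hypothetical interleaving of two threads that each copy and then destroy a shared_ptr. It assumes the counter updates are plain, non-atomic read-modify-write operations; nothing in this thread establishes that this is the actual mechanism behind the crash, and the function names are made up for the sketch.

```cpp
#include <atomic>

// Hypothetical illustration, not Ginkgo or libstdc++ code.
// Replays one interleaving of two threads that each copy and destroy a
// shared_ptr whose counter is updated with non-atomic load/store pairs.
int final_count_with_lost_update()
{
    int count = 1;          // the caller's shared_ptr owns the object
    int seen_by_a = count;  // thread A loads 1
    int seen_by_b = count;  // thread B loads 1
    count = seen_by_a + 1;  // A stores 2
    count = seen_by_b + 1;  // B stores 2, so A's increment is lost
    count -= 1;             // A destroys its copy: 1
    count -= 1;             // B destroys its copy: 0, object would be freed
    return count;
}

// The same four operations with atomic read-modify-write updates cannot
// lose an increment, whatever the order in which they are applied.
int final_count_with_atomic_updates()
{
    std::atomic<int> count{1};
    count.fetch_add(1);  // A copies
    count.fetch_add(1);  // B copies
    count.fetch_sub(1);  // A destroys its copy
    count.fetch_sub(1);  // B destroys its copy
    return count.load();
}
```

In the first function the counter ends at 0 while the original owner is still alive, which is the kind of state that would lead to a use-after-free; in the second it ends at 1, as expected.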
FWIW: The problem is still there in GCC 10.2.1 (first release candidate for GCC 10.3), so if you can confirm this as a compiler bug, it may be worth opening a bug report to GCC?
Another small bit of info: I've built GCC 10.2.0 without OpenMP offload support (withnvptx = False), and the segfault problem stays...
Just to have everything documented here, I have reduced the same issue down to a reproducer that is independent of Ginkgo:
testlib.cpp
#include <memory>
void foo(std::shared_ptr<int> f)
{
#pragma omp parallel for
    for (size_t g = 0; g < 100; g++) {
        auto other = f;
    }
}
tester.cpp
#include <memory>
void foo(std::shared_ptr<int> f);
int main() {
    foo(std::make_shared<int>(4));
}
Compilation
g++ -g -o tester.cpp.o -c tester.cpp
g++ -g -fPIC -fopenmp -o testlib.cpp.o -c testlib.cpp
g++ -fPIC -g -shared -o libtestlibd.so testlib.cpp.o -lgomp -lpthread
g++ -g tester.cpp.o -o tester -Wl,-rpath,`pwd` libtestlibd.so
The execution sporadically crashes with various memory-related errors (pure virtual function called, corrupted double-linked list, segfault, ...) on GCC 10.2 built using EasyBuild.
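Since the crash in the reproducer above is sporadic, a variant that checks the reference count after the loop may make a miscounted reference easier to spot. The following is only a sketch: the helper name and the `iterations` parameter are made up here, and on an affected toolchain the program may still crash before the check is reached. Compile with -fopenmp to get the parallel loop; without it the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper (not part of Ginkgo): copies `f` inside an OpenMP
// loop, as testlib.cpp does, and returns the use count seen afterwards.
long use_count_after_parallel_copies(std::shared_ptr<int> f,
                                     std::size_t iterations)
{
    assert(f != nullptr);
#pragma omp parallel for
    for (std::size_t g = 0; g < iterations; g++) {
        auto other = f;  // count +1 here, -1 at the end of the iteration
    }
    // All copies made in the loop are gone, so the count should be back to
    // what it was on entry to this function.
    return f.use_count();
}
```

Called as `use_count_after_parallel_copies(std::make_shared<int>(4), 100)`, a correctly working toolchain should return 1, because the parameter `f` is the only remaining owner. Any other value would point at a lost increment or decrement.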
I'm seeing failing tests when building Ginkgo 1.3.0 with GCC 10.2.0 using EasyBuild, on various systems.
I've briefly discussed this with @upsj, who asked me to open an issue with all details included.
Failing test, when building with CMake 3.18.4 + make on top of CUDA 11.1.1, then running the tests with make test:
Compiler options: -O2 -ftree-vectorize -march=native -fno-math-errno (but I'm seeing the same failing test when trimming this down)
A very similar problem happens on another system (RHEL 8.2, AMD Rome, AMD EPYC 7552), without CUDA, when using CMake 3.18.4 + Ninja 1.10.1, same compiler options:
Some hopefully useful info collected with GDB: