upsj opened this issue 3 years ago
@upsj Thanks a lot for getting this "on record".
As far as I can tell, there's nothing to change on the EasyBuild side for this? It's just a weird problem that popped up because of a very technical (to me at least) implementation detail?
@Flamefire, @zao: Anything to add? Can we close this now that we have it on record?
Just a minor clarification on "since the updates assume that they happen single-threaded, and as such don't use atomics": users are required to link against pthreads if any dependency uses it.
Just a minor comment on the clarification 😄 I think the ODR violation happens because of a pthread symbol, not because of the std::shared_ptr constructor/destructor, since during object compilation there is no knowledge about whether the code will be executed sequentially or concurrently. There is also this related SO answer, which most likely has the same underlying cause and discusses the differences in linker behavior: https://stackoverflow.com/a/47241854/8217180
Yes and no. The relevant __gthread_active_p called by the constructor/destructor is different in each module because it is static. So the shared_ptr functions are indeed different and behave differently. So, well... there is an ODR violation somewhere ;)
What I found interesting: the answer refers to --as-needed and ld.gold. In EB we use ld.gold by default, and at least my system GCC uses --no-as-needed by default, while EB's GCC does not.
That would likely be something we could change in EB: build GCC with --no-as-needed too.
Note that this is not a bug in EasyBuild, but a note on how things can subtly break compared to a system GCC.
Due to small differences between how EasyBuild builds GCC and how most distributions package their system GCC, the following program can fail with race conditions in the updates to the std::shared_ptr reference counters: the updates assume that they happen single-threaded and therefore don't use atomics.
testlib.cpp
tester.cpp
Compiling with
can cause segfaults and other nice things when running ./tester with multiple threads.

I'll just copy @Flamefire's explanation in here for why this happens:
A workaround/fix for these issues is adding -fopenmp, -pthread or -lpthread to your linker flags across all levels of your application that use the OpenMP-enabled library.

Related failure in real code: https://github.com/ginkgo-project/ginkgo/issues/732