abseil / abseil-cpp

Abseil Common Libraries (C++)
https://abseil.io
Apache License 2.0
14.7k stars 2.57k forks source link

[Bug]: missing Mutex::Dtor on linux? #1624

Open h-vetinari opened 6 months ago

h-vetinari commented 6 months ago

Describe the issue

While https://github.com/conda-forge/grpc-cpp-feedstock/pull/348 grpc with -DgRPC_ABSL_PROVIDER="package" in conda-forge against abseil 20240116.0, the sanity check of compiling an example against the library fails:

[7/18] Linking CXX executable greeter_client
FAILED: greeter_client 
: && $PREFIX/bin/x86_64-conda-linux-gnu-c++ -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/libgrpc-1.61.0 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib -Wl,-rpath-link,$PREFIX/lib -L$PREFIX/lib CMakeFiles/greeter_client.dir/greeter_client.cc.o -o greeter_client  libhw_grpc_proto.a  $PREFIX/lib/libabsl_flags_parse.so.2401.0.0  $PREFIX/lib/libgrpc++_reflection.so.1.61.0  $PREFIX/lib/libgrpc++.so.1.61.0  $PREFIX/lib/libprotobuf.so.25.2.0  $PREFIX/lib/libgrpc.so.38.0.0  $PREFIX/lib/libaddress_sorting.so.38.0.0  $PREFIX/lib/libupb_json_lib.so.38.0.0  $PREFIX/lib/libupb_textformat_lib.so.38.0.0  $PREFIX/lib/libupb_message_lib.so.38.0.0  $PREFIX/lib/libupb_base_lib.so.38.0.0  $PREFIX/lib/libupb_mem_lib.so.38.0.0  $PREFIX/lib/libutf8_range_lib.so.38.0.0  $PREFIX/lib/libssl.so  $PREFIX/lib/libcrypto.so  $PREFIX/lib/libgpr.so.38.0.0  -ldl  -lm  -lrt  $PREFIX/lib/libabsl_random_distributions.so.2401.0.0  $PREFIX/lib/libabsl_random_seed_sequences.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_pool_urbg.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_randen.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_randen_hwaes.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_randen_hwaes_impl.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_randen_slow.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_platform.so.2401.0.0  $PREFIX/lib/libabsl_random_internal_seed_material.so.2401.0.0  $PREFIX/lib/libabsl_random_seed_gen_exception.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_check_op.so.2401.0.0  $PREFIX/lib/libabsl_leak_check.so.2401.0.0  $PREFIX/lib/libabsl_die_if_null.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_conditions.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_message.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_nullguard.so.2401.0.0  $PREFIX/lib/libabsl_examine_stack.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_format.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_proto.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_log_sink_set.so.2401.0.0  $PREFIX/lib/libabsl_log_sink.so.2401.0.0  $PREFIX/lib/libabsl_log_entry.so.2401.0.0  $PREFIX/lib/libabsl_log_initialize.so.2401.0.0  $PREFIX/lib/libabsl_log_globals.so.2401.0.0  $PREFIX/lib/libabsl_vlog_config_internal.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_fnmatch.so.2401.0.0  $PREFIX/lib/libabsl_log_internal_globals.so.2401.0.0  $PREFIX/lib/libabsl_statusor.so.2401.0.0  $PREFIX/lib/libabsl_status.so.2401.0.0  $PREFIX/lib/libabsl_strerror.so.2401.0.0  $PREFIX/lib/libabsl_flags_usage.so.2401.0.0  $PREFIX/lib/libabsl_flags_usage_internal.so.2401.0.0  $PREFIX/lib/libabsl_flags_internal.so.2401.0.0  $PREFIX/lib/libabsl_flags_marshalling.so.2401.0.0  $PREFIX/lib/libabsl_flags_reflection.so.2401.0.0  $PREFIX/lib/libabsl_flags_config.so.2401.0.0  $PREFIX/lib/libabsl_cord.so.2401.0.0  $PREFIX/lib/libabsl_cordz_info.so.2401.0.0  $PREFIX/lib/libabsl_cord_internal.so.2401.0.0  $PREFIX/lib/libabsl_cordz_functions.so.2401.0.0  $PREFIX/lib/libabsl_cordz_handle.so.2401.0.0  $PREFIX/lib/libabsl_crc_cord_state.so.2401.0.0  $PREFIX/lib/libabsl_crc32c.so.2401.0.0  $PREFIX/lib/libabsl_str_format_internal.so.2401.0.0  $PREFIX/lib/libabsl_crc_internal.so.2401.0.0  $PREFIX/lib/libabsl_crc_cpu_detect.so.2401.0.0  $PREFIX/lib/libabsl_raw_hash_set.so.2401.0.0  $PREFIX/lib/libabsl_hash.so.2401.0.0  $PREFIX/lib/libabsl_bad_variant_access.so.2401.0.0  $PREFIX/lib/libabsl_city.so.2401.0.0  $PREFIX/lib/libabsl_low_level_hash.so.2401.0.0  $PREFIX/lib/libabsl_hashtablez_sampler.so.2401.0.0  $PREFIX/lib/libabsl_exponential_biased.so.2401.0.0  $PREFIX/lib/libabsl_flags_private_handle_accessor.so.2401.0.0  $PREFIX/lib/libabsl_flags_commandlineflag.so.2401.0.0  $PREFIX/lib/libabsl_bad_optional_access.so.2401.0.0  $PREFIX/lib/libabsl_flags_commandlineflag_internal.so.2401.0.0  $PREFIX/lib/libabsl_flags_program_name.so.2401.0.0  $PREFIX/lib/libabsl_synchronization.so.2401.0.0  $PREFIX/lib/libabsl_graphcycles_internal.so.2401.0.0  $PREFIX/lib/libabsl_kernel_timeout_internal.so.2401.0.0  $PREFIX/lib/libabsl_time.so.2401.0.0  $PREFIX/lib/libabsl_civil_time.so.2401.0.0  $PREFIX/lib/libabsl_time_zone.so.2401.0.0  $PREFIX/lib/libabsl_stacktrace.so.2401.0.0  $PREFIX/lib/libabsl_symbolize.so.2401.0.0  $PREFIX/lib/libabsl_strings.so.2401.0.0  $PREFIX/lib/libabsl_strings_internal.so.2401.0.0  $PREFIX/lib/libabsl_string_view.so.2401.0.0  $PREFIX/lib/libabsl_int128.so.2401.0.0  $PREFIX/lib/libabsl_throw_delegate.so.2401.0.0  $PREFIX/lib/libabsl_malloc_internal.so.2401.0.0  $PREFIX/lib/libabsl_debugging_internal.so.2401.0.0  $PREFIX/lib/libabsl_demangle_internal.so.2401.0.0  $PREFIX/lib/libabsl_base.so.2401.0.0  -lpthread  $PREFIX/lib/libabsl_raw_logging_internal.so.2401.0.0  $PREFIX/lib/libabsl_log_severity.so.2401.0.0  $PREFIX/lib/$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `grpc::CompletionQueue::~CompletionQueue()':
helloworld.grpc.pb.cc:(.text._ZN4grpc15CompletionQueueD2Ev[_ZN4grpc15CompletionQueueD5Ev]+0x4d): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `grpc::ClientReader<helloworld::HelloReply>::~ClientReader()':
helloworld.grpc.pb.cc:(.text._ZN4grpc12ClientReaderIN10helloworld10HelloReplyEED2Ev[_ZN4grpc12ClientReaderIN10helloworld10HelloReplyEED5Ev]+0x6c): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `non-virtual thunk to grpc::ClientReader<helloworld::HelloReply>::~ClientReader()':
helloworld.grpc.pb.cc:(.text._ZN4grpc12ClientReaderIN10helloworld10HelloReplyEED2Ev[_ZN4grpc12ClientReaderIN10helloworld10HelloReplyEED5Ev]+0x10c): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `non-virtual thunk to grpc::ClientReaderWriter<helloworld::HelloRequest, helloworld::HelloReply>::~ClientReaderWriter()':
helloworld.grpc.pb.cc:(.text._ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED2Ev[_ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED5Ev]+0x7c): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o): in function `grpc::ClientReaderWriter<helloworld::HelloRequest, helloworld::HelloReply>::~ClientReaderWriter()':
helloworld.grpc.pb.cc:(.text._ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED2Ev[_ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED5Ev]+0x12c): undefined reference to `absl::lts_20240116::Mutex::Dtor()'
$PREFIX/x86_64-conda-linux-gnu/bin/ld: libhw_grpc_proto.a(helloworld.grpc.pb.cc.o):helloworld.grpc.pb.cc:(.text._ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED2Ev[_ZN4grpc18ClientReaderWriterIN10helloworld12HelloRequestENS1_10HelloReplyEED5Ev]+0x1dc): more undefined references to `absl::lts_20240116::Mutex::Dtor()' follow
collect2: error: ld returned 1 exit status

Previously I had opened #1614 and https://github.com/grpc/grpc/issues/35854, but it now looks like this might be related to https://github.com/abseil/abseil-cpp/commit/f3760b4d3b2773d1cb8e9ddda29dc9bce386d540. In particular, the following looks like a likely culprit https://github.com/abseil/abseil-cpp/blob/119e0d3f74733aff2999d39cb8be99ddcb081c66/absl/synchronization/mutex.cc#L734-L739

I'm not sure what grpc does that (apparently) invalidates the assumptions explained in the commit message of that change.

Steps to reproduce the problem

Build https://github.com/grpc/grpc with -DgRPC_ABSL_PROVIDER="package" against abseil 20240116.0 and then compile the grpc helloworld example. It would also be possible to replay the recipe we have in conda-forge.

What version of Abseil are you using?

20240116.0

What operating system and version are you using?

Linux

What compiler and version are you using?

GCC 12

What build system are you using?

CMake

Additional context

No response

h-vetinari commented 6 months ago

CC @dvyukov as the author of https://github.com/abseil/abseil-cpp/commit/f3760b4d3b2773d1cb8e9ddda29dc9bce386d540.

Relevant context is also that we're using shared builds of abseil.

h-vetinari commented 6 months ago

I ended up building a version of abseil (published in a separate channel) with the following patch:

diff --git a/absl/synchronization/mutex.cc b/absl/synchronization/mutex.cc
index cb3c7e74..76d561df 100644
--- a/absl/synchronization/mutex.cc
+++ b/absl/synchronization/mutex.cc
@@ -731,12 +731,10 @@ static unsigned TsanFlags(Mutex::MuHow how) {
 }
 #endif

-#if defined(__APPLE__) || defined(ABSL_BUILD_DLL)
 // When building a dll symbol export lists may reference the destructor
 // and want it to be an exported symbol rather than an inline function.
 // Some apple builds also do dynamic library build but don't say it explicitly.
 Mutex::~Mutex() { Dtor(); }
-#endif

 #if !defined(NDEBUG) || defined(ABSL_HAVE_THREAD_SANITIZER)
 void Mutex::Dtor() {
diff --git a/absl/synchronization/mutex.h b/absl/synchronization/mutex.h
index d53a22bb..2cd5a1e4 100644
--- a/absl/synchronization/mutex.h
+++ b/absl/synchronization/mutex.h
@@ -1064,11 +1064,6 @@ inline Mutex::Mutex() : mu_(0) {

 inline constexpr Mutex::Mutex(absl::ConstInitType) : mu_(0) {}

-#if !defined(__APPLE__) && !defined(ABSL_BUILD_DLL)
-ABSL_ATTRIBUTE_ALWAYS_INLINE
-inline Mutex::~Mutex() { Dtor(); }
-#endif
-
 #if defined(NDEBUG) && !defined(ABSL_HAVE_THREAD_SANITIZER)
 // Use default (empty) destructor in release build for performance reasons.
 // We need to mark both Dtor and ~Mutex as always inline for inconsistent

With this, the grpc example compiles again.

h-vetinari commented 6 months ago

Could this simply be a case of the shared library case on linux not being covered correctly?

h-vetinari commented 6 months ago

@derekmauro @dvyukov, could you comment about the (un-)inlining of the Mutex destructor, or alternatively, chime in on the linked grpc issue if their usage of the Mutex is somehow not supported?

dvyukov commented 6 months ago

Do you use bazel or cmake build?

I am not an expert on all open-source absl build modes. Is it possible to add a test for this?

I see this code does something similar: https://github.com/abseil/abseil-cpp/blob/14b8a4eac3e5a7b97ba4cc7b7dadf2a85aae8215/absl/base/internal/thread_identity.h#L249-L252 and it also checks ABSL_CONSUME_DLL. Should we check ABSL_CONSUME_DLL here as well? Though, not sure how this is related to consuming dll's.

I see that this: https://github.com/abseil/abseil-cpp/blob/14b8a4eac3e5a7b97ba4cc7b7dadf2a85aae8215/absl/copts/AbseilConfigureCopts.cmake#L6-L9 exports ABSL_BUILD_DLL only for Windows (MSVC). Is there a macro for BUILD_SHARED_LIBS mode? Perhaps we should check that macro?

h-vetinari commented 6 months ago

Hey @dvyukov, thanks for the response. We're using CMake to build, which correctly sets ABSL_BUILD_DLL on windows (and ABSL_CONSUME_DLL gets correspondingly set when building against shared abseil).

My understanding is that this mechanism originally was intended (only?) for __declspec(dllexport) (resp. dllimport), which is necessary to make shared builds work on windows at all.

Now the problem only appears on Linux, where usually, ABSL_BUILD_DLL should not be a thing, according to my understanding. But if that's the preferred solution, I'm happy to set that symbol also on Linux while building.

It would be functionally equivalent to the patch I posted above, which I already tested successfully.

dvyukov commented 6 months ago

It would be functionally equivalent to the patch I posted above, which I already tested successfully.

If I read your first patch correctly, it effectively undoes the optimization, so it's not good.

I am not expect on absl build modes and macros, especially used in open-source. So I will defer this to absl maintainers. @derekmauro do you know who can say what's the right fix for this?

derekmauro commented 6 months ago

I'll take a look when I can. Please be patient.

h-vetinari commented 6 months ago

It would be functionally equivalent to the patch I posted above, which I already tested successfully.

If I read your first patch correctly, it effectively undoes the optimization, so it's not good.

My intention was not to suggest that the patch should be merged, what I meant was that for our specific scenario (builds against a shared abseil) the removal of the #if had the same effect as defining ABSL_BUILD_DLL.

I'll take a look when I can. Please be patient.

I hope that a first ping after a week is not considered excessive. 😅

It is blocking a lot of work in our distribution though, so I was mainly looking for overall direction, rather than a fix. For now, I don't even know if this is an issue in abseil or in grpc, but I'm assuming that lots of places implicitly use the Mutex, and since all our builds are shared, I expect this to be too much to fix everywhere on the side of abseil-consumers (though perhaps I'm wrong though and it's grpc-specific after all!).

The upshot is that I'm looking towards just uninlining the Dtor (in our distribution) for 20240116.x, which should at least unblock us for now, and we can then reconsider for the summer release. If you think I should hold off on that for ${reason}, please let me know.

derekmauro commented 6 months ago

I haven't tried this, but I ran the build "in my head" (which tends to be unreliable). I think it is likely caused by mixing an installed absl library built with -DNDEBUG with a gRPC library that includes absl headers that is not being built with -DNDEBUG. If this is the case, it is something I've been begging people not to do.

This is one of those build modes I would like to say is not supported, as I regularly have to tell people when they only instrument half of their builds with sanitizers, but the fact of the matter is that with -DNDEBUG, people get away with doing this 99.9% of the time.

One option which I really don't like is to patch this for open-source users, but keep @dvyukov's optimization for internal Google users only. This would be an annoying maintainability problem. I'll have to test it to see how bad it actually looks.

h-vetinari commented 6 months ago

Thank you for the response!

This is an interesting case, because we do set -DNDEBUG for our release builds, but the failure occurred while naïvely building examples/cpp/helloworld from grpc, which doesn't set that symbol.

It's surprising to me that the logic here is https://github.com/abseil/abseil-cpp/blob/2f9e432cce407ce0ae50676696666f33a77d42ac/absl/base/macros.h#L91-L100 rather than, say

#ifdef(DEBUG)
  <debug-case>
#else
  <nodebug-case>
#fi

The latter would not open up that footgun of a vanilla build (no options) having a different ABI than one with -NDEBUG.

If this is the case, it is something I've been begging people not to do.

I understand, though the realities of a distribution are that we have to use pre-built artefacts. If that means that we have to put NDEBUG in the library interface, we can do that

h-vetinari commented 6 months ago

This is an interesting case, because we do set -DNDEBUG for our release builds, but the failure occurred while naïvely building examples/cpp/helloworld from grpc, which doesn't set that symbol.

OK, I misdiagnosed that (probably trying to fit your analysis subconsciously) -- the compiler setup is the same in all cases, so we do set -DNDEBUG even for that grpc example. I've double-checked this in https://github.com/conda-forge/grpc-cpp-feedstock/pull/357.

h-vetinari commented 6 months ago

OK, I misdiagnosed that [...]

Yet another update: while our compiler setup is the same (CPPFLAGS contains -DNDEBUG, CXXFLAGS does not), it seems that abseil and grpc differ in how they pick this up - abseil consequently does build with -DNDEBUG, while grpc (or at least the example) doesn't. Adding -DNDEBUG to the CXXFLAGS for the grpc example does make it work again, validating your analysis.

EddyXorb commented 5 months ago

I have also an issue with the inlined desctructor, but on windows when building static libs of abseil. When I build ortools v9.9 which uses abseil 202401 version of abseil, I get multiply defined symbols for Mutex::Dtor(). Ortools is quite big and it uses abseil, but also Protobuf as dependencies. Up to now I did not test it with the (rollback-)patch, but this seems likely a fix for the issue.

dvyukov commented 5 months ago

Does this build include inconsistent NDEBUG defines (as @derekmauro mentioned here https://github.com/abseil/abseil-cpp/issues/1624#issuecomment-1967655080)?

EddyXorb commented 5 months ago

Not that I am aware of. The ortools cmake machinery is quite complex so I cannot easily verify that, maybe I find the time to dig into that more. Another possibility could be a wrongly set or logically incorrect usage of the flag BUILD_ABSL_DLL.

lo-asys commented 4 months ago

I'd like to add, for anyone coming across similar problems, that not defining the NDEBUG macro in your Release build will cause very cryptic errors at the linker stage; in my case, a dependency on grpc (and therefore protobuf and abseil) managed via vcpkg in an MSbuild project on Windows produced ODR Violation linker errors exactly in Mutex::Dtor. It wasn't until I randomly stumbled across this exact thread after hours of increasingly desperate internet searches that I noticed the missing NDEBUG definition on my command line. Obvious error in retrospect, but tracing that down backwards from the link stage to the preprocessor was pretty nightmarish. It feels like this should be mentioned in documentation somewhere, so other developers don't have to go through that ordeal. Maybe this comment will help someone directly or serve as a starting point for this information to be more visible to library users.

Mizux commented 1 month ago

my 2 cents for @EddyXorb and @dvyukov