arximboldi / immer

Postmodern immutable and persistent data structures for C++ — value semantics at scale
https://sinusoid.es/immer
Boost Software License 1.0
2.48k stars 177 forks source link

NDEBUG flag breaks binary compatibility, which can cause mysterious crashes #272

Open kosak opened 11 months ago

kosak commented 11 months ago

Code in immer/config.hpp causes IMMER_TAGGED_NODE to be set to 0 or 1 according to whether NDEBUG is defined. This in turn affects the definition of immer::detail::rbts::node; in particular whether or not impl_data_t contains a kind_t kind field.

This can cause problems if user code compiled with NDEBUG set is linked to other code compiled without NDEBUG, because the Immer data structures are not binary compatible between these two versions.

This situation can arise in a complicated build system where libraries are built by different build steps. Even though ideally one should try to have a coherent set of flags in all cases, I think people expect NDEBUG to be harmless, at most controlling whether assertions are enabled or perhaps turning on some log messages. In my opinion it would be surprising to learn that NDEBUG would break binary compatibility. In my company's case we had Rcpp building some of our code and it used an NDEBUG flag in its build steps.

In the example below, the code in main.cpp doesn't even call the code in util.cpp, but util.cpp poisons the binary, causing main.cpp to crash. Steps to reproduce:

// main.cpp
#include "immer/vector.hpp"

int main() {
  immer::vector<int> v0;
  for (size_t i = 0; i < 100; ++i) {
    v0 = v0.push_back(13);
  }
}
// util.cpp
#include "immer/vector.hpp"

void notcalled(const immer::vector<int> &v0) {
  (void)v0[0];
}

Command:

g++ -c -I. -DNDEBUG main.cpp
g++ -c -I. util.cpp
g++ -o doit util.o main.o 
./doit
doit: ./immer/detail/rbts/node.hpp:175: immer::detail::rbts::node<T, MemoryPolicy, B, BL>::node_t** immer::detail::rbts::node<T, MemoryPolicy, B, BL>::inner() [with T = int; MemoryPolicy = immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap>, immer::refcount_policy, immer::spinlock_policy>; unsigned int B = 5; unsigned int BL = 6; immer::detail::rbts::node<T, MemoryPolicy, B, BL>::node_t = immer::detail::rbts::node<int, immer::memory_policy<immer::free_list_heap_policy<immer::cpp_heap>, immer::refcount_policy, immer::spinlock_policy>, 5, 6>]: Assertion `kind() == kind_t::inner' failed.
Aborted (core dumped)

This can be a tricky thing to diagnose and debug. In my opinion, Immer ought to do one of two things. Either:

  1. NDEBUG should not control IMMER_TAGGED_NODE (nor IMMER_ENABLE_DEBUG_SIZE_HEAP); if the programmer wants these behaviors they should have to set those flags explicitly and not get them via NDEBUG.

or:

  1. These flags should change the namespace or classnames of all affected code (i.e. both the node data structure and any code referencing the node data structure). This would have the nice property that the two versions could coexist side-by-side and in the worst case an inconsistency would lead to a link error rather than a mysterious crash.

If you agree, I would be happy to contribute code for item 1 or 2, once I know what your preference is.

arximboldi commented 11 months ago

Hi @kosak!

Good find. I think you're right. I'm happy with option 1. When you make the PR, maybe you can add a CMake flag (-DImmer_DEBUG_TOOLS) or alike to enable those easily during development? Also remember to enable it in test.yml so in Debug builds that in CI it is still enabled.

jeaye commented 1 month ago

200

Two full days I've spent narrowing down a test case in jank, thinking this was due to UB in Clang 19's JIT compilation. After ruling everything else out, I could only reproduce the issue with immer containers. I didn't spot the issue myself, and I wasn't close to doing so. Fortunately I did take a look at the open issues, though.

@kosak, you're a saint. Thanks for finding this and making the issue. For me, the way it showed up is that jank is compiled with optimizations for release builds, but it doesn't enable optimizations by default for JIT compiled code (those functions can be later optimized on demand).

I can work around this for now by just defining IMMER_TAGGED_NODE myself, both in AOT and JIT compiled code. But I agree that having ABI differences dependent on NDEBUG is gobbling up sanity for other people around the world right now too.

kosak commented 1 month ago

Thank you for the kind words. I'm so glad it helped you! Regarding "saint", recently I've been feeling pretty guilty for not following up with a fix. I could finally put something together or if someone else wants to do that, that's fine too.

jeaye commented 1 month ago

The fix is all yours, if you're up for it, my dude. :bow:

arximboldi commented 1 month ago

@kosak a fix would be very welcome!