bitcoin / bitcoin

TSAN/MSAN fails with vm.mmap_rnd_bits=32 even with llvm 18.1.3 #30674

Open Sjors opened 2 months ago

Sjors commented 2 months ago

The Cirrus CI on my fork of the repo runs on Ubuntu 24.04 with kernel version 6.8.0-38. This has vm.mmap_rnd_bits=32 set, which causes the TSAN and MSAN jobs to fail.

See:

TSAN: https://cirrus-ci.com/task/6619444124844032

FAIL: minisketch/test
=====================
ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:282 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=42931)
FAIL minisketch/test (exit status: 139)
FAIL: univalue/test/object
==========================
ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:282 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=42964)
FAIL univalue/test/object (exit status: 139)
FAIL: qt/test/test_bitcoin-qt
=============================
ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:282 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=42994)
FAIL qt/test/test_bitcoin-qt (exit status: 139)

MSAN: https://cirrus-ci.com/task/4578750543691776

Running tests: base58_tests from test/base58_tests.cpp
Running tests: base64_tests from test/base64_tests.cpp
MemorySanitizer: CHECK failed: msan_linux.cpp:192 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=22112)
    <empty stack>
make[3]: *** [Makefile:22563: test/base32_tests.cpp.test] Error 1
make[3]: *** Waiting for unfinished jobs....
MemorySanitizer: CHECK failed: msan_linux.cpp:192 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=22137)
    <empty stack>

This job was from mid-July. Just in case, I reproduced it against today's master: https://github.com/Sjors/bitcoin/pull/57 / https://cirrus-ci.com/task/4886869396160512

My (limited) understanding is that the underlying issue should have been fixed and the fix has been backported to llvm 18.1.3: https://github.com/google/sanitizers/issues/1614#issuecomment-2010316781

Ubuntu 24.04 has shipped that version since early July: https://launchpad.net/ubuntu/noble/amd64/clang-18

I can see in the CI log that this version was indeed used:

Get:123 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libllvm18 amd64 1:18.1.3-1ubuntu1 [27.5 MB]

Although I can trivially work around the issue by setting vm.mmap_rnd_bits=28, perhaps there is a deeper issue worth investigating.
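For anyone who wants to check a CI host before running the sanitizer jobs, here is a minimal sketch of the workaround check. It reads the standard procfs entry for the sysctl and compares it with 28, the value used in the workaround above. The threshold and the helper names (`read_mmap_rnd_bits`, `needs_workaround`) are assumptions made for illustration, not something taken from the Bitcoin Core CI scripts.

```python
from pathlib import Path

# Assumption: 28 is the value used in the workaround above; it is not an official limit.
WORKAROUND_BITS = 28


def read_mmap_rnd_bits(path="/proc/sys/vm/mmap_rnd_bits"):
    """Return the current vm.mmap_rnd_bits value, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, procfs not mounted, or unexpected file content.
        return None


def needs_workaround(bits, limit=WORKAROUND_BITS):
    """True when the setting is known and exceeds the chosen limit."""
    return bits is not None and bits > limit


if __name__ == "__main__":
    bits = read_mmap_rnd_bits()
    if needs_workaround(bits):
        print(f"vm.mmap_rnd_bits={bits}; consider: sudo sysctl vm.mmap_rnd_bits={WORKAROUND_BITS}")
    else:
        print(f"vm.mmap_rnd_bits={bits}; no change suggested")
```

The script only reports the setting. Changing it requires root on the host (for example via `sysctl`), which is why the sketch prints a suggestion instead of applying it.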

Possibly related: https://github.com/ClickHouse/ClickHouse/issues/64086 (they also tried 18.1.3 and 18.1.6).

maflcko commented 2 months ago

You re-ran the same task on the same commit on the same machine 3 hours later and it passed: https://cirrus-ci.com/task/6619444124844032?logs=ci#L313 vs https://cirrus-ci.com/task/5557228785106944?logs=ci#L311

Did you change anything in between?

maflcko commented 2 months ago

Also, probably unrelated, but if you want, you can test https://github.com/bitcoin/bitcoin/pull/30639 and https://github.com/bitcoin/bitcoin/pull/30634

Sjors commented 2 months ago

@maflcko yes, I first reproduced the issue and then tested the workaround vm.mmap_rnd_bits=28. See https://github.com/Sjors/bitcoin/pull/51.

I'll try those clang-19 PRs. If that fixes the issue then presumably the issue is in llvm and they should consider backporting additional commits. But if it doesn't then maybe the problem is on our side (even though it's trivial to work around).

maflcko commented 2 months ago

I see. So in theory it should be reproducible by setting up a vanilla Ubuntu 24.04 (or later) host to run the CI tasks. I guess no one has done so yet, given that you are the first one to observe the issue. However, if it is reproducible, then it probably should be fixed.

Sjors commented 2 months ago

@maflcko clang 19 fixes neither, see https://github.com/Sjors/bitcoin/pull/59.

maflcko commented 2 weeks ago

https://github.com/llvm/llvm-project/commit/7d039effc4930be9240446a4241d268a39960e0b only added two bits (28 -> 30), so a failure with 32 is still expected, unless I am missing something.
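To put rough numbers on the 28 vs. 30 vs. 32 comparison, here is a small arithmetic sketch. It assumes 4 KiB pages and that `vm.mmap_rnd_bits` counts random bits of a page-granular mmap offset, which is my understanding of the kernel setting and is not stated in this thread. The limit of 30 is simply the figure from the comment above, and the function names are hypothetical.

```python
# Assumption: 4 KiB pages, i.e. a page shift of 12 (typical on x86_64).
PAGE_SHIFT = 12


def randomized_span_bytes(rnd_bits, page_shift=PAGE_SHIFT):
    """Size of the range over which the mmap base can move, in bytes."""
    return 1 << (rnd_bits + page_shift)


def tolerated(rnd_bits, max_supported_bits):
    """True if the configured bits do not exceed what the runtime is said to handle."""
    return rnd_bits <= max_supported_bits


if __name__ == "__main__":
    for bits in (28, 30, 32):
        span_tib = randomized_span_bytes(bits) // 2**40
        print(f"{bits} bits -> {span_tib} TiB span, tolerated with limit 30: {tolerated(bits, 30)}")
```

Under these assumptions each extra bit doubles the span, so going from 30 to 32 bits quadruples it, which is consistent with 32 still failing after a two-bit increase.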