Open nicktrav opened 1 year ago
So far my initial attempt to brute force it via mkrelease amd64-linux-gnu
has run into a wall.
- CC (resp. CXX) must be either gcc or clang; the cross-compiler name contains $TARGET_TRIPLE, which results in an error [1]
- CC (and CXX)
- first "successful" build core dumped
#0 0x0000000000000000 in ?? ()
#1 0x0000000007a50892 in je_malloc_mutex_lock () at /go/src/github.com/cockroachdb/cockroach/c-deps/jemalloc/include/jemalloc/internal/mutex.h:101
#2 malloc_init_hard () at /go/src/github.com/cockroachdb/cockroach/c-deps/jemalloc/src/jemalloc.c:1486
#3 0x0000000007a5681b in malloc_init () at /go/src/github.com/cockroachdb/cockroach/c-deps/jemalloc/src/jemalloc.c:317
#4 ialloc_body () at /go/src/github.com/cockroachdb/cockroach/c-deps/jemalloc/src/jemalloc.c:1583
#5 calloc () at /go/src/github.com/cockroachdb/cockroach/c-deps/jemalloc/src/jemalloc.c:1824
#6 0x00007f7220cecc05 in _dlerror_run (operate=operate@entry=0x7f7220cec490 <dlsym_doit>, args=args@entry=0x7ffe000e8730) at dlerror.c:148
#7 0x00007f7220cec525 in __dlsym (handle=<optimized out>, name=0x7c03f7f "mmap") at dlsym.c:70
#8 0x00000000004ed434 in __interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long) ()
#9 0x00000000004cec1e in InitializeCommonInterceptors() ()
#10 0x00000000004ce46a in __asan::InitializeAsanInterceptors() ()
#11 0x00000000004e8c5e in __asan::AsanInitInternal() ()
#12 0x00007f7220e8acf6 in _dl_init (main_map=0x7f7220ea8190, argc=2, argv=0x7ffe000e8868, env=0x7ffe000e8880) at dl-init.c:104
#13 0x00007f7220e7a13a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#14 0x0000000000000002 in ?? ()
#15 0x00007ffe000e940f in ?? ()
#16 0x00007ffe000e9432 in ?? ()
#17 0x0000000000000000 in ?? ()
- disabled jemalloc by setting the stdmalloc TAG
- second "successful" build aborts with global-buffer-overflow
./cockroach-linux-2.6.32-gnu-amd64 version
=================================================================
==3684821==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000fb8f7b0 at pc 0x000000df9569 bp 0x000000000000 sp 0x10c0001b7408
READ of size 16 at 0x00000fb8f7b0 thread T0
#0 0xdf9568 in github.com/cockroachdb/cockroach/pkg/build.BinaryVersionPrefix /go/src/github.com/cockroachdb/cockroach/pkg/build/info.go:71
0x00000fb8f7b0 is located 16 bytes to the left of global variable 'github.com/cockroachdb/cockroach/pkg/build.typ' defined in '/go/src/github.com/cockroachdb/cockroach/pkg/build/info.go:39:2' (0xfb8f7c0) of size 16
0x00000fb8f7b0 is located 0 bytes to the right of global variable 'github.com/cockroachdb/cockroach/pkg/build.rev' defined in '/go/src/github.com/cockroachdb/cockroach/pkg/build/info.go:33:2' (0xfb8f7a0) of size 16
0x00000fb8f7b0 is located 32 bytes to the left of global variable 'github.com/cockroachdb/cockroach/pkg/build.utcTime' defined in '/go/src/github.com/cockroachdb/cockroach/pkg/build/info.go:32:2' (0xfb8f7d0) of size 16
0x00000fb8f7b0 is located 0 bytes inside of global variable 'github.com/cockroachdb/cockroach/pkg/build.tag' defined in '/go/src/github.com/cockroachdb/cockroach/pkg/build/info.go:31:2' (0xfb8f7b0) of size 16
SUMMARY: AddressSanitizer: global-buffer-overflow /go/src/github.com/cockroachdb/cockroach/pkg/build/info.go:71 in github.com/cockroachdb/cockroach/pkg/build.BinaryVersionPrefix
- gcc instead of clang (default in the builder image)
Not sure why overflow tracking of globals has to be disabled [1], but it does appear to work without it:
ASAN_OPTIONS=report_globals=0 ./cockroach-linux-2.6.32-gnu-amd64 version
Build Tag: v22.2.3-dirty
Build Time: 2023/02/17 04:42:08
Distribution: CCL
Platform: linux amd64 (x86_64-unknown-linux-gnu)
Go Version: go1.19.1
C Compiler: gcc 9.4.0
Build Commit ID: 81a114c2bc2ef8ef76fe3809d0469319c7e82635
Build Type: release
My primitive efforts to cause memory corruption via strace,
strace -f -e inject=mmap:retval=232849384973 -p $pid
confirm that ASan is running, although in this case it offers less insight than the runtime :)
I230217 06:27:56.124414 1 util/log/file_sync_buffer.go:238 ⋮ [config] file created at: 2023/02/17 06:27:56
I230217 06:27:56.124448 1 util/log/file_sync_buffer.go:238 ⋮ [config] running on machine: ‹gceworker-srosenberg-2›
I230217 06:27:56.124472 1 util/log/file_sync_buffer.go:238 ⋮ [config] binary: CockroachDB CCL v22.2.3-dirty (x86_64-unknown-linux-gnu, built 2023/02/17 04:42:08, go1.19.1)
I230217 06:27:56.124495 1 util/log/file_sync_buffer.go:238 ⋮ [config] arguments: [‹./cockroach-linux-2.6.32-gnu-amd64› ‹start-single-node› ‹--insecure›]
I230217 06:27:56.124528 1 util/log/file_sync_buffer.go:238 ⋮ [config] log format (utf8=✓): crdb-v2
I230217 06:27:56.124543 1 util/log/file_sync_buffer.go:238 ⋮ [config] line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid [chan@]file:line redactionmark \[tags\] [counter] msg
I230217 06:27:56.124179 1 util/log/flags.go:211 [-] 1 stderr capture started
==4042800==AddressSanitizer CHECK failed: ../../../../src/libsanitizer/sanitizer_common/sanitizer_allocator_primary64.h:636 "((beg)) == ((mapped))" (0x623000030000, 0x3636e7a60d)
AddressSanitizer:DEADLYSIGNAL
=================================================================
runtime: mmap(0x10c005c00000, 4194304) returned 0x3636e7a60d, 0
fatal error: runtime: cannot map pages in arena address space
runtime stack:
runtime.throw({0x8637c22?, 0x2430?})
/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7fe319a44858 sp=0x7fe319a44828 pc=0x48dadd
runtime.sysMapOS(0x10c005c00000, 0x400000?)
/usr/local/go/src/runtime/mem_linux.go:191 +0x10a fp=0x7fe319a448a0 sp=0x7fe319a44858 pc=0x46c38a
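As a quick sanity check on the output above, the decimal retval passed to strace can be converted to hex and compared with the address reported by both ASan and the Go runtime. The sketch below is only illustrative; the helper name `describe` and the 4096-byte page size are assumptions, not anything taken from the build.

```python
PAGE_SIZE = 4096  # assumption: default x86-64 page size


def describe(retval: int) -> dict:
    """Return the hex form of an injected mmap return value and
    whether it happens to be page-aligned."""
    return {
        "hex": hex(retval),
        "page_aligned": retval % PAGE_SIZE == 0,
    }


# Value taken from the strace command: inject=mmap:retval=232849384973
injected = 232849384973
info = describe(injected)

# The hex form is what appears in the "returned 0x..." runtime message
# and in the ASan CHECK failure.
print(info["hex"])
print(info["page_aligned"])
```

The injected value is not page-aligned, which a real mmap result always would be, so both the runtime and ASan reject it immediately.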
Note, the following technique is actually fairly effective.
ASAN_OPTIONS=report_globals=0 ./cockroach-linux-2.6.32-gnu-amd64 start-single-node --insecure
pmap `pgrep cockroach` |grep rw
E.g., we could pick anon, which is likely to be zeroed (i.e., high probability of nil-dereference), or a page that belongs to libnss_files-2.31.
00007f04894fa000 8448K rw--- [ anon ]
00007f0489d47000 4K rw--- libnss_files-2.31.so
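Picking a candidate page by eye works, but the pmap rows can also be parsed mechanically. The following is a minimal sketch, assuming the default `pmap <pid>` row layout shown above (address, size in K, permissions, name); the names `Mapping`, `parse_pmap`, and `writable` are hypothetical helpers introduced for illustration.

```python
import re
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int    # start address of the mapping
    size_kb: int  # size in KiB as printed by pmap
    perms: str    # permission string, e.g. "rw---"
    name: str     # "[ anon ]" or the backing file name


# Matches rows such as "00007f04894fa000 8448K rw--- [ anon ]".
# Header and "total" lines do not match and are skipped.
_ROW = re.compile(r"^([0-9a-fA-F]+)\s+(\d+)K\s+(\S+)\s+(.*)$")


def parse_pmap(text: str) -> List[Mapping]:
    rows = []
    for line in text.splitlines():
        m = _ROW.match(line.strip())
        if m:
            rows.append(
                Mapping(int(m.group(1), 16), int(m.group(2)),
                        m.group(3), m.group(4).strip())
            )
    return rows


def writable(rows: List[Mapping]) -> List[Mapping]:
    """Keep only read-write mappings, like `grep rw` does above."""
    return [r for r in rows if r.perms.startswith("rw")]


sample = """\
00007f04894fa000 8448K rw--- [ anon ]
00007f0489d47000 4K rw--- libnss_files-2.31.so
"""

for r in writable(parse_pmap(sample)):
    # Print hex and decimal: strace's retval= takes the decimal form.
    print(f"{r.start:#x} ({r.start}) {r.size_kb}K {r.name}")
```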
Run a workload; it will result in calls to mmap.
./cockroach-linux-2.6.32-gnu-amd64 workload init tpcc
The following suppresses every (1 + 10*k)-th call to mmap (k = 0, 1, 2, ...) and instead returns a memory address which we picked above.
strace -f -e inject=mmap:retval=139657314369536:when=1+10 -p $pid
The above will trigger runtime failures as well as ASan failures. E.g.,
==44210==AddressSanitizer CHECK failed: ../../../../src/libsanitizer/sanitizer_common/sanitizer_allocator_primary64.h:636 "((beg)) == ((mapped))" (0x62d0019a0000, 0x7f68426d5008)
==44210==ERROR: AddressSanitizer failed to deallocate 0x1000 (4096) bytes at address 0x7f68426d5008
==44210==AddressSanitizer CHECK failed: ../../../../src/libsanitizer/sanitizer_common/sanitizer_posix.cc:60 "(("unable to unmap" && 0)) != (0)" (0x0, 0x0)
[1] https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
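The strace invocation above takes the target address in decimal and a `when=first+step` schedule. A small sketch of how such an expression could be assembled from a pmap address follows; `inject_expr` and `tampered_calls` are hypothetical helper names, and the address used is the anon mapping from the sample pmap output, not necessarily the one used in the run above.

```python
from typing import List


def inject_expr(addr_hex: str, first: int = 1, step: int = 10) -> str:
    """Build the argument for `strace -e` that replaces the return value
    of selected mmap calls with the given address."""
    addr = int(addr_hex, 16)  # pmap prints hex; retval= is given in decimal here
    return f"inject=mmap:retval={addr}:when={first}+{step}"


def tampered_calls(first: int, step: int, limit: int) -> List[int]:
    """Invocation numbers affected by when=first+step, up to `limit`:
    first, first+step, first+2*step, ..."""
    return list(range(first, limit + 1, step))


expr = inject_expr("00007f04894fa000")
print(f"strace -f -e {expr} -p $pid")
print(tampered_calls(1, 10, 35))
```

With `when=1+10`, the first mmap call and every tenth call after it are tampered with, which matches the 1 + 10*k description.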
I did manage to build a Docker container with the ASan-instrumented binary. However, I took several shortcuts:
- added -static-libasan (to LDFLAGS)
  - libasan is not statically linked
- used ubi9/ubi-minimal (instead of ubi8/ubi-minimal)
  - Recall, I wasn't cross-compiling because -asan is currently not compatible with our cross-ng config.
- added ENV ASAN_OPTIONS=report_globals=0 to Dockerfile
Is your feature request related to a problem? Please describe.
Memory and address sanitization is useful for identifying subtle issues that manifest only at runtime. The Go toolchain supports both via the -msan and -asan build flags.
Describe the solution you'd like
Consider adding support for building Cockroach binaries with these features enabled. Supporting this will require some changes to the build system.
Given there is a performance penalty for running with these runtime features enabled, production binaries should not be built with -msan / -asan. Instead, we could consider offering a debug binary and container image. Short of this, we already have the ability to produce one-off builds via TeamCity. The existing pipelines should have some way of enabling ASAN / MSAN in the resultant artifacts.
Additional context
More context can be found in this issue (internal) that possibly could have benefited from having address and memory sanitization, in order to narrow down the cause of some memory corruption encountered in the wild.
Jira issue: CRDB-24594