Open def- opened 1 year ago
Since I see mz_ore::stack in the backtrace, I guess we are rolling our own stack?
Something like that! We are using stacker to dynamically grow the stack when working with deeply recursive structures, like query plans. That crate definitely uses unsafe code, but I can't say whether there is actually a use-after-free here or asan just gets confused by the non-standard approach.
PSA for everyone trying to reproduce this: If the ASan stack trace doesn't have symbols, make sure that you have llvm-symbolizer on your path!
One thing I noticed is that the repro disables debug assertions. Compiling in debug mode with disabled debug assertions can be expected to cause problems because of https://github.com/MaterializeInc/materialize/blob/575af42b535819c4ef958eb6fe7254bfba2a3781/src/ore/src/stack.rs#L22-L61
The gist is that in debug mode (or rather: at a low opt-level) stack frames are much larger than in release mode, so we want to increase the size of the red zone within which we decide to grow the stack. In Rust code you cannot directly check which opt-level the build is using, so we instead use debug assertions as a proxy. So when you disable debug assertions, you make MZ use the smaller red zone size, while still having the huge stack frames.
This can lead to stack overflows if a stack frame is larger than the red zone. In the asan output above we have a stack frame of 43 kB, which is larger than the red zone size of 32 kB used without debug assertions, so that checks out. The only thing that's not clear to me is whether a stack overflow causes a use-after-return error under asan. Normally, we would expect a stack overflow to segfault when the program tries to access the guard page. But if the guard page is too small, the program might just "hop over" it without accessing it. Or, even if the guard page is large enough, maybe asan's instrumentation affects the handling of stack overflows somehow?
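To make the red zone argument concrete, here is a minimal sketch of the selection logic, not the actual code in src/ore/src/stack.rs. The 32 kB value and the 43 kB frame come from the discussion above; the debug value of 256 kB and all function names are assumptions made up for illustration.

```rust
/// Red zone used when debug assertions are off (32 kB, as discussed above).
const RELEASE_RED_ZONE: usize = 32 << 10;
/// Hypothetical larger red zone for builds with debug assertions.
/// The real value in mz_ore may differ; this one is only illustrative.
const DEBUG_RED_ZONE: usize = 256 << 10;

/// Picks a red zone size, using debug assertions as a proxy for the opt-level.
const fn red_zone(debug_assertions: bool) -> usize {
    if debug_assertions {
        DEBUG_RED_ZONE
    } else {
        RELEASE_RED_ZONE
    }
}

/// A frame larger than the red zone can run past the end of the stack
/// before the next "should we grow?" check gets a chance to fire.
fn frame_exceeds_red_zone(frame_size: usize, red_zone: usize) -> bool {
    frame_size > red_zone
}

fn main() {
    // The red zone this particular build would select.
    let active = red_zone(cfg!(debug_assertions));
    println!("active red zone: {} bytes", active);

    // The 43 kB frame from the asan output.
    let frame = 43 * 1024;
    println!(
        "43 kB frame exceeds active red zone: {}",
        frame_exceeds_red_zone(frame, active)
    );
}
```

With debug assertions disabled at a low opt-level, the frames stay large while the selection falls through to the smaller constant, which is the mismatch described above.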
Evidence that this actually is a stack overflow in disguise: try running the same command without asan (on main).
$ env RUSTFLAGS="-C debug-assertions=off" bin/sqllogictest -- test/sqllogictest/cast.slt
[...]
thread 'coordinator' has overflowed its stack
fatal runtime error: stack overflow
Thanks for looking into this, @teskje. The reason I disabled debug-assertions was https://github.com/MaterializeInc/materialize/issues/17802. Let me see if we can enable debug-assertions again when that is fixed.
What version of Materialize are you using?
5aabd6e8c4cdf81f945d3920a8aced814710d48c
How did you install Materialize?
Built from source
What is the issue?
Using https://github.com/MaterializeInc/materialize/pull/17670
I'm mostly interested in whether this kind of report is valuable, contains enough information, and is worth pursuing further. I lack experience with how Rust interacts with ASan.
Relevant log output