Open aidanhs opened 3 years ago
You are correct about the default memory layout. That was chosen deliberately in order to give static address very small absolute values in order to minimize code size (the i32.const <CONST>
instruction takes fewer bytes when const is small).
You can change the default by passing --stack-first
to the linker (or -Wl,--stack-first
to the compiler).
Ideally, clang should also have an option to emit code which checks for overflow when incrementing the stack. I started some work in this direction in https://reviews.llvm.org/D70515, however I've not had time to pursue it further.
I've been banging my head against the wall with a random crash after random amounts of time for the past few days. Turns out my issue is here!
I'm surprised that it's possible to compile code that ends up having overlapping stack variables. Is there nothing that can be done to automatically detect this? It seems more like a Clang/LLVM issue more than anything, if it should know at compile time that stack will overflow into data section.
As @sbc100 mentioned, you can pass -Wl,--stack-first
to clang.
Beyond that, what's needed is stack overflow checking in LLVM's wasm backend. https://reviews.llvm.org/D70515 is a possible starting point, if anyone wants to work on this. In general it's not possible to detect stack overflow at compile time, because stack allocation happens at runtime, but it is possible to have it check for overflow when adjusting the stack pointer.
@sunfishcode Moving the stack to the front would just cause the crash to happen at the beginning of data rather than the end, wouldn't it?
It means that overflowing the stack isn't likely to clobber global variables without trapping, because most likely it'll wrap around to 4GiB which most often won't be allocated, so accessing the memory there will likely trap.
Another option here is to use Binaryen's StackCheck pass which Emscripten's stack checks are based on. It instruments all assignments to the stack pointer to check for an overflow. Integration with wasi-libc should be simple, basically the pass adds a function __set_stack_limits
which the runtime needs to call with two parameters, the top and bottom of the stack.
We don't need new ad-hoc agreements between runtimes and applications; we just need to improve our tools.
@sunfishcode Oh I see what you mean, gotcha. I'm surprised that stack-first isn't the default, to be honest.
A followup question (or perhaps feedback): how could I have discovered the stack-first
, stack-size
and initial-memory
flags outside of searching issues? https://webassembly.org/ has a number of links to reference-level stuff, as well as some basic tutorials (though they use emscripten). https://wasi.dev/ (and https://github.com/bytecodealliance/wasmtime/blob/main/docs/WASI-documents.md) also have a number of links to reference level documentation. But I'm not sure where my entry point for 'in the trenches' documentation in the WASM world is when I'm trying to build stuff (particularly with wasi-sdk).
As a contrast, coming from emscripten, I can go to https://emscripten.org/ and know that the vast majority of things I'm interested in are either covered there, or linked directly from there - it's very oriented around "things I might want to do" and I've personally found it excellent for that (particular nod at https://emscripten.org/docs/getting_started/FAQ.html and https://emscripten.org/docs/porting/index.html).
A followup question (or perhaps feedback): how could I have discovered the
stack-first
,stack-size
andinitial-memory
flags outside of searching issues? https://webassembly.org/ has a number of links to reference-level stuff, as well as some basic tutorials (though they use emscripten). https://wasi.dev/ (and https://github.com/bytecodealliance/wasmtime/blob/main/docs/WASI-documents.md) also have a number of links to reference level documentation. But I'm not sure where my entry point for 'in the trenches' documentation in the WASM world is when I'm trying to build stuff (particularly with wasi-sdk).
If you google "wasm linker flags" it should take you directly to the wasm-ld linker docs: https://lld.llvm.org/WebAssembly.html Perhaps wask-sdk could link to this page (along with the clang flags) somewhere, but we certainly wouldn't want to duplicate it.
Bear in mind that this is toolchain specific stuff. More specifically its clang/llvm specific toolchain stuff. As such, its not related to core WASI or core WebAssembly.
As a contrast, coming from emscripten, I can go to https://emscripten.org/ and know that the vast majority of things I'm interested in are either covered there, or linked directly from there - it's very oriented around "things I might want to do" and I've personally found it excellent for that (particular nod at https://emscripten.org/docs/getting_started/FAQ.html and https://emscripten.org/docs/porting/index.html).
I agree that it would be good to document more things. I've now submitted https://github.com/bytecodealliance/wasmtime/pull/2703 to document -Wl,--stack-first
.
(this probably doesn't belong here, but I couldn't decide which repo to raise it on)
Steps to reproduce:
./script.sh
You will see output like:
This is identifying that there is an overlap the allocations of
_pythonpath
(calc 3
line) andbuf
(calc 7
line) in https://github.com/aidanhs/wasm-stack-clobber/blob/master/interp-py.c#L79-L123. Both of these are mallocs.For more information, now do
cd wasi-libc && git apply ../wasi-libc.patch
and rebuild wasi-libc. Additionally, editscript.sh
to uncomment-DCUSTOM_LIBC
. Now run./script.sh
again:This printout is executing
dlmalloc_inspect_all
at key points of execution - you can see that dlmalloc reports all the allocations vanish afterwcsncpy
is executed. You can see in the printout that the dlmalloc global structure_gm_
is located within the bounds of the stack-allocatedzip_path
, so ends up being clobbered as part ofwcsncpy
.Though I struggled to find documentation on how stack/heap/data are co-located in a memory space, I found a diagram on https://surma.dev/things/c-to-webassembly/ "The layout that wasm-ld uses is the following:" which seems to indicate the stack can easily overflow into the data section. This feels like it could be the cause of my issue (assuming that there isn't some undefined behaviour in my testcase I haven't spotted that explains it all - entirely possible!).
I can fix this by adding
-Wl,-z,stack-size=$((8*1024*1024)) -Wl,--initial-memory=$((32*1024*1024))
to my final link, but I'm a bit frustrated that I couldn't find any way to figure out what was going on short of ripping apart the Python runtime to try and get a reduced test case. Two questions: