Open mitsuhiko opened 3 years ago
As you mentioned, the standard display format uses module offsets rather than code offsets. I'm probably missing something obvious, but why do Sentry and other tools use code offsets instead?
My experience (mostly with LLVM-ecosystem tools, Binaryen, Wabt, and DWARF) has also been that code section offsets are more generally useful than module offsets; one reason is that most of those use cases work (or at least start) with object files, and object files always use section offsets which are relocatable anyway (this is also true even for ELF/MachO, where relocatable section offsets are used instead of code addresses).
But also IIRC on wasm, DWARF uses section offsets even for linked binaries (whereas ELF uses virtual addresses). My guess is that this is because this makes linking much simpler (since the linker doesn't need to worry about how large the other known sections are when it does the code section layout and relocation); perhaps @sbc100 remembers? This independence from other sections also could make post-link processing by other tools easier. (ELF doesn't have this problem because the virtual address space is independent of the binary).
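To make the two kinds of offset concrete, here is a minimal sketch of converting a module offset (as reported in a browser stack trace) into a code section offset (as used by DWARF). It walks the section headers of a wasm binary and subtracts the start of the code section. The function names are made up for illustration, and it assumes the DWARF offsets are measured from the first byte of the code section's contents (after the section id and size fields); adjust the base if your toolchain measures from somewhere else.

```python
from typing import Iterator, Tuple

WASM_MAGIC = b"\0asm"
CODE_SECTION_ID = 10  # section id of the code section in the wasm binary format


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def iter_sections(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (section_id, payload_start, payload_size) for each section."""
    if data[:4] != WASM_MAGIC:
        raise ValueError("not a wasm binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(data):
        section_id = data[pos]
        size, payload_start = read_uleb128(data, pos + 1)
        yield section_id, payload_start, size
        pos = payload_start + size


def code_section_offset(data: bytes) -> int:
    """Return the module offset at which the code section contents begin."""
    for section_id, payload_start, _size in iter_sections(data):
        if section_id == CODE_SECTION_ID:
            return payload_start
    raise ValueError("module has no code section")


def module_to_code_offset(data: bytes, module_offset: int) -> int:
    """Convert a module offset into an offset relative to the code section.

    Assumption: the base is the start of the code section contents.
    """
    return module_offset - code_section_offset(data)
```

Note that this only works if you have the original binary (or a file with the code section at the same position), which is exactly the constraint discussed in this thread.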
I'm not sure I quite understand the problem you mention with external debug info though. Are you referring to "split" debug info (i.e. -gsplit-dwarf, where the debug info is split into N pieces (where N is the number of object files) and not linked at all)? Or do you just mean emscripten's -gseparate-dwarf flag, where the debug sections are stripped out and replaced with an external_debug_info section? I'm guessing the latter. In that case though, IIUC the code section offset in the final wasm binary should be the same as it was before the debug info was stripped out. I guess I assumed that if you are keeping debug info, you'd also want to keep all of the original sections anyway?
Having said that though, I definitely agree that this mismatch is annoying and that we can improve things.
Your suggestion 1 would be pretty easy. Currently (due to a limitation in LLVM's strip/objcopy functionality) the original code section actually remains in the .debug.wasm file rather than being stripped out when using -gseparate-dwarf. I intend to eventually fix that, and in that case it would make sense and be straightforward to replace the code with some metadata including the original code offset. (As an aside, I imagined we'd also strip out all the other known sections, such as exports. It's plausible that a debugger would want that info too; as I said, I had imagined that anyone archiving debug info would also want to archive the rest of the binary too).
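As a rough illustration of what such metadata could look like, the sketch below writes and reads a custom section named original_code_offset. The section name comes from the original issue, but the payload layout (a single unsigned LEB128 integer) and all function names are assumptions made up for this example; nothing here is an agreed format.

```python
from typing import Optional, Tuple


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def build_custom_section(name: str, payload: bytes) -> bytes:
    """Serialize a custom section (id 0): size, name length, name, payload."""
    name_bytes = name.encode("utf-8")
    body = encode_uleb128(len(name_bytes)) + name_bytes + payload
    return b"\x00" + encode_uleb128(len(body)) + body


def find_custom_section(data: bytes, name: str) -> Optional[bytes]:
    """Return the payload of the first custom section with the given name."""
    pos = 8  # skip magic and version
    while pos < len(data):
        section_id = data[pos]
        size, start = read_uleb128(data, pos + 1)
        end = start + size
        if section_id == 0:
            name_len, name_start = read_uleb128(data, start)
            name_end = name_start + name_len
            if data[name_start:name_end].decode("utf-8") == name:
                return data[name_end:end]
        pos = end
    return None


def read_original_code_offset(debug_file: bytes) -> Optional[int]:
    """Read the hypothetical original_code_offset section, if present.

    Assumption: the payload is one ULEB128-encoded module offset.
    """
    payload = find_custom_section(debug_file, "original_code_offset")
    if payload is None:
        return None
    value, _ = read_uleb128(payload, 0)
    return value
```

A symbolication tool could then subtract this value from the module offset in a stack frame without needing the code section to be present in the debug file.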
Suggestion 2 would be harder since it could be a breaking change for tools that parse the stack trace output. But maybe appending to it could work.
Suggestion 3 could also work. Currently the Module object has an accessor for the custom section data in addition to the imports and exports; we could presumably add a way to get the code section offset, or perhaps even some more metadata about the binary or the sections.
And finally: yes, it is a bit unfortunate that ArrayBuffer-based instantiation is basically equivalent to eval and lacks a good way to identify the module. In principle, the wasm binary could have a name section (with just a module name, if size is an issue) whose module name is prepended to the function name (even to the "generic" function name), e.g. modulename.functionname or modulename.wasm-function[1]. I'm not sure if that happens in practice; if not, we should fix it.
But I also agree that a build ID has uses too; it probably makes sense to push more on that proposal as well.
So since browsers report the absolute offset within a WASM file and do not have access to the code section offset, debugging tools that only operate on such offsets (such as Sentry or other crash reporting services) need to calculate the original offset. Right now, when split DWARF information is used, we do this by forcing the separate debug file to retain all original sections, including the Code section at the right offset. (We use the proposed build_id section, #133, to match the files, but the same issue arises when external_debug_info is used.)
There are three potential options here I see:
original_code_offset section with the offset of the original code section and embed that in split debug info files.
Aside: In general I think there are some open questions about how this is supposed to work in practice. We're also running into issues in matching stack traces to the correct wasm files because of the limitations of the stack trace format. If one uses WebAssembly.instantiate with a buffer instead of WebAssembly.instantiateStreaming to load WebAssembly, the stack trace format in browsers is completely inadequate to figure out which WebAssembly module a frame belongs to. As an alternative (if build_ids become an accepted format) it might be preferable to add build IDs, and offsets relative to the build ID, into the stack trace. E.g. http://localhost:8088/lib1.wasm:wasm-function[1]:0x86 @ 483a64fa956ad1c848328c52f15dcc0bce1ca232+0x2 (or something similar).
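For what it's worth, a frame in the format floated above would be easy for tools to consume. The following is only a sketch of a parser for that example string: the "@ build_id+offset" suffix is the strawman from this issue, not something any browser emits, and the class and field names are invented here. What the trailing offset is measured from is not pinned down in the proposal, so the parser just reports it as given.

```python
import re
from dataclasses import dataclass
from typing import Optional

# URL, function index and module offset, optionally followed by the
# proposed " @ <build_id>+0x<offset>" suffix (hypothetical format).
FRAME_RE = re.compile(
    r"^(?P<url>.+):wasm-function\[(?P<index>\d+)\]:0x(?P<module_offset>[0-9a-fA-F]+)"
    r"(?:\s*@\s*(?P<build_id>[0-9a-fA-F]+)\+0x(?P<relative>[0-9a-fA-F]+))?$"
)


@dataclass
class WasmFrame:
    url: str
    function_index: int
    module_offset: int
    build_id: Optional[str] = None          # only set if the suffix is present
    relative_offset: Optional[int] = None   # only set if the suffix is present


def parse_wasm_frame(location: str) -> Optional[WasmFrame]:
    """Parse a wasm frame location string; return None if it does not match."""
    match = FRAME_RE.match(location.strip())
    if match is None:
        return None
    relative = match.group("relative")
    return WasmFrame(
        url=match.group("url"),
        function_index=int(match.group("index")),
        module_offset=int(match.group("module_offset"), 16),
        build_id=match.group("build_id"),
        relative_offset=int(relative, 16) if relative is not None else None,
    )
```

With a build ID in the frame, a crash reporting service could look up the matching debug file directly instead of guessing from the URL, which is not meaningful when the module was instantiated from a buffer.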