Attach debug info from source language

oovm commented 1 month ago

Feature

Currently, wasmtime's errors for wat text files are reported as offsets into the compiled binary.

When an error occurs, it is very confusing to work out where in the text it happened.

I would like to attach some source language location information to point out the error location.

Similar to:

;; @url: file://path/position.wasm:line:column
(type $test-type (externref))
;; @url: file://path/position.wasm:offset
(func $test-function
    ;; @file: main.rs:12:24
    (block $test-block

    )
)

binaryen supports similar features at: https://github.com/WebAssembly/binaryen/blob/921644ca65afbafb84fb82d58dacc4a028e2d720/test/lit/debug/full.wat#L49-L64

Benefit

Clearer error location

Implementation

Unknown. Perhaps we can parse custom instructions and mix them into the final wasm binary as debug info.

Alternatives

Wait for source language debugging.

But I'm not quite sure how this works, or how to integrate it with my pipeline.

Many times I write wat files manually, without any toolchain.

alexcrichton commented 1 month ago

Thanks for the report! The reason for this is the design of how Wasmtime processes text files today. Wasmtime itself doesn't have much special processing but instead it translates the text to binary very early on in the process and then pretends from then on that a binary was given. That leads to binary offsets for errors even when the input is text, which I agree isn't a great experience.

I'm curious, though: there are really two features here, and I'd like to know which you'd find the most useful:

1. One feature is the ability to report errors based on the input *.wat syntax, not the binary format. That wouldn't require any comments or annotations or anything like that, but the error message for a validation error, for example, should point to the erroneous instruction.
2. A separate feature is the annotations you're listing here. Those could be split into (a) when printing a wasm binary the annotations are emitted and (b) when parsing a wasm binary the annotations are used for error messages rather than the original position in the *.wat.

Both of these are likely to be somewhat tricky to implement at this time given that there's not a standard currently (beyond "just include all of dwarf") for wasm per se. I think it'd be reasonable to have this though!

oovm commented 1 month ago

For solution 1, I first considered when it is necessary to write wat by hand:

- Learning how to use wat
- Communicating about or trying out newly proposed features
- Writing test cases
- Debugging optimizer passes or compiler artifacts

Generally speaking, this code is small, and it is not difficult to infer the offending instruction from the error message.

- Decompiling a highly optimized release wasm and trying to find where the problem is in the wat

Marking the offending instruction would be a great help in this case, but I doubt whether the exercise itself is meaningful.

Therefore, from the perspective of usage scenarios, the benefit of maintaining a separate set of error reporting for the wat format is not obvious.

From the implementation point of view, the parser needs to track the line and column of each instruction and pass them through each layer to the runtime, and the changes to the backend are also relatively large.
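To make the "track the line and column" part concrete, here is a minimal sketch of turning a byte offset in the wat text into a 1-based line and column. The function name `line_col` is hypothetical and is not part of Wasmtime or wasm-tools; it only illustrates the bookkeeping a text-based error report would need.

```rust
/// Convert a byte offset into `source` to a 1-based (line, column) pair.
/// Returns `None` if the offset is out of range or splits a UTF-8 character.
/// Hypothetical helper for illustration only.
fn line_col(source: &str, byte_offset: usize) -> Option<(usize, usize)> {
    // `is_char_boundary` is false for offsets past the end of the string.
    if !source.is_char_boundary(byte_offset) {
        return None;
    }
    let before = &source[..byte_offset];
    // Lines are counted by the newlines that precede the offset.
    let line = before.matches('\n').count() + 1;
    // The column is counted in characters since the last newline.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    Some((line, column))
}
```

The parser would still have to remember a text offset per instruction and carry it alongside the binary offset, which is where most of the cost described above comes from.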

For solution 2, the biggest problem is that there is no standardization. Other tools may also use special tags independently, just like binaryen does now.

Another hidden problem is whether the Wasm CG is willing to standardize. Maybe people have no motivation to do so: https://github.com/WebAssembly/debugging/issues/19#issuecomment-1984894109

From the implementation point of view, binary debugging is an established target. The change is that the front end needs to parse some additional custom information, while the runtime and back end do not need to be changed.

If things change or a corresponding proposal appears, the cost of adapting is relatively small.

In summary, I prefer option 2 when choosing between the two, unless there is a better solution.

oovm commented 1 month ago

Another open question: what does the wat produced by decompiling a wasm with DWARF information look like?

Will the debug info simply be erased?

If not, is it appropriate to use the annotations proposal?

If so, it would be very suitable as a bridge for two-way conversion.

alexcrichton commented 1 month ago

Currently with wasm-tools print if you've got a wasm with dwarf it'll print (@custom (after data) ".debug.XXXX" "...") which is basically just there to show you it exists, it doesn't actually work. That debuginfo is almost certainly broken once the text goes back to binary (since the dwarf handles binary offsets and binary -> text -> binary likely changes binary offsets, e.g. reencoding lebs into their minimal size)
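As an illustration of why re-encoding LEBs shifts offsets, the sketch below encodes the same value in minimal form and in a 5-byte padded form. The padded form is an assumption about what a producer might have emitted; the function names are made up for this example and are not wasm-tools APIs.

```rust
/// Encode `value` as unsigned LEB128 using the fewest bytes possible.
fn encode_uleb128_minimal(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: continuation bit clear
            return out;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Encode `value` as unsigned LEB128 padded to exactly 5 bytes.
/// This is a valid but non-minimal encoding of the same number.
fn encode_uleb128_padded(value: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(5);
    for i in 0..5 {
        let byte = ((value >> (7 * i)) & 0x7f) as u8;
        out.push(if i < 4 { byte | 0x80 } else { byte });
    }
    out
}

/// Decode an unsigned LEB128 value that fits in 32 bits (at most 5 bytes).
fn decode_uleb128(bytes: &[u8]) -> u32 {
    let mut result = 0u32;
    for (i, b) in bytes.iter().take(5).enumerate() {
        result |= ((b & 0x7f) as u32) << (7 * i);
        if b & 0x80 == 0 {
            break;
        }
    }
    result
}
```

Both encodings decode to the same number, but the padded one is four bytes longer for a value like 3. If the text round trip rewrites it in minimal form, every later instruction moves by four bytes and any DWARF address that pointed past it is now wrong.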

Otherwise though the annotations proposal is likely the way to go here, but it's still a question of formats. Wasmtime supports DWARF as a means of converting binary offsets to filenames/line numbers but DWARF has no definition in the wasm text format with annotations, and supporting the entirety of dwarf with annotations would be a pretty ambitious project. On the web there are source maps, which are not DWARF and which Wasmtime doesn't support. I'm not personally familiar with source maps, but the challenge again would be representing all of source maps in the text format, which may or may not be feasible.

There's the other alternative of creating a third means of mapping filenames/line numbers to binary offsets. That could be customized just for the wasm text format and have a clear definition with the annotations proposal. The downside of this though is that it's a third alternative and support would need to be added to all consuming/producing tools, which is likely not trivial.

I would hazard a guess that source maps might be the way to go here. That would require investigating if it's reasonable to represent source maps as annotations (which I'm not sure if it is) in the text format and then adding support to Wasmtime/wasm-tools/etc. I'm also not sure who actually produces source maps originally (e.g. tools that do this, I just know that they exist)

fitzgen commented 1 month ago

If you're going to add support for generating debug info to wasm-tools, use DWARF instead of source maps. You don't have to emit all the info that DWARF supports emitting (e.g. full source variable recovery) and can emit just the bits that are equivalent to what you'd put in a source map (the .debug_line section). But DWARF generally has a much higher quality of toolchain support for reading/writing it and is much more compact (binary encoding rather than base64 strings).
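For intuition about what the line-number subset amounts to, here is a minimal sketch of an offset-to-location table and its lookup. This is a deliberate simplification: the real .debug_line section is a compact state-machine program, not a flat list, and the `LineRow` and `LineTable` types here are hypothetical.

```rust
/// One row of a simplified line table: the code at `offset` (and after it,
/// up to the next row) comes from `file:line:column`.
#[derive(Debug, Clone, PartialEq)]
struct LineRow {
    offset: u32,
    file: String,
    line: u32,
    column: u32,
}

/// Rows kept sorted by binary offset so lookups can use binary search.
struct LineTable {
    rows: Vec<LineRow>,
}

impl LineTable {
    fn new(mut rows: Vec<LineRow>) -> Self {
        rows.sort_by_key(|r| r.offset);
        Self { rows }
    }

    /// Find the row covering `offset`: the last row whose offset is <= `offset`.
    fn lookup(&self, offset: u32) -> Option<&LineRow> {
        // Number of rows that start at or before `offset`.
        let idx = self.rows.partition_point(|r| r.offset <= offset);
        idx.checked_sub(1).map(|i| &self.rows[i])
    }
}
```

A trap at some binary offset would be reported using the row returned by `lookup`, which is the same information a source map would carry.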

alexcrichton commented 1 month ago

Ah yes that's a good point, and sorry yeah I shouldn't say one approach should definitely be used over the other. Adding limited support for annotations<->DWARF to tooling I think could also be reasonable.

oovm commented 1 month ago

If I want to add support for .debug_line in wat, where do I start?

I tried tracing from this call:

https://github.com/bytecodealliance/wasmtime/blob/f40aaa51a1df835498985acfc91ade33240d02b5/tests/all/traps.rs#L817-L828

I found

https://github.com/bytecodealliance/wasmtime/blob/f40aaa51a1df835498985acfc91ade33240d02b5/crates/environ/src/compile/module_environ.rs#L792-L825

But I don't know how to make the frontend fill in dwarf.debug_line based on the wat @custom section

alexcrichton commented 1 month ago

For my answer below I'm assuming that the scope of the implementation you're interested in is to add the ability to have (@foo ...) annotations in the text format, probably interspersed with actual instructions, which ends up getting the .debug_* sections filled out.

To do that you'll want to take a look at https://github.com/bytecodealliance/wasm-tools, specifically the wast crate located at crates/wast. I'd recommend reading over https://github.com/bytecodealliance/wasm-tools/pull/1394 as well, which implements branch hints, the closest (ish) feature to what you're looking to implement. The various steps involved will be:

1. Invent a desired syntax, for example (@debug_line "foo.rs" 10 3) or something like that.
2. Add parsing support to wast to parse these annotations. It'll probably be around expr.rs and you'll store metadata around here.
3. Update encoding of expressions to return not only branch hint metadata but additionally debug info metadata (this is the location where actual binary offsets are created since it happens during emission).
4. After functions are encoded you'll check to see if there's any debuginfo present and if so emit the requisite .debug_* custom sections. For this you'll probably want to use the gimli crate to create dwarf sections.
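The first three steps can be sketched without any of the real crates. Everything below is hypothetical: `parse_debug_line` is a toy string parser rather than the wast parser, `Encoder` is a stand-in for the expression encoder, and the annotation syntax is just the example suggested above. String escapes are not handled.

```rust
/// Parsed form of the example annotation `(@debug_line "foo.rs" 10 3)`.
#[derive(Debug, Clone, PartialEq)]
struct DebugLine {
    file: String,
    line: u32,
    column: u32,
}

/// Toy parser for the hypothetical annotation; returns `None` on any mismatch.
fn parse_debug_line(text: &str) -> Option<DebugLine> {
    let inner = text
        .trim()
        .strip_prefix("(@debug_line")?
        .strip_suffix(')')?
        .trim_start();
    // The file name is a quoted string (no escape handling in this sketch).
    let rest = inner.strip_prefix('"')?;
    let (file, rest) = rest.split_once('"')?;
    // Line and column follow as whitespace-separated integers.
    let mut nums = rest.split_whitespace();
    let line = nums.next()?.parse().ok()?;
    let column = nums.next()?.parse().ok()?;
    if nums.next().is_some() {
        return None; // trailing tokens are rejected
    }
    Some(DebugLine { file: file.to_string(), line, column })
}

/// Stand-in for an expression encoder: it appends instruction bytes and
/// records the offset (relative to the start of this expression) at which
/// each annotation takes effect.
#[derive(Default)]
struct Encoder {
    bytes: Vec<u8>,
    lines: Vec<(usize, DebugLine)>,
}

impl Encoder {
    /// Attach `info` to whatever instruction is emitted next.
    fn annotate(&mut self, info: DebugLine) {
        self.lines.push((self.bytes.len(), info));
    }

    /// Append the already-encoded bytes of one instruction.
    fn emit(&mut self, instruction: &[u8]) {
        self.bytes.extend_from_slice(instruction);
    }
}
```

After encoding, `lines` holds the (offset, location) pairs that step 4 would feed into a DWARF writer such as gimli. The offsets here are relative to the expression, so they would still need to be rebased to wherever the function body ends up in the code section.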

I don't know enough about DWARF myself to be able to suggest much about what the annotations should look like or what dwarf sections are going to be emitted. You probably won't have to emit every section in a gimli::Dwarf though, as many of those aren't related to just filename/line number info, I think.

fitzgen commented 1 month ago

FWIW, there is a simple DWARF-writing example here: https://github.com/gimli-rs/gimli/blob/master/crates/examples/src/bin/simple_write.rs#L106
