danleh / wasabi

A dynamic analysis framework for WebAssembly programs.
http://wasabi.software-lab.org
MIT License

WASI module instrumentation #23

Open vshymanskyy opened 4 years ago

vshymanskyy commented 4 years ago

It would be really great if this tool could help with debugging WASI. I see 2 challenges here:

  1. WASI runtimes typically do not have any JS engine at all. This makes the generated js file useless.
  2. WASI aims to define a stable system interface, so adding additional imported functions is impossible.

I think that it still could be done. By using the WASI API wasabi could output the trace to a log file. Then the generated .js file could be transformed into a log file parser/decoder.

vshymanskyy commented 4 years ago

This could be simplified by an implementation of #14 (if that is taken into consideration).

danleh commented 4 years ago

Interesting idea! So just to clarify the overall goal: You would like Wasabi to work with host environments that do not use JavaScript (browsers or Node.js), but instead use WASI, an interface that essentially defines a set of imported functions (like "syscalls") for file operations etc.

Note that a dynamic analysis in Wasabi is currently written in JavaScript, so the question is what it should be written in instead. You linked to #14, because that issue proposes to write the analysis in any language that compiles to WebAssembly and then "merge" it into the analyzed program. (I deliberately do not say "linking", because the memory of the original program should be completely independent from the memory of the analysis. As far as I understand, linking, e.g., with LLD, would not work right now, because it allows only a single linear memory.)

Regarding the two challenges you posted:

  1. Yes, if we change the assumption from "host environment is JS" to "host environment is WASI", JavaScript is out of the question. Then, the analysis must be compiled to WebAssembly and use only WASI functions. Open problems/steps (see also issue #14):

    • [ ] How do we handle multiple memories (one in the program, one in the analysis)? How does, e.g., lld handle linking two modules, which both define a memory?
    • [ ] I would still want to support analyzing Wasm programs in the browser/Node.js. One option (to reduce the maintenance overhead of supporting two "Wasabi backends") is to add a WASI browser polyfill and use WASI exclusively (all analyses would then have to be written that way).
  2. True, but solvable: analysis and program have to be statically "merged". Then, what is currently a hook import becomes just a function call internal to the merged module. Any WASI function that is used by the analysis, but not by the program can be added. Other than WASI functions, the analysis will not import anything else.

danleh commented 4 years ago

With the technicalities sketched, let's zoom out a little.

  1. Comment: this will be a larger implementation effort (compiling analyses e.g. from Rust to Wasm+WASI, testing that, working out how to merge modules, adding a browser WASI polyfill). Unfortunately, I cannot work on that in the next month.

  2. Question to @vshymanskyy: How did you plan to use Wasabi for debugging WASI (as initially stated)? Do you want to get just a trace of all instructions and their inputs?

  3. Quick hack: If you only want to trace instructions in a WASI module, you could potentially hack together a prototype in https://github.com/danleh/wasabi/blob/master/src/instrument/add_hooks/mod.rs and manually instrument each instruction to call __wasi_fd_write() with the appropriate arguments. But honestly, I think it is going to be a lot of code (>1kLOC) and not very reusable.
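To illustrate what "the appropriate arguments" involve: at the ABI level, WASI's `fd_write` takes four `i32` values (file descriptor, pointer to an array of `ciovec` entries, number of entries, pointer to a result slot), and each `ciovec` on wasm32 is a little-endian pair of `u32` values (buffer pointer, buffer length). The sketch below models linear memory as a Python `bytearray` to show the layout that injected code would have to produce. The helper name, the scratch-region layout, and the trace message are illustrative assumptions, not anything Wasabi provides.

```python
import struct


def build_fd_write_args(memory: bytearray, scratch: int, message: bytes, fd: int = 1):
    """Lay out one ciovec plus the message in `memory`.

    Returns the four i32 arguments for fd_write:
    (fd, iovs_ptr, iovs_len, nwritten_ptr).
    The scratch layout below is an assumption made for this sketch.
    """
    iovs_ptr = scratch           # ciovec { buf: u32, buf_len: u32 }, 8 bytes
    nwritten_ptr = scratch + 8   # u32 slot the runtime fills with bytes written
    buf_ptr = scratch + 12       # the message bytes themselves
    end = buf_ptr + len(message)
    if end > len(memory):
        raise ValueError("scratch region does not fit in memory")

    memory[buf_ptr:end] = message
    struct.pack_into("<II", memory, iovs_ptr, buf_ptr, len(message))
    struct.pack_into("<I", memory, nwritten_ptr, 0)
    return (fd, iovs_ptr, 1, nwritten_ptr)


if __name__ == "__main__":
    mem = bytearray(64)
    print(build_fd_write_args(mem, 16, b"i32.add 1 2\n"))
```

An instrumenter would have to emit the equivalent stores and the call for every traced instruction, and reserve scratch space that does not collide with the program's own data, which is part of why this approach grows quickly.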

vshymanskyy commented 4 years ago

@danleh Yes, I think you got the idea correctly. In my particular case, I have a WASI app that runs fine on some other engines, but fails (at some point) in Wasm3. I could instrument it with Wasabi, compare traces, etc. It's true that there are still many open questions here, and we can discuss the pros and cons of several ideas. This is really interesting.

FYI, WASI browser polyfills are already available, see https://github.com/wasmerio/wasmer-js/tree/master/packages/wasi and https://webassembly.sh/

My initial idea was to dump trace outputs in some format to a file (using the existing WASI API), and run the analyzer offline. This would probably eliminate the "multiple memories handling" problem.
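A minimal sketch of what such an offline analyzer could look like, assuming a hypothetical fixed-size binary record per executed instruction (function index, instruction index, opcode, one value). This record format is made up for illustration; neither Wasabi nor WASI defines it. The sketch decodes two trace files and reports the first record at which they differ, which is the comparison needed to find where one engine departs from another.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# Hypothetical record layout (little-endian, no padding):
# function index (u32), instruction index (u32), opcode (u8), value (i64)
RECORD = struct.Struct("<IIBq")
Record = Tuple[int, int, int, int]


def encode_trace(records: List[Record]) -> bytes:
    """Serialize records the way an instrumented module might write them."""
    return b"".join(RECORD.pack(*r) for r in records)


def decode_trace(data: bytes) -> Iterator[Record]:
    """Yield records from a trace; reject a trailing partial record."""
    if len(data) % RECORD.size != 0:
        raise ValueError("trace length is not a multiple of the record size")
    for offset in range(0, len(data), RECORD.size):
        yield RECORD.unpack_from(data, offset)


def first_divergence(a: bytes, b: bytes) -> Optional[int]:
    """Index of the first record where two traces differ, or None if equal."""
    ra, rb = list(decode_trace(a)), list(decode_trace(b))
    for i, (x, y) in enumerate(zip(ra, rb)):
        if x != y:
            return i
    if len(ra) != len(rb):
        # One trace is a strict prefix of the other.
        return min(len(ra), len(rb))
    return None


if __name__ == "__main__":
    # 0x41 is i32.const and 0x6a is i32.add in the Wasm binary format.
    reference = encode_trace([(0, 0, 0x41, 1), (0, 1, 0x41, 2), (0, 2, 0x6A, 3)])
    suspect = encode_trace([(0, 0, 0x41, 1), (0, 1, 0x41, 2), (0, 2, 0x6A, 4)])
    print(first_divergence(reference, suspect))
```

Because the analyzer only reads the file afterwards, it never shares an address space with the traced program, so it can be written in any language.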

vshymanskyy commented 4 years ago

Also, even if wasabi requires a small set of stable functions (i.e. with known signatures), I think we could implement that in Wasm3 and Wasmer, for example.

vshymanskyy commented 4 years ago

wasm-opt also has some instrumentation capabilities. Please see https://github.com/wasmerio/wasmer/issues/1210

vshymanskyy commented 4 years ago

@danleh Here's a very simple example of how WASI modules can be instrumented and run in Node.js: https://github.com/wasm3/wasm-trace. It should be quite easy to reuse this script for Wasabi, as it still relies on JS for handling/analyzing the traces.