ejrgilbert / whamm

Debugging WebAssembly? Put some Whamm! on it.
https://ejrgilbert.github.io/whamm/

Rework compiler structure #3

Closed ejrgilbert closed 5 months ago

ejrgilbert commented 6 months ago

Stages:

ejrgilbert commented 6 months ago

AST: Dtrace contains DScripts, each of which has fns, globals, and a HashMap that maps used provider names to used module names to used function names to used probe names to the probe definition. These HashMap keys CANNOT be regex globs; instead, we do a full expansion when building the AST so that it includes ALL probes matched by the globs. This will simplify traversing the AST for code generation. It also means I need something that stores the structure of the available probes (providers -> modules -> functions -> types) to match the globs against during expansion.
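A minimal sketch of that glob expansion, assuming globs only use `*` and that probe definitions are stored as plain strings. The names `ProbeMap`, `glob_match`, and `expand_into` are hypothetical, as are the provider/module names used to exercise it; none of them are taken from the whamm codebase.

```rust
use std::collections::HashMap;

/// provider -> module -> function -> probe type -> probe definition.
/// (Hypothetical alias; the definition is a String only for illustration.)
type ProbeMap = HashMap<String, HashMap<String, HashMap<String, HashMap<String, String>>>>;

/// Returns true if `text` matches `pattern`, where `*` matches any run of bytes.
fn glob_match(pattern: &[u8], text: &[u8]) -> bool {
    match pattern.split_first() {
        None => text.is_empty(),
        // Try every possible length for the run consumed by `*`.
        Some((&b'*', rest)) => (0..=text.len()).any(|i| glob_match(rest, &text[i..])),
        Some((&c, rest)) => text.first() == Some(&c) && glob_match(rest, &text[1..]),
    }
}

/// Inserts one copy of `body` for every known probe matched by `spec`,
/// so the resulting keys are always concrete names, never globs.
fn expand_into(map: &mut ProbeMap, spec: [&str; 4], known: &[[&str; 4]], body: &str) {
    for probe in known {
        let hit = spec
            .iter()
            .zip(probe.iter())
            .all(|(pat, name)| glob_match(pat.as_bytes(), name.as_bytes()));
        if hit {
            map.entry(probe[0].to_string())
                .or_default()
                .entry(probe[1].to_string())
                .or_default()
                .entry(probe[2].to_string())
                .or_default()
                .insert(probe[3].to_string(), body.to_string());
        }
    }
}
```

With a known-probe list containing `call`/`before`, `call`/`alt`, and `br`/`before`, a spec of `c*` for the function and `*` for the probe type would produce two entries under `call` and nothing under `br`.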

The symbol table is initialized with all of the provider, module, function, and probe type scopes. They are organized by HashMap as well, with no globbing in the names. When building the table, add a scope for the DScript name, which again contains a HashMap. That HashMap maps from probe ID to probe scope, which will make entering/exiting scopes easier when traversing the AST. I might need to do something here to work with the globbing. Maybe probe IDs point to a Vec of probes? Then each probe has the correct probe type ID that points to the ALT, BEFORE, etc. scope for looking up symbols. This will need to be generated by matching the globs against the provider/etc. HashMap keys. If there's a match, add a new copy of the probe.

For lookup, the probe will first search in its DScript parent scope, then in its probe type parent scope, which will search up the tree. The verify pass will make sure that the vars/fns used in each probe (including globs) are actually available. If the user uses a glob but then refers to a specific provided field/fn in the predicate/body, some of the matched probes may not define it. That's what the verifier should check: if the symbol is not defined, error!
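The lookup order and the verifier check could be sketched roughly as follows. This assumes scopes live in a flat Vec and refer to their parent by index; `Scope`, `SymbolTable`, `lookup_for_probe`, and `unresolved` are hypothetical names chosen for illustration, not the actual implementation.

```rust
use std::collections::HashMap;

/// One scope in the table (hypothetical layout).
struct Scope {
    /// Symbol name -> a description of its binding.
    symbols: HashMap<String, String>,
    /// Index of the enclosing scope, if any.
    parent: Option<usize>,
}

struct SymbolTable {
    scopes: Vec<Scope>,
}

impl SymbolTable {
    /// Walks from scope `id` up through its parents until `name` is found.
    fn lookup_from(&self, mut id: Option<usize>, name: &str) -> Option<&str> {
        while let Some(i) = id {
            let scope = &self.scopes[i];
            if let Some(binding) = scope.symbols.get(name) {
                return Some(binding.as_str());
            }
            id = scope.parent;
        }
        None
    }

    /// Searches the DScript scope chain first, then the probe type chain
    /// (probe type -> function -> module -> provider).
    fn lookup_for_probe(&self, dscript: usize, probe_type: usize, name: &str) -> Option<&str> {
        self.lookup_from(Some(dscript), name)
            .or_else(|| self.lookup_from(Some(probe_type), name))
    }

    /// Verifier check: returns every used name that does not resolve.
    fn unresolved<'a>(&self, dscript: usize, probe_type: usize, used: &[&'a str]) -> Vec<&'a str> {
        used.iter()
            .copied()
            .filter(|name| self.lookup_for_probe(dscript, probe_type, name).is_none())
            .collect()
    }
}
```

For a glob that expanded to several probes, the verifier would run `unresolved` once per expanded copy and report an error if any copy returns a non-empty list.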

NOTE: this setup requires a list of Wasm instruction strings to be matched by glob. I think this is fine: it also makes it easy to add symbols per bytecode, as is already done for "call".

ejrgilbert commented 6 months ago

Then, during generation, I will have instances of Probe generators and DScript generators specific to what's defined in the user's DScript. But the provider/module/function/type generators will be singletons. We will only have a "call" function generator for now, which should be fine since there won't be bindings for the others yet anyway (so a symbol table lookup won't get a hit on a binding for them).

During generation, bindings will be found through the symbol table, and then either the DScript-specific generators or the singletons will be called to generate OR define some field.

When we get to a probe to inject, look up the symbols used in the predicate to get their bindings. Call will have some compiler-defined variables; get the values for these vars and do constant propagation to evaluate them out of the expression.
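A rough sketch of that constant propagation, assuming a tiny predicate language with only variables, string and boolean literals, `==`, and `&&`, and assuming predicates have no side effects. `Expr` and `fold` are hypothetical names, not the real AST.

```rust
use std::collections::HashMap;

/// A deliberately tiny predicate AST (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Str(String),
    Bool(bool),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Substitutes the compiler-known values in `known` and folds whatever
/// becomes constant. Anything still unknown is left for runtime.
fn fold(expr: &Expr, known: &HashMap<String, Expr>) -> Expr {
    match expr {
        Expr::Var(name) => known.get(name).cloned().unwrap_or_else(|| expr.clone()),
        Expr::Eq(l, r) => match (fold(l, known), fold(r, known)) {
            (Expr::Str(a), Expr::Str(b)) => Expr::Bool(a == b),
            (Expr::Bool(a), Expr::Bool(b)) => Expr::Bool(a == b),
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        Expr::And(l, r) => match (fold(l, known), fold(r, known)) {
            // One false operand decides the whole conjunction.
            (Expr::Bool(false), _) | (_, Expr::Bool(false)) => Expr::Bool(false),
            // A true operand drops out, leaving only the other side.
            (Expr::Bool(true), other) | (other, Expr::Bool(true)) => other,
            (l, r) => Expr::And(Box::new(l), Box::new(r)),
        },
        other => other.clone(),
    }
}
```

If the folded predicate is `Bool(false)`, the probe does not need to be injected at that site at all; if it is `Bool(true)`, the body can be injected without a runtime check.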

NOTE: I might need to add Call, etc. to the AST (to store/look up values), but I don't need to worry about these initially. They'd be easy to add since they can just go at the Dtrace level with nesting inside them (providers -> modules -> functions -> types). This will not affect the existing structure of DScripts with probes inside them!

ejrgilbert commented 6 months ago
  1. (For now this phase gets combined with Phase 3.) At this point we have enough information to emit code; we're doing another phase here in order to support bytecode rewriting OR Virgil Monitor creation. If we directly generated Wasm bytecodes in the last phase (like would have been done in MiniJava), we'd have to duplicate the above logic between the two targets.
    • Initialize a CodeEmitter
      • Initializes the application to instrument as a Walrus module
    • Generate the provider functions by iterating over the relevant parts of the SymbolTable
      • Call the function in CodeEmitter that generates each of these; this will be hardcoded Walrus code that writes Wasm.
    • Traverse the AST
      • Tell the CodeEmitter to switch to whatever the relevant context is (probe, wasm, call, alt)
        • This will use Walrus to find the bytecodes we want to instrument!
      • Enter the predicates
        • Generate the code that does the comparisons as Walrus Wasm instructions
          • Call will know the logic for the various fields
          • We've already injected the functions from DtraceCore, so we will need to get the fn_ids to insert calls to those functions with the specified args
      • Enter the probe bodies
        • Call will know the logic for the various fields
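The split between AST traversal and emission in this phase could look roughly like the sketch below. The trait shape, the `Probe` struct, and `RecordingEmitter` are all assumptions made for illustration; a real emitter would drive Walrus (or produce a Virgil monitor) instead of recording strings, and no Walrus API is shown here.

```rust
/// Target-independent interface the AST traversal talks to (hypothetical).
trait CodeEmitter {
    /// Switch to the relevant context, e.g. a provider/function/probe type.
    fn enter_context(&mut self, context: &str);
    /// Emit the comparison code for a probe predicate.
    fn emit_predicate(&mut self, predicate: &str);
    /// Emit the probe body.
    fn emit_body(&mut self, body: &str);
}

/// A probe as the traversal sees it (hypothetical, strings for brevity).
struct Probe {
    context: String,
    predicate: Option<String>,
    body: String,
}

/// Stand-in target that only records what it was asked to emit.
#[derive(Default)]
struct RecordingEmitter {
    log: Vec<String>,
}

impl CodeEmitter for RecordingEmitter {
    fn enter_context(&mut self, context: &str) {
        self.log.push(format!("context {context}"));
    }
    fn emit_predicate(&mut self, predicate: &str) {
        self.log.push(format!("predicate {predicate}"));
    }
    fn emit_body(&mut self, body: &str) {
        self.log.push(format!("body {body}"));
    }
}

/// The traversal is written once and shared by every emitter target.
fn emit_probes(probes: &[Probe], emitter: &mut dyn CodeEmitter) {
    for probe in probes {
        emitter.enter_context(&probe.context);
        if let Some(predicate) = &probe.predicate {
            emitter.emit_predicate(predicate);
        }
        emitter.emit_body(&probe.body);
    }
}
```

Keeping the traversal behind a trait is what avoids duplicating the earlier phases' logic between the bytecode-rewriting target and the Virgil Monitor target.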