eager evaluation of all functions (not just wizer. initialize)

benmkw commented 1 year ago

Currently wizer runs wizer.initialize to precompile code.

This leads to bad patterns in existing code though: Say I have a function foo() which is a pure function without arguments which is not fully optimised by e.g. llvm (cc https://github.com/bytecodealliance/wizer/issues/60) and I want to use wizer to optimise it to a constant.

Currently I have to mark foo as wizer.initialize and put the result in a global variable and then have to read from that variable everywhere I would have otherwise called foo (as far as I can see).

Now this is not really bad for one function but gets tedious/ impossible to maintain in the general case.

Could wizer have an option where it basically eagerly tries to evaluate every function/ block of code and if it succeeds then all calls to that block during the later runtime will be optimised away (if, the wasi option is not enabled then all blocks which call the etc. would just be kept and not optimised).

This would largely remove the need for wizer.initialize (except for when you need that specific granularity) and offer more far reaching optimisation opportunities while keeping the code thats being optimised in the natural/ more maintainable form.

Essentially this would lead to wizer behaving more like a general additional llvm pass applied to the code.

benmkw commented 1 year ago

to give a specific example in some rusty-pseudocode:

// (no annotations needed)
fn foo() -> i32 {
  let a = vec![3; 300].sum(); // this gets optimised away by wizer, (llvm might not optimise this because of heap access (it might do it, but just as a simple example))
  let file : String = read("file.txt"); // this does not get optimised away without wasi, but does get run with wasi 

  // this gets optimised away by wizer even without wasi, independently of the previous line of code
  let b = vec![7; 100].sum(); 

  return file.parse::<i32>() + a + b; // this gets at least partially evaluated by wizer, fully constant folded with wasi
}

cfallin commented 1 year ago

Could wizer have an option where it basically eagerly tries to evaluate every function/ block of code and if it succeeds then all calls to that block during the later runtime will be optimised away (if, the wasi option is not enabled then all blocks which call the etc. would just be kept and not optimised).

This is something that sounds quite nice to have, but is not possible without a much more rigorously clear semantics of what it actually means. To be precise: what should the tool do with the function body you show? Should it:

Evaluate every expression tree in the Wasm that can be constant-folded? (This will not be very many -- LLVM will have already done so)
Symbolically execute code including loops and control flow?
- If so, what is the heap state that is assumed, and how are side-effects handled?
- Consider the example of memory allocation. If you have several functions like your foo, each of which allocates temporary memory then frees it, do we symbolically execute the entire malloc and free implementation? Do we do so sequentially for each function body, such that each malloc sees the state left by the last free? Or do we do it assuming the initial heap image? Does this change if the pointer escapes? How can we know, at a static-analysis level, whether this symbolic memory manipulation is safe? Reason about separation-logic or commutativity of effects? (These are deep research topics...) How do we distinguish the above from the case where the allocated memory should become part of the image (as for your file string above)?
- How do we handle partially-known inputs?
- What do you mean by "independently of the previous line of code"? What is the unit of code that Wizer operates on? Keep in mind it operates on Wasm bytecode; it has no concept of source-level (e.g. Rust-level) semantics.

There's a lot more I could say here; and folks have attacked similar problems from many angles (some good search terms are "partial evaluation" and "symbolic evaluation"); but unfortunately this is a deep rabbithole and one won't get very far unless one nails down the semantics of what one expects.

(This is why Wizer is successful, for what it's worth: it has a very clear and crisp semantics of "run a function, then save the memory state afterward". It's simple enough that one actually has some hope of implementing it!)

benmkw commented 1 year ago

This is something that sounds quite nice to have, but is not possible without a much more rigorously clear semantics

Yes this makes sense, I just wanted to get the discussion started so I'll try to be more precise now:

If so, what is the heap state that is assumed, and how are side-effects handled?

I'm not sure I understand the question. The following psudo-rust code should do just that and should work with the current wizer approach. A global variable with null pointers is created and the wizer init function calls malloc and populates the array with pointers into the allocated heap.

My understanding is that this is a good example of what wizer can do and what llvm currently mostly refuses to do because wasm enables a holistic view of the program, including the heap state and thus wizer can execute code which modifies the heap. Wizer then effectively snapshots the heap such that the program state is -as if- it was executed in a wasm runtime. This should also resolve the questions around heap state etc.. It seems though that I might be missing something here?

static MALLOCS : [void*; 10] = [null; 10];

fn wizer_innit() {
    for e in MALLOCS {
        *e = malloc(1);
    }
}

fn main() {
  dbg!(&MALLOCS); // this prints some non-null numbers
}

do we symbolically execute the entire malloc and free implementation

I think in the wasm blob the malloc implementation is already included (like one can include wee-alloc in wasm rust crates) and is not executed symbolically but actually as if it were run in any other wasm runtime, the state change which results by executing the code is also committed by wizer such that the state of the allocator which tracks mallocs/ frees is -as if- it had executed natively up to that point and thus seamlessly resumes during later execution as if nothing special had happened.

How do we handle partially-known inputs?

I think the behaviuour is identical to the wasi flag, the wasm evaluator would stop at places where it no longer can make progress. The only difference/ enhancement would be, that it would try to resume at some kind of block boundary which would need to be defined more precisely.

cfallin commented 1 year ago

So there are a few issues that come up with malloc, which I brought up as it's a useful example that pulls in all one would require for a general implementation of this:

The main issue is input-dependent behavior. This is what requires "partial evaluation" and why I asked about symbolic execution. Consider: what happens when a malloc size is computed from a function parameter?
- You might say: just don't try to evaluate functions with parameters, since the values aren't constant. That's fine for the entry-points you choose to apply this transform to, but then can they not call any functions with parameters? Presumably we recursively process each invoked function to find its constant value as well. Now we can only handle functions that never call other functions with parameters, transitively.
- So we need to symbolically execute (or you could say, "do an abstract interpretation over a domain that includes concrete values and unknown runtime-only values").
- How do we handle control flow in such a domain? What if a loop trip-count is dependent on runtime-only data? What if an if-else diamond branches based on runtime-only data?
The state of globals is also an implicit input to every function execution. What if we have a function that performs a "globally unique ID allocation", and has no arguments: do we constant-evaluate it once, and memoize the supposedly unique ID 42 that it returns? We've updated globals after the execution...
- So we need a notion of "takes globals as implicit argument" and do our symbolic thing accordingly!
- ... but now malloc returns a symbolic pointer, or can never complete because internal control flow would depend on (e.g.) whether there is space left in a block or not
- ... so we bifurcate the world and do a multi-case evaluation but now we have combinatorial explosion...

I could go on; I'm raising these questions mostly as an illustration of the complexities one immediately runs into, and the "easy" answers to them, if they exist, raise many more questions or cases that you would get wrong. The main point I am trying to make is that this is not an easy or simple problem, and an optimistic "just evaluate the thing until you can't!" unfortunately is broken in a ton of ways.

The issue is that in a linear-memory world with mutable globals, we don't have "pure" computations, and this is what we would need to do such constant-value optimizations properly. (The corollary is that something like Haskell can -- and in fact is! -- optimized in just this way.) The ability to see the heap image and modify it in a tool is a false benefit here -- sure, we can do so, but we are forced to make a bunch of assumptions and any given path that we choose will break perfectly-well-defined code in some cases.

This is why I'm pushing toward a "what is the model?" question: we need a well-defined semantics, like staged computation. (Wizer can be seen as a very simple form of staged computation.) There has been talk of baking this into Wasm as a new proposal, at least at the CGs. Doing so would require active use by the Wasm guest though, and could never be fully automatic, for the reasons above.

fitzgen commented 1 year ago

On top of the semantics questions that Chris brings up, implementing something like this would require essentially a new tool, as Wizer doesn't really do any kind of analysis or inspection of the Wasm itself, it just runs and snapshots. It is a very simple tool. Doing the things you propose will most likely require an IR, analyses, etc... none of which Wizer has or intends to grow.

For truly pure/const functions and expressions binaryen's wasm-opt does a very good job of cleaning them up, and I suggest that you run wasm-opt after wizening if you aren't already. It can also clean up and remove code that is dead after wizening.

fitzgen commented 1 year ago

Closing this issue because the feature is not planned for Wizer, but happy to continue discussion and answer questions if you have them.

benmkw commented 1 year ago

Thanks for your detailed response. Based on the raised concerns I'd like to propose a final stripped down alternative which I'd offer for consideration:

Would it be possible to annotate n (pure) functions with the attribute wizer.initialize_n (wizer.initialize_1, wizer.initialize_2,...) and have wizer execute these functions the same as it currently does with the one called wizer.initialize?

Would it further be possible for wizer to replace all calls to these annotated functions with a lookup of the resulting return value of the corresponding function?

That would make it possible to factor out pure functions (with constant or no inputs) out of a program and have wizer constant fold/ evaluate them and replace all calls to those functions with a constant value.

This would maybe give wizer enough information/ manual annotation to do the correct work and also would keep the code maintainable and remove the need for global state/ larger refactorings.

fitzgen commented 1 year ago

Would it be possible to annotate n (pure) functions with the attribute wizer.initialize_n (wizer.initialize_1, wizer.initialize_2,...) and have wizer execute these functions the same as it currently does with the one called wizer.initialize?

You could do this today by running Wizer multiple times, on each function. This sidesteps the issue of Wizer having to choose an order to call the functions in, which it isn't really in a good place to make decisions about.

benmkw commented 1 year ago

by running Wizer multiple times, on each function

that sounds interesting, how would I do that? Until now I only knew one could add the attribute once, compile to code to wasm, run wizer. How would one add the attribute again, to a difference function?

This sidesteps the issue of Wizer having to choose an order to call the functions in, which it isn't really in a good place to make decisions about.

I could imagine order of 1...n would be appropriate/ pure functions should not depend on the order anyway but its certainly good if one can sidestep that issue.

Does adding wizer.initialize to a function have the effect that later calls to that function are effectively a constant lookup? I assumed one would need to write to a global as a side effect of the init function to store the result but it seems that this was not necessary which is already good to know.

bytecodealliance / wizer

eager evaluation of all functions (not just wizer. initialize) #74