WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0
7.5k stars 743 forks source link

Peculiar "IR must be flat on return" when modifying returns #3407

Open dcodeIO opened 3 years ago

dcodeIO commented 3 years ago

I'm currently experimenting with implementing a pass using just the C-API. Most noteworthy limitation I encountered is that it's not yet possible to replace an expression in-place with just the C-API, so I tried to work around it by modifying the actual expression in, well, unusual ways.

Unsurprisingly, I hit a not-sure-if-interesting case when I wanted to prepend some code to execute before a (return) that returns, well, nothing. My first attempt was to modify the return as follows:

(return
 (someCallReturningNone)
)

which looks funny but kinda works, except that it produces

Fatal: IR must be flat: run --flatten beforehand (instructions must only have constant expressions, local.get, or unreachable as children ...)

during optimizations where flat code is required, even though --flatten has been run. Likewise, this produces the same error:

(return
 (block
  (someCallReturningNone)
 )
)

but this works:

(return
 (block
  (someCallReturningNone)
  (return) ;; :-)
 )
)

Now I'm not sure if this might or might not be a problem due to the nature of what I'm doing there, but thought I report just in case. Perhaps doing strange things like the above can be trivially supported in a way that no error is produced.

If you have hints how to implement a proper BinaryenExpressionReplace of sorts, that'd be even better ofc :)

tlively commented 3 years ago

I think this is working as expected. Neither of the IR patterns that produced errors are "flat" (although they are generally valid Binaryen IR) so those errors are correct. You can find the specification of what it means to be "flat" in ir/flat.h. For you experimentation, these errors should be resolved if you run --flatten between your custom transformation and whatever pass it is that requires flat code.

dcodeIO commented 3 years ago

these errors should be resolved if you run --flatten

Unfortunately the error happens even though --flatten has been run, indicating that these patterns are not being flattened. Quick test case to reproduce:

var binaryen = require("binaryen");

var module = new binaryen.Module();
module.addFunction("nothing", binaryen.none, binaryen.none, [], module.nop());
module.addFunction("test", binaryen.none, binaryen.none, [],
  module.return(
    module.call("nothing", [], binaryen.none)
  )
);
module.addExport("test", "test");

console.log(module.emitText());
if (!module.validate())
  console.log("-> does not validate");

binaryen.setOptimizeLevel(4);
module.runPasses(["flatten"]);
module.optimize(); // Fatal: IR must be flat: run --flatten beforehand
console.log(module.emitText());
MaxGraey commented 3 years ago

binaryen.setOptimizeLevel(4); module.runPasses(["flatten"]);

I think you should use custom pipeline here instead setOptimizeLevel + flatten. Try to use something like:


add("ssa-nomerge")
add("flatten");
add("vacuum");
add("spill-pointers"); // <- your custom spill pass
dcodeIO commented 3 years ago

What this shows currently is that flatten doesn't flatten this specific expression. I think it shouldn't depend on specific pass order if immediately invoking flatten doesn't handle the pattern already, or am I missing something?

MaxGraey commented 3 years ago

As I understand transform to Flatten IR & SSA IR can only schedule in pass pipeline. In this case pass ordering is pretty important. At least for ssa / ssa-nomerge and flatten.

dcodeIO commented 3 years ago

Iiuc flatten doesn't build a separate IR, but modifies the main IR, and every pass in the default pipeline depending on flattened code ensures that the IR is flat. In case it helps, the pattern not being flattened in a way that a check for flattened code approves it essentially is:

[unreachable] (return
[none]          (call $someThing)
              )

The workaround I utilize is

[unreachable] (return
[unreachable]   (block
[none]            (call $someThing)
[unreachable]     (return)
                )
              )

which can be flattened properly, somehow.

tlively commented 3 years ago

Actually, thinking about this more, I'm not sure this should be valid Binaryen IR:

(return
 (someCallReturningNone)
)

When return takes an argument, it would be reasonable to expect that the argument have concrete (or unreachable) type, so my guess is that --flatten does not expect to have to flatten return expressions. Does this code pattern pass validation?

dcodeIO commented 3 years ago

Yes, it passes validation. One downside of disallowing it might be that this may also disallow the workaround, hmm.

tlively commented 3 years ago

I see that the validator does not allow the arguments to Break expressions to have Type::none, so my guess is that this is indeed an oversight in the validation of Return expressions. @kripken, do you concur? @dcodeIO, it looks like you're right that this validator fix would break your workaround, though. Would you want to add proper traversal and replacement support to the C API and use that instead?

dcodeIO commented 3 years ago

Sure, proper traversal would be ideal. Can look into that :) How would one ideally design the JS callback with Emscripten in a way that it also works with the C-API?

kripken commented 3 years ago

@tlively Yes, it looks like missing validation. That should not validate.

tlively commented 3 years ago

@dcodeIO, I'm imagining that the C API would look something like this:

void BinaryenWalkExpression(BinaryenExpressionRef expr, void (visitExpr*)(BinaryenExpressionRef*, void*), void* state);

The user gets to define some state of their choice that is passed to the callback function along with a pointer to BinaryenExpressionRef being visited. Visited nodes can be replaced by writing to the BinaryenExpressionRef pointer. In the JS API, I guess the callback needs to be placed into the WebAssembly table and and the table index passed to BinaryenWalkExpression? The JS function will close over whatever state it needs, so the JS API can just pass a null value for state. @kripken, I'm not actually sure how we expect users to add custom callbacks to the table. Do we have an API for that?

kripken commented 3 years ago

Another API option might be to pass a function pointer to a "replaceCurrent" method that replaces the current expression.

Emscripten has an addFunction API that lets you add a JS function to the table, and refer to it via a function pointer.