WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0
7.45k stars 738 forks source link

Invalid optimization to select with --ignore-implicit-traps #3934

Open askeksa-google opened 3 years ago

askeksa-google commented 3 years ago

Optimizing this program with -all -O1 --ignore-implicit-traps:

(module
 (type $box (struct (field (mut i32)) (field (mut i64))))
 (export "f" (func $f))
 (func $f (param $0 (ref null $box)) (result i64)
  (block $label$1
   (block $label$2
    (br_if $label$2
     (ref.is_null
      (local.get $0)
     )
    )
    (br_if $label$1
     (i64.gt_s
      (struct.get $box 1
       (local.get $0)
      )
      (i64.const 8)
     )
    )
   )
   (return
    (i64.const 8)
   )
  )
  (struct.get $box 1
   (local.get $0)
  )
 )
)

yields

(module
 (type $box (struct (field (mut i32)) (field (mut i64))))
 (type $ref?|$box|_=>_i64 (func (param (ref null $box)) (result i64)))
 (export "f" (func $f))
 (func $f (param $0 (ref null $box)) (result i64)
  (if
   (select
    (ref.is_null
     (local.get $0)
    )
    (i32.const 1)
    (i64.gt_s
     (struct.get $box 1
      (local.get $0)
     )
     (i64.const 8)
    )
   )
   (return
    (i64.const 8)
   )
  )
  (struct.get $box 1
   (local.get $0)
  )
 )
)

Even when assuming that no traps are executed, it is invalid in general to change potentially trapping code from being conditionally executed to being unconditionally executed. This happens with the first struct.get here, as it is placed as the second arm of the select.

kripken commented 3 years ago

This is how "ignore implicit traps" works atm: it literally assumes no instruction will trap. It then sees no side effects in the struct.get, and unconditionalizes it in this example.

If that is not good enough (and, yes, it is pretty limiting) then we could consider another mode that also disallows unconditionalization. It would be significant work, however (as it would need to integrate with every transform that alters control flow).

(Probably a side note, but the first struct.get is not in the second arm of the select, but is the condition? The condition is the final operand, unlike if.)

askeksa-google commented 3 years ago

Right, so what you're saying is that the assumption triggered by --ignore-implicit-traps is:

All instructions are treated as having execution semantics where they can't trap.

Whereas I understood it as:

The given program will never execute an instruction with input values that cause it to trap.

I see that all places in the code where there's a "has side effects" check will need to distinguish between "can be reordered/removed" (which is the case with either assumption) and "can be unconditionalized" (which only holds with the first interpretation).

The second version is certainly useful. It's common to have a pattern like "if x is non-null, dereference x" or "if x is non-zero, divide by x", and the assumption is something that a compiler can feasibly uphold.

Is there a need for two modes, though? I don't see where the first interpretation can be useful.

(Yes, it's in the condition. I was confused by the ref.is_null being placed first originally. And in this case, swapping the two makes no difference.)

kripken commented 3 years ago

The usefulness of the first is in cases where the trap is not guarded against by code, because it really won't happen in any important cases. For example:

(if
  (..condition..)
  (i32.load ..)
  (i32.const ..)
)

The load can trap in theory on out of bounds, but in practice this can safely be turned into

(select
  (i32.load ..)
  (i32.const ..)
  (..condition..)
)

A guarded condition (if not null, use the reference) is indeed different than that.

Btw, I have some old experimentation in a branch for this, https://github.com/WebAssembly/binaryen/compare/implicit?expand=1 (regressions in the diff show some of the issues) I may try to get back to it. Do you have a sense for how much of a benefit it would give?

askeksa-google commented 3 years ago

Yes, there are some situations where only the first assumption will optimize the code, and where it will be sound to do so. But how can the sound cases be distinguished from the unsound ones?

Or rather, what is the property that has to hold for a program to make optimizations always sound under the first assumption? What rules does the compiler have to follow before it can safely enable the option?

Do you have a sense for how much of a benefit it would give?

Enabling the option reduces the total module size of barista3 by about 3.5%. It's hard to say how much of that would be retained by keeping only the sound cases.

kripken commented 3 years ago

Or rather, what is the property that has to hold for a program to make optimizations always sound under the first assumption?

Well, in linear memory as compiled by LLVM, OOB loads and stores simply never happen (and so they are not guarded behind ifs; they don't exist). In practice the only way they can happen is if the code is faulty (for example, if memory corruption gives you a pointer with random bits). This could be an issue, but isn't in practice - but if wasm trapped on null pointers in linear memory, it would, though...

With that said, the option is of limited use not because of loads and stores, but because of trapping float-to-int conversions, where LLVM does in fact emit guards. So it's rare for codebases to use AFAIK. We've considered having separate options per instruction from time to time.

Enabling the option reduces the total module size of barista3 by about 3.5%.

I see, thanks. That's larger than what I see on LLVM output (usually less than 1%). Do you have a sense about speed, or is that too early to tell?

askeksa-google commented 3 years ago

Do you have a sense about speed, or is that too early to tell?

Well, as-is, the benchmark doesn't run with the option enabled, due to the above unsound optimization.

If I force the EffectAnalyzer to ignore the --ignore-implicit-traps flag (leaving just the redundant cast elimination), it removes about 5% of all casts (reducing module size by 0.2%). The speed improvement is close to the measurement noise, but it does look like it is 1-2% faster.

kripken commented 3 years ago

I see, thanks. 5% of casts does sound significant to me.

Thinking more on this, I am not sure I have a clear enough understanding of:

The given program will never execute an instruction with input values that cause it to trap.

For example, if I have

if (check() > 0) {
  print("hello, I am going to trap");
  trap();
}

would all that code be optimizable to just

check();

? (If we assume a trap is never reached, then it seems like all the code back from it to the last branch is removable too.)

tlively commented 3 years ago

This is fairly similar to the treatment of UB in C compilers. If we treat traps as UB and assume that they will never be run dynamically, then yes, you can remove all code that would be run only if the trap would be run, including code before it.

Exception make this more interesting because if there is a possibility that print throws an exception in the example above, then you can't remove it because it can run and avoid the subsequent trap by throwing.

FWIW, I think this would be a much less surprising and more useful meaning for the ignore-implicit-traps option. We should have a much more limited option for use with C-like languages that ignores the potential for memory accesses specifically to trap.

askeksa-google commented 3 years ago

It would be valid to remove code backwards from traps until reaching code that fulfills either of:

Even code that potentially traps can be removed when post-dominated by a trap, since the assumption is it doesn't.

In the case of print, I'd say it can only be removed if it is guaranteed to always terminate, which would usually not be the case for print (it could block on I/O).