bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
15.2k stars 1.28k forks source link

Cranelift: scoped elaboration duplicates pure operations even when there is no partial redundancy to avoid #8959

Open meithecatte opened 2 months ago

meithecatte commented 2 months ago

.clif Test Case

test optimize
set opt_level=speed
target x86_64

function %f(i32, i32, i32) -> i32 {
block0(v0: i32, v1: i32, v2: i32):
    v3 = imul.i32 v0, v1
    brif v2, block1, block2
block1:
    return v3
block2:
    v4 = iconst.i32 1
    v5 = iadd.i32 v3, v4
    return v5

}

;; check that the two occurrences of v3 are still the same value after optimization
; check: return v3
; check: iadd.i32 v3, v4

Steps to Reproduce

Expected Results

The imul stays in block0.

Actual Results

The imul is needlessly duplicated between block1 and block2.

Versions and Environment

Cranelift version or commit: de29ce35982eba2060709491309ccc93b397168c (main at the time of writing)

Operating system: Linux

Architecture: x86_64

cfallin commented 2 months ago

That's true, but unfortunately I can't see a low-cost of way avoiding this: one would have to ask "is this value demanded in every subtree of the domtree from this point", at every point, and move it there; in the extreme, requiring an upward-pass on the domtree. Remember that we're reconstructing code placement as we recreate the CFG, we don't have any notion that there was a single imul in the parent block previously.

I would consider this an issue in the same category as the "instruction scheduling" general concern (#6159 / #6260): the current algorithm is working as designed and produces correct code, but better heuristics could be better.

I'm curious if you have an idea for a better algorithm here?