AnyDSL / thorin

The Higher-Order Intermediate Representation
https://anydsl.github.io
GNU Lesser General Public License v3.0
151 stars 15 forks source link

RV: use regions instead of WFV mode #83

Open leissa opened 6 years ago

leissa commented 6 years ago

WFV mode does not support recurrences:

let mut a = 0;
for i in vectorize(0, n) {
    f(a);
    a = A(i);
}

Switching to loop regions in RV solves this problem:

The sequence alignment stuff for AnyDSL/sequal will need this feature.

leissa commented 6 years ago

@muellan : This might be interesting for you.

madmann91 commented 6 years ago

What's not supported here? I think I need a bit of context, because if A is not modified by f, then your loop body is simply:

for i in vectorize(0, n) {
    f(if i == 0 { 0 } else { A(i - 1) })
}

We had a discussion about using the loop mode. In the end, we did not do it because we would still need to isolate the part to vectorize in a function (because RV requires some LLVM transformations that should not be applied to the enclosing function, and because we want to support nested calls to vectorize). That said, if it solves your problem, then why not.

simoll commented 6 years ago

Suppose you want to write something like this:

ap = A(0)
for i in vectorize(1, n) {
    a = A(i)
    B(i) = max(ap, a)
    ap = a
}

If you lower this to a loop in LLVM-IR you will see a phi node in the header that carries a/ap. However, thorin currently outlines the loop body to apply RV in WFV mode. There is not canonical way to express a "function instance"-carried dependency for a/ap. So, you have to resort to reloading A(i) or the like. However, if you apply RV in loop vectorization mode the ReductionAnalysis will kick in, recognize the recurrence (a/ap header phi) and create the right code.

madmann91 commented 6 years ago

That's clear, but you can rewrite your example to work without support for loop carried dependencies:

for i in vectorize(N, 1, n) {
    a = A(i)
    a0 = A(extract(i - 1, 0)) // A(i - 1)
    as = shuffle(a, -1)       // [ A(N - 1), A(i), A(i + 1), A(i + 2), A(i + 3), ...]
    ap = insert(as, 0, a0)    // [ A(i - 1), A(i), A(i + 1), A(i + 2), A(i + 3), ...]
    B(i) = max(ap, a)
}

While not really pretty, this approach works right now and can be used as a workaround while we implement support for this feature.