AnyDSL / thorin

The Higher-Order Intermediate Representation
https://anydsl.github.io
GNU Lesser General Public License v3.0
151 stars 15 forks source link

Rework completely PE and lower2cff #95

Open madmann91 opened 5 years ago

madmann91 commented 5 years ago

Right now, there is the PE phase that does partial evaluation and lower2cff at the same time. This has some issues, as pointed out before with issue #93. However, even with a fix for that, there is a more fundamental problem here. Consider the following code:

extern "C" { fn rand() -> i32; }

fn range(a: i32, b: i32, body: fn (i32) -> ()) -> () {
    if a < b {
        body(a);
        range(a + 1, b, body)
    }
}

fn something(pred: fn (i32) -> bool) -> bool {
    pred(rand()) && (something(pred) || something(pred))
}

extern fn boom(n: i32, res: &mut [bool]) -> () {
    for i in range(0, 8) {
        res(i) = something(@ |i| i < n);
    }
}

The function something uses a predicate function pred, which at the call site in function boom happens to be an anonymous function capturing n, a parameter from boom. Now the problem is that something can be specialized to that predicate without any issue (with the fix for issue #93), but it then makes it depend on the parameter n (because the predicate refers to n). No amount of specialization can solve that problem: What we need to do is to lift the function something from its scope after specialization, which means adding a parameter that takes a value for n. In order to do this, it makes sense to decouple PE from lower2cff (even though some code can be shared), since lower2cff requires more work.

Right now, this example hangs on both master and pe_fix. Note that this example is in no way contrived or rare. Such a pattern happens for every divide and conquer algorithm out there (it was encountered while coding a version of the quick sort algorithm).

madmann91 commented 5 years ago

Right now, people looking for a quick fix can implement such algorithms by managing the stack manually (creating a local mutable array for it). That's however clearly not a long term solution.

leissa commented 5 years ago

The eurollvm branch has a phase where this lifting is implemented. However, it was never mature enough to get merged into master. I will have a look at this again in the next couple of days. I'm currently in bug-fixing mode :)

madmann91 commented 5 years ago

There is also a subtle problem with the partial evaluator as it is now, which I thought might be of interest if you are working on this. The criteria of the partial evaluator that determine when to perform inlining/specialization are incorrect. The problem lies in ConEval::eval():

bool eval(size_t i, bool lower2cff) {
    // the only higher order parameter that is allowed is a single 1st-order fn-parameter of a top-level continuation
    // all other parameters need specialization (lower2cff)
    auto order = callee_->param(i)->order();
    if (lower2cff)
        if(order >= 2 || (order == 1
                    && (!callee_->param(i)->type()->isa<FnType>()
                        || (!callee_->is_returning() || (!is_top_level(callee_)))))) {
        DLOG("bad param({}) {} of continuation {}", i, callee_->param(i), callee_);
        return true;
    }

    return (!callee_->is_external() && callee_->num_uses() == 1) || is_one(instantiate(filter(i)));
}

The condition !callee_->is_external() && callee_->num_uses() == 1 is the culprit. Consider the following code:

fn @f(i: i32) -> () {
    fn g() -> () {
        println(i)
    }
    ($g)()
}
// ...
for i in unroll(0, 10) {
    f(i)
}

The resulting code should generate 10 different g functions and not inline g 10 times, which is what is happening here because g is only called only once and the criterion is hence verified. This issue is also in eta_conversion() (in the file transform/cleanup_world.cpp).

In general, the condition should be: callee_->num_uses() == 1 && callee_->order() == 1

(i.e. this optimization is only valid for basic blocks)