AnyDSL / thorin

The Higher-Order Intermediate Representation
https://anydsl.github.io
GNU Lesser General Public License v3.0

using conditionally chosen function in cuda()/nvvm() triggers assertion #128

Open michael-kenzel opened 1 year ago

michael-kenzel commented 1 year ago

The following code reproduces the issue:

#[import(cc = "thorin")] fn nvvm(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), _body: fn() -> ()) -> ();
#[import(cc = "device")] fn threadfence() -> ();

#[export]
fn test(b: bool) -> () {
    let fun = if b { @|| { asm("nanosleep.u32 0;"); } } else { threadfence };

    nvvm(0, (1, 1, 1), (1, 1, 1), @|| {
        fun();
    });
}

Compiling this with artic and --emit-llvm fails with the following assertion:

src/thorin/util/cast.h:42: L* thorin::scast(R*) [with L = thorin::Global; R = thorin::Def]: Assertion `(!r || dynamic_cast<L*>(r)) && "cast not possible"' failed.

Various seemingly irrelevant changes to the code, e.g., turning the else option in the initialization into a lambda that simply forwards to the original function

    let fun = if b { @|| { asm("nanosleep.u32 0;"); } } else { @|| threadfence() };

seem to resolve the issue in some cases but not others. None of these workarounds appear to be reliable in the context of a more complex codebase; something that worked in one example won't work in another.

Hugobros3 commented 1 year ago

The way the runtime support code is written, it expects the body to be a global containing a continuation (lift_builtins.cpp is supposed to take care of that), but because of the way your example is written, this doesn't happen properly. One of the problems seems to be that threadfence is not getting handled properly there, maybe because it's external?
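To make the expectation above concrete, here is a minimal C++ sketch of a checked downcast in the spirit of the failing assertion, plus a check for "a global containing a continuation". This is not thorin's actual code: Def, Continuation and Global are simplified stand-ins, body_is_well_formed is a hypothetical helper, and only the assertion expression is taken from the error message quoted in the report.

```cpp
#include <cassert>

// Hypothetical stand-ins for IR node classes; NOT thorin's real definitions.
struct Def {
    virtual ~Def() = default;  // polymorphic base so dynamic_cast works
};

struct Continuation : Def {};

// Simplified model of a global that wraps an initializer.
struct Global : Def {
    explicit Global(const Def* init) : init(init) {}
    const Def* init;
};

// Checked downcast modelled on the assertion text from the report:
// a null pointer passes through, anything else must really be an L.
template <class L, class R>
L* scast(R* r) {
    assert((!r || dynamic_cast<L*>(r)) && "cast not possible");
    return static_cast<L*>(r);
}

// Hypothetical helper: true only if body is a Global whose initializer
// is a Continuation, i.e. the shape the runtime plumbing is said to expect.
bool body_is_well_formed(const Def* body) {
    auto global = dynamic_cast<const Global*>(body);
    return global && dynamic_cast<const Continuation*>(global->init);
}
```

Under these assumptions, calling scast<Global> on a Def that is not a Global (for example a bare continuation that was never wrapped) trips exactly the "cast not possible" assertion, which would be consistent with the body in the report not having been lifted into a global.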

I originally thought the issue would be that your if gets turned into a select or a phi, but actually there is some magic that drops the nvvm() call inside the branches of the if (don't ask me how that works, I'm not sure myself!), so that's not it. This needs a lot more attention than I can spare right now.

IMO, the way the "runtime plumbing" part of thorin works is rather brittle and hard to understand; it would not hurt if someone rewrote it to be saner. I might do a pass over it when I wire shady's runtime into it, but in the meantime you're welcome to try to make sense of this, and I can help you out on Discord if you need.