AnyDSL / thorin

The Higher-Order Intermediate Representation
https://anydsl.github.io
GNU Lesser General Public License v3.0

CUDA backend emits invalid inline assembly with compile-time constant inputs #131

michael-kenzel closed 1 year ago

michael-kenzel commented 1 year ago

The CUDA backend does not seem to take the type of compile-time constant inputs into account when emitting inline assembly, which generates invalid CUDA code. For example:

asm("atom.global.exch.b64 %0, [%1], %2;" : "=l"(res) : "l"(location), "l"(value) : "memory");

when invoked with value: i64 = 0, emits:

        asm ("atom.global.exch.b64 %0, [%1], %2;"
            : "=l" (_1192) /* outputs */
            : "l" (p_1185), "l" (0) /* inputs */
            : "memory" /* clobbers */
        );

Note the "l" (0) input generated for value. The problem is that the literal 0 has type int, which is not 64 bits wide like the original value and thus does not match the width of the asm operand.
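The type mismatch can be checked in plain C++, which follows the same integer literal rules as CUDA C++. The sketch below is only an illustration: the helper names plain_zero_width and cast_zero_width are hypothetical and not part of thorin, and the assumption that long long is 8 bytes holds on the usual CUDA host and device targets but is not guaranteed by the language.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// An unsuffixed integer literal that fits in int has type int,
// regardless of the 64-bit type the value had in the source program.
static_assert(std::is_same<decltype(0), int>::value,
              "plain 0 is an int");

// An explicit cast or an ll suffix yields a long long operand instead.
static_assert(std::is_same<decltype(static_cast<long long>(0)), long long>::value,
              "cast 0 is a long long");
static_assert(std::is_same<decltype(0ll), long long>::value,
              "suffixed 0 is a long long");

// Assumption: long long is 8 bytes wide, matching the 64-bit "l" constraint.
static_assert(sizeof(long long) == sizeof(std::int64_t),
              "long long is expected to be 64 bits wide");

// Hypothetical helpers that expose the two operand widths in bytes.
constexpr std::size_t plain_zero_width() { return sizeof(0); }
constexpr std::size_t cast_zero_width() { return sizeof(static_cast<long long>(0)); }
```

Under these assumptions, one possible way for a backend to avoid the problem would be to emit the constant with its type attached, for example "l" ((long long)0) instead of "l" (0). Whether this matches the fix actually applied in thorin is not stated in this report.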

The following example will reproduce the problem:

#[import(cc = "thorin")] fn cuda(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), _body: fn() -> ()) -> ();

#[export]
fn main() {
  let fun = @|location: &mut addrspace(1) i64, value: i64| {
    let mut res:i64;
    asm("atom.global.exch.b64 %0, [%1], %2;" : "=l"(res) : "l"(location), "l"(value) : "memory");
  };

  let p = 42 as &mut addrspace(1) i64;

  cuda(0, (1, 1, 1), (1, 1, 1), @||{
    fun(p, 0);
  });
}