Closed michael-kenzel closed 1 year ago
The type of compile-time constant inputs does not seem to be taken into account when emitting inline assembly, which results in invalid CUDA code. For example:
asm("atom.global.exch.b64 %0, [%1], %2;" : "=l"(res) : "l"(location), "l"(value) : "memory");
when invoked with value: i64 = 0, emits:
asm ("atom.global.exch.b64 %0, [%1], %2;" : "=l" (_1192) /* outputs */ : "l" (p_1185), "l" (0) /* inputs */ : "memory" /* clobbers */ );
Note the "l" (0) input generated for value. The problem is that the literal 0 has type int in the generated CUDA code, which is not 64-bit like the original value was, and thus does not match the width of the asm operand required by the "l" constraint.
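The width mismatch can be checked on the host side in plain C++, which shares its literal typing rules with CUDA C++. The sketch below is only an illustration and not the code the backend emits; `operand_bits` is a hypothetical helper introduced for this example. It shows that a bare 0 is an int, while an explicitly typed zero has the 64 bits that an "l" operand expects.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not part of any compiler or library):
// returns the width in bits of the type of the expression passed in.
template <typename T>
constexpr std::size_t operand_bits(T) {
    return sizeof(T) * CHAR_BIT;
}

// An unsuffixed literal 0 has type int, whatever type the
// source-level value (here: i64) originally had.
static_assert(std::is_same<decltype(0), int>::value,
              "bare 0 is an int");

// Casting the literal restores the intended 64-bit type.
static_assert(std::is_same<decltype(static_cast<std::int64_t>(0)),
                           std::int64_t>::value,
              "cast literal is a 64-bit integer");
static_assert(operand_bits(static_cast<std::int64_t>(0)) == 64,
              "cast literal is 64 bits wide");
```

Assuming the fix is made where the literal is printed, one plausible shape for the emitted input would be "l" ((long long) 0) instead of "l" (0). This is a guess at the form of a fix, not a description of what the compiler does.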
The following example will reproduce the problem:
#[import(cc = "thorin")] fn cuda(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), _body: fn() -> ()) -> ();

#[export]
fn main() {
    let fun = @|location: &mut addrspace(1) i64, value: i64| {
        let mut res:i64;
        asm("atom.global.exch.b64 %0, [%1], %2;" : "=l"(res) : "l"(location), "l"(value) : "memory");
    };

    let p = 42 as &mut addrspace(1) i64;

    cuda(0, (1, 1, 1), (1, 1, 1), @||{ fun(p, 0); });
}