Open PolyRocketMatt opened 2 months ago
Hi @PolyRocketMatt, thanks for the report.
To see the generated PTX code, you can use the --printKernel option of the tornado command.
I had a quick look at the issue, and I am not sure the @NotNull annotation is supported. The IR may not be clean (that is, ready to be consumed by the PTX code generator) because the annotation introduces extra nodes. Can you check without it?
Hi, thanks for the fast response.
I have tried compiling and running the code without the @NotNull annotation, but it still works only occasionally. This is the generated PTX code, which declares PTX version 7.6:
.version 7.6
.target sm_86
.address_size 64
.visible .entry 90375540_e9cf_461e_b9a2_d7dc8d46e67e_mxm_addfloat_arrays_floatarray_arrays_floatarray_1_0_1024(.param .u64 .ptr .global .align 8 kernel_context, .param .u64 .ptr .global .align 8 input, .param .u64 .ptr .global .align 8 output, .param .align 8 .u64 value, .param .align 8 .u64 size) {
.reg .s64 rsd<3>;
.reg .u64 rud<5>;
.reg .s32 rsi<6>;
.reg .pred rpb<2>;
.reg .u32 rui<5>;
BLOCK_0:
ld.param.u64 rud0, [kernel_context];
ld.param.u64 rud1, [output];
mov.u32 rui0, %nctaid.x;
mov.u32 rui1, %ntid.x;
mul.wide.u32 rud2, rui0, rui1;
cvt.s32.u64 rsi0, rud2;
mov.u32 rui2, %tid.x;
mov.u32 rui3, %ctaid.x;
mad.lo.s32 rsi1, rui3, rui1, rui2;
BLOCK_1:
mov.s32 rsi2, rsi1;
LOOP_COND_1:
setp.lt.s32 rpb0, rsi2, 1024;
@!rpb0 bra BLOCK_3;
BLOCK_2:
add.s32 rsi3, rsi2, 6;
cvt.s64.s32 rsd0, rsi3;
shl.b64 rsd1, rsd0, 2;
add.u64 rud3, rud1, rsd1;
st.global.f32 [rud3], 0F3F800000;
add.s32 rsi4, rsi0, rsi2;
mov.s32 rsi2, rsi4;
bra.uni LOOP_COND_1;
BLOCK_3:
ret;
}
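For readers following the PTX above: the kernel body is a grid-stride loop in which each thread starts at its global id, stores 1.0f (the literal 0F3F800000) into output, and advances by the total number of threads until it reaches 1024. The sketch below emulates that loop in plain Java. The class and method names are hypothetical, and reading the constant offset of 6 elements as an array header is an assumption; the thread does not say what it is.

```java
// Plain-Java emulation of the grid-stride loop in the PTX above.
// All names here are hypothetical; this is not TornadoVM code.
class GridStrideSketch {
    // The PTX adds 6 to the index before scaling by 4 bytes.
    // Assumption: these 6 float-sized slots are an array header.
    static final int HEADER_SLOTS = 6;
    static final int SIZE = 1024; // loop bound from setp.lt.s32 rpb0, rsi2, 1024

    // One GPU thread: start at globalId, step by the total thread count.
    static void runThread(float[] buffer, int globalId, int totalThreads) {
        for (int i = globalId; i < SIZE; i += totalThreads) {
            buffer[i + HEADER_SLOTS] = 1.0f; // st.global.f32 [rud3], 0F3F800000
        }
    }

    // Runs every thread of a (blocks x threadsPerBlock) grid sequentially.
    static float[] runGrid(int blocks, int threadsPerBlock) {
        float[] buffer = new float[SIZE + HEADER_SLOTS];
        int totalThreads = blocks * threadsPerBlock; // %nctaid.x * %ntid.x
        for (int block = 0; block < blocks; block++) {
            for (int thread = 0; thread < threadsPerBlock; thread++) {
                // mad.lo.s32: %ctaid.x * %ntid.x + %tid.x
                int globalId = block * threadsPerBlock + thread;
                runThread(buffer, globalId, totalThreads);
            }
        }
        return buffer;
    }

    public static void main(String[] args) {
        float[] result = runGrid(4, 64);
        System.out.println("first payload element: " + result[HEADER_SLOTS]);
    }
}
```

Nothing in this loop depends on the @NotNull annotation, which fits the observation that removing it did not change the behavior.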
I'm not sure if this is the solution, but removing the UUID from the name of the task graph (and consequently from the PTX kernel name) seems to make it behave stably. I'll experiment further to see whether this keeps fixing the errors I was getting.
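One possible explanation, which nobody in this thread has confirmed: the PTX ISA requires an identifier to start with a letter, or with one of _, $, % followed by at least one more character. The entry name in the listing above starts with the digit 9. A random UUID begins with a decimal digit in 10 of 16 cases, which would match a failure that shows up only sometimes. The sketch below checks a name against that rule; the class, the helper safeGraphName, and the "g_" prefix are hypothetical and not part of TornadoVM.

```java
import java.util.UUID;
import java.util.regex.Pattern;

// Hypothetical helper for checking names against the PTX identifier rule.
class PtxNameCheck {
    // PTX ISA identifier syntax: [a-zA-Z][a-zA-Z0-9_$]* or [_$%][a-zA-Z0-9_$]+
    private static final Pattern PTX_IDENTIFIER =
            Pattern.compile("[a-zA-Z][a-zA-Z0-9_$]*|[_$%][a-zA-Z0-9_$]+");

    static boolean isValidPtxIdentifier(String name) {
        return PTX_IDENTIFIER.matcher(name).matches();
    }

    // Assumption: a fixed letter prefix keeps a UUID-based graph name valid
    // regardless of the UUID's first character.
    static String safeGraphName(UUID id) {
        return "g_" + id.toString().replace('-', '_');
    }

    public static void main(String[] args) {
        // Prefix of the entry name from the PTX listing in this issue.
        String fromIssue = "90375540_e9cf_461e_b9a2_d7dc8d46e67e_mxm_add";
        System.out.println(fromIssue + " valid? " + isValidPtxIdentifier(fromIssue));

        String safe = safeGraphName(UUID.randomUUID());
        System.out.println(safe + " valid? " + isValidPtxIdentifier(safe));
    }
}
```

If this is the cause, any task graph name that starts with a letter should compile reliably, whether or not it contains a UUID.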
Describe the bug
When I run my application on the latest TornadoVM build, it occasionally throws an error indicating that PTX JIT compilation failed:
How To Reproduce
My Main.java simply runs the following:
The AdditionTask class looks like this:
In this case, BufferTask is just an empty interface.
Expected behavior
I expect the code to run without any compilation errors. It does in some cases, but not all.
Computing system setup (please complete the following information):
Additional context
The attached log was generated using the --debug flag and comes from my original program. The minimal reproducible example should still be a valid proxy.
debug.txt