Closed jchlanda closed 2 years ago
What version of ROCm are you running?
My version is 4.3.0-52. I don't know their version.
What version of ROCm are you running?
Sorry, I should have mentioned that, I see the same error on 4.2.0-21
and 4.3.0-52
.
Thanks. I'm assuming you're using the AMDGCN back end from the LLVM within the ROCm release? If not, what are you using? Can you create a HIP example and which matches one of your failing examples as closely as possible and compare the IR from your front end to the IR from the ROCm HIP front end? Are you matching all the attributes, metadata, calling conventions, etc?
How are you loading and launching the code object? Using the HIP runtime or OpenCL runtime? The HIP runtime cannot always properly launch an OpenCL kernel and similarly for a HIP kernel launched by the OpenCL runtime.
This sounds to me like some plumbing is not set up correctly and so the GPU somehow is not executing the ISA bits you want it to.
Thanks. I'm assuming you're using the AMDGCN back end from the LLVM within the ROCm release?
Yes, that's correct dpcpp
is on, or very close to LLVM's tip, so it stays up to date with AMDGCN backend.
This becomes a bit more problematic, since our servers have now defaulted to 4.3.0
and with that non of the benchmarks from https://github.com/zjin-lcf/oneAPI-DirectProgramming/ when run with save-temps
reproduce the illegal instruction error. The benchmark that triggered this investigation now gives back HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION
.
Following on form a comment here: https://github.com/intel/llvm/issues/4553#issuecomment-933817079 I was able to get the same error with a simple sin
SYCL kernel (again, only when save-temps
is not passed to the compiler):
#define TY cl_float
#define IN_VALUE 90.0f
#define OUT_VALUE 42.42f
//#define TY cl_double
//#define IN_VALUE 90.0
//#define OUT_VALUE 42.42
int main(int, char **) {
TY in = IN_VALUE;
TY out = OUT_VALUE;
cl::sycl::gpu_selector device_selector;
cl::sycl::queue queue(device_selector);
{
cl::sycl::buffer<TY> in_sycl(&in, cl::sycl::range<1>(1));
cl::sycl::buffer<TY> out_sycl(&out, cl::sycl::range<1>(1));
queue.submit([&](cl::sycl::handler &cgh) {
auto in_acc = in_sycl.get_access<cl::sycl::access::mode::read>(cgh);
auto out_acc = out_sycl.get_access<cl::sycl::access::mode::write>(cgh);
cgh.parallel_for<class sin_cos>(cl::sycl::nd_range<1>(1, 1),
[=](cl::sycl::nd_item<1> item) {
out_acc[0] = cl::sycl::sin(in_acc[0]);
});
});
}
printf("Got: %f\nExpected: %f\n", out, std::sin(in));
return 0;
}
An equivalent HIP program with HIP_CLANG_PATH
pointing to dpcpp
's LLVM build directory passes with no issues. I have noticed that hipcc
's clang invocation sets some extra backend flags: -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false
, interestingly enough those flags make the sin
SYCL kernel pass. Those flags are not the silver bullet though, there are larger, more complicated kernels that still fail even with the flags appended. And hipcc
passes even with those removed. Looking at the assembly generated for sin
, the compiler just did what it was asked for and inlined everything, mu guess is that for larger kernels it was not able to do that.
It looks like the codegen is off the hook now and more likely the bug is somewhere in the instrumentation of the kernel/kernel arguments, or perhaps calling convention of SYCL kernels/functions.
I've attached asm
files for the simple sin
kernel that triggers memory violation:
sin_extract.log
and the inlined variant that passes:
sin_no_function_calls_extract.log
As I wasn't able to use save-temps
I've hacked the LiveDebugValues
pass to dump all the functions it runs on and extracted the relevant ones.
We've found the issue that was causing the illegal instruction error (https://github.com/intel/llvm/pull/4991). Closing.
While working with DPCPP (https://github.com/intel/llvm) compiling SYCL kernels targeting ROCm (gfx908) I came across multiple benchmarks failing with
HSA_STATUS_ERROR_ILLEGAL_INSTRUCTION
error:for example:
Going by the error message: "The agent attempted to execute an illegal shader instruction." I took the
s3d
benchmark and used DPCPP's-save-temps
mechanism to obtain the device assembly and verify the instructions. I found out that it isratx2
(https://github.com/zjin-lcf/oneAPI-DirectProgramming/blob/master/s3d-sycl/ratx2.sycl) kernel that triggers the error. Working with save temps combined with replaying of the commands to run for this compilation pipeline, as obtained through-###
switch, allowed me to by-hand modify and test the offending kernel's assembly. It would appear that the error is somewhat misleading, as I managed to preserve the erroneous behaviour, while removing all the original instructions and replacing them withs_nop
(the number of nops was not a one to one match with instructions in the original kernel though). Moreover, for this particular benchmark I was able to establish such a number of nops in the kernel in question, that trigger theHSA_STATUS_ERROR_ILLEGAL_INSTRUCTION
error, while removing a single nop would also remove the error. For what is worth,ratx2
kernel with only 1344 triggers the error. Unfortunately it's not quite as simple as the code size of the kernel, I tried compiling and executing a SYCL kernel with only 1344 nops in it, but the error did not reproduce.I was wondering if there are any known issues that could be responsible for this behaviour, or if there is anything developers could do to debug it, or work around it.
Please do let me know if you'd like to see any of the generated files.