The error reported here (unresolved external symbol `ldexpf`) was found when running the HeCBench benchmark bn-cuda, compiled with the `-O3` flag. It comes up even though the program does not call `ldexpf`, and chipStar does have an `ldexpf` definition that uses the OpenCL `ldexp` function.
The trace shows that `zeModuleCreate` returns `ZE_RESULT_SUCCESS` even though the build log contains that error. Kernel creation then fails with `ZE_RESULT_ERROR_INVALID_MODULE_UNLINKED`, so the kernel launch fails at run time.
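Because `zeModuleCreate` can report success while the build log still contains an error, a caller cannot rely on the return code alone. The sketch below shows one way the log text could be scanned for unresolved symbols. It assumes the log has already been retrieved as a string (for example through Level Zero's `zeModuleBuildLogGetString`) and that each report contains the phrase "unresolved external symbol" followed by the name, as in this issue. `findUnresolvedSymbols` is a hypothetical helper, not part of chipStar or Level Zero.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collect the symbol names that a build log reports as
// unresolved. Assumes each report has the form
// "unresolved external symbol <name>", the wording quoted in this issue.
std::vector<std::string> findUnresolvedSymbols(const std::string& buildLog) {
    const std::string marker = "unresolved external symbol ";
    std::vector<std::string> symbols;
    std::size_t pos = 0;
    while ((pos = buildLog.find(marker, pos)) != std::string::npos) {
        pos += marker.size();
        // The symbol name ends at the next whitespace or punctuation character.
        std::size_t end = buildLog.find_first_of(" \t\r\n,;:'\")", pos);
        if (end == std::string::npos) end = buildLog.size();
        if (end > pos) symbols.push_back(buildLog.substr(pos, end - pos));
        pos = end;
    }
    return symbols;
}
```

A runtime could treat a non-empty result as a module build failure even when the return code is `ZE_RESULT_SUCCESS`, so the problem surfaces at module creation rather than at kernel launch.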
After digging in, the error seems to come from the `powf` call in the program (see the reproducer at the end of this issue), and the `__builtin_powf` in one of the two `powf` definitions seems to be the cause:
https://github.com/CHIP-SPV/chipStar/blob/4edbcb68a0a647493a27490c9c87ccaa896dafbc/include/hip/devicelib/single_precision/sp_math.hh#L439-L449
When `__builtin_powf` was commented out and the OpenCL `pow` function was forced as the `powf` definition, the error disappeared. However, a reproducer that only calls `powf` shows no error, so `__builtin_powf` does not seem to be the only source of the error. Also, the program has to be compiled with an optimization flag for the error to show up (tested `-O`, `-O1`, `-O2`, and `-O3`, all of which produce the error).
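One possible explanation for the `ldexpf` reference, offered as an assumption rather than something verified in this issue: an optimizing compiler may rewrite a `pow` call with base 2 and an integer-derived exponent as an `ldexp` call, because the two are mathematically equivalent. That would be consistent with the error appearing only when optimization is enabled. The host-side C++ sketch below only illustrates the equivalence for the exponents used in the reproducer. `powOfTwoMatchesLdexp` is a hypothetical helper, not chipStar code.

```cpp
#include <cmath>

// Hypothetical helper: returns true if 2^i computed with pow and with ldexp
// agree exactly for every integer exponent i in [lo, hi].
bool powOfTwoMatchesLdexp(int lo, int hi) {
    for (int i = lo; i <= hi; ++i) {
        float viaPow = std::pow(2.0f, static_cast<float>(i)); // 2^i through pow
        float viaLdexp = std::ldexp(1.0f, i);                  // 1.0 * 2^i through ldexp
        if (viaPow != viaLdexp) return false;
    }
    return true;
}
```

Calling `powOfTwoMatchesLdexp(0, 9)` covers the exponents in the reproducer's loop. If the compiler does make this substitution, the generated device code would reference `ldexpf` even though the source only calls `powf`.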
[Reproducer]
1. Clone and build chipStar
2. Create a reproducer.cu file and paste the following code:
   ```cuda
   __global__ void kernel() {
     float lsinblock[10000] = { 0 };
     int t = 0;
     //int a = 0; // used for following testing
     for (int i=0; i<10; i++) {
       t = (int)lsinblock[(int)powf(2.0, i)+t]; // error
       //powf(2.0, i); // works
       //a = (int)powf(2.0, i); // works
       //a = (int)lsinblock[(int)powf(2.0,i)+t]; // works
       //a = (int)lsinblock[(int)powf(2.0,i)+0]; // works
       //t = (int)lsinblock[(int)powf(2.0,i)+0]; // works
       //t = (int)lsinblock[(int)powf(2.0,i)+5]; // works
     }
   }
   ```
3. Compile the code with `nvcc -O3 reproducer.cu`
4. Run the program with `./a.out`
The error described above should appear.
[Notes on the reproducer]
Each line marked `// works` was tested individually and ran without errors.
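The reproducer.cu file also contains the following `main` function after the kernel definition, which launches the kernel:
```cuda
int main(int argc, char** argv) {
  int N = 1<<20;
  // 256 threads per block, 256 * sizeof(float) bytes of dynamic shared memory
  kernel<<<(N+255)/256, 256, 256 * sizeof(float)>>>();
  printf("done\n");
}
```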