CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
Other
157 stars 27 forks source link

"Missing definition for ldexpf" and hipErrorTbd run-time errors in benchmark involving __builtin_powf #880

Open jjennychen opened 1 week ago

jjennychen commented 1 week ago

The following error:

CHIP warning [TID 73532] [1719787184.122714146] : Missing definition for 'ldexpf'
CHIP error [TID 73532] [1719787184.191923889] : hipErrorTbd (ZE_RESULT_ERROR_INVALID_MODULE_UNLINKED ) in /home/tsaini.chen/chipStar/src/backend/Level0/CHIPBackendLevel0.cc:2496:compile

CHIP error [TID 73532] [1719787184.192045641] : Caught Error: hipErrorTbd

was found when running a HeCBench benchmark (bn-cuda), which was compiled with -O3 flag. This error comes up even though the program does not call the ldexpf function, and it is checked that chipStar does have a ldexpf definition here that uses OpenCL ldexp function. The trace was checked, and it is found that the zeModuleCreate returns ZE_RESULT_SUCCESS even though there was an error in the build log (unresolved external symbol ldexpf), and because the kernel was failed to be created with ZE_RESULT_ERROR_INVALID_MODULE_UNLINKED, the kernel launch fails during run-time.

17:15:25.983287715 - chiatta00 - vpid: 125926, vtid: 125926 - lttng_ust_ze:zeModuleCreate_entry: { hContext: 0x00005628e97d0be0, hDevice: 0x00005628e97e3b50, desc: 0x00007fff9017adf0, phModule: 0x00007fff9017afd8, phBuildLog: 0x00007fff9017aeb8, desc_val: { stype: ZE_STRUCTURE_TYPE_MODULE_DESC, pNext: 0x00007fff9017ae40, format: ZE_MODULE_FORMAT_IL_SPIRV, inputSize: 0, pInputModule: 0x0000000000000000, pBuildFlags: 0x0000000000000000, pConstants: 0x0000000000000000 }, desc__pBuildFlags_val: "(null)", desc__pConstants_val: , desc__pConstants__pConstantIds_vals: [], desc__pConstants__pConstantValues_vals: [  ] }
17:15:25.983350717 - chiatta00 - vpid: 125926, vtid: 125949 - lttng_ust_ze:zeEventQueryStatus_entry: { hEvent: 0x00005628e980f3e0 }
17:15:25.983350911 - chiatta00 - vpid: 125926, vtid: 125949 - lttng_ust_ze:zeEventQueryStatus_exit: { zeResult: ZE_RESULT_SUCCESS }
17:15:25.983351729 - chiatta00 - vpid: 125926, vtid: 125949 - lttng_ust_ze:zeEventQueryStatus_entry: { hEvent: 0x00005628ea61a3b0 }
17:15:25.983351994 - chiatta00 - vpid: 125926, vtid: 125949 - lttng_ust_ze:zeEventQueryStatus_exit: { zeResult: ZE_RESULT_SUCCESS }
17:15:25.983607311 - chiatta00 - vpid: 125926, vtid: 125949 - lttng_ust_ze:zeEventQueryStatus_entry: { hEvent: 0x00005628e980f730 }
17:15:25.983607499 - chiatta00 - vpid: 125926, vtid: 125949 - lttng_ust_ze:zeEventQueryStatus_exit: { zeResult: ZE_RESULT_SUCCESS }
17:15:26.539978394 - chiatta00 - vpid: 125926, vtid: 125926 - lttng_ust_ze_build:log: { buildLog: "error : unresolved external symbol ldexpf at offset 1036 in instructions segment #1 (aka kernel : _Z13computeKerneliiPKfPKbiiPfPi)\nerror : unresolved external symbol ldexpf at offset 1076 in instructions segment #1 (aka kernel : _Z13computeKerneliiPKfPKbiiPfPi)\n" }
17:15:26.539981951 - chiatta00 - vpid: 125926, vtid: 125926 - lttng_ust_ze:zeModuleCreate_exit: { zeResult: ZE_RESULT_SUCCESS, phModule_val: 0x00005628e9cd0460, phBuildLog_val: 0x00005628e99e8a50 }
17:15:26.539983380 - chiatta00 - vpid: 125926, vtid: 125926 - lttng_ust_ze:zeModuleGetKernelNames_entry: { hModule: 0x00005628e9cd0460, pCount: 0x00007fff9017afd8, pNames: 0x0000000000000000, pCount_val: 0 }
17:15:26.539983859 - chiatta00 - vpid: 125926, vtid: 125926 - lttng_ust_ze:zeModuleGetKernelNames_exit: { zeResult: ZE_RESULT_SUCCESS, pCount_val: 3, pNames_vals: [  ] }
17:15:26.539984107 - chiatta00 - vpid: 125926, vtid: 125926 - lttng_ust_ze:zeModuleGetKernelNames_entry: { hModule: 0x00005628e9cd0460, pCount: 0x00007fff9017afd8, pNames: 0x00007fff9017ac50, pCount_val: 3 }
17:15:26.539984519 - chiatta00 - vpid: 125926, vtid: 125926 - lttng_ust_ze:zeModuleGetKernelNames_exit: { zeResult: ZE_RESULT_SUCCESS, pCount_val: 3, pNames_vals: [ 0x00005628ea6e7dd0, 0x00005628e99fbaa0, 0x00005628ea26a610 ] }
17:15:26.539987746 - chiatta00 - vpid: 125926, vtid: 125926 - lttng_ust_ze:zeKernelCreate_entry: { hModule: 0x00005628e9cd0460, desc: 0x00007fff9017af50, phKernel: 0x00007fff9017aeb8, desc_val: { stype: ZE_STRUCTURE_TYPE_KERNEL_DESC, pNext: 0x0000000000000000, flags: [  ], pKernelName: 0x00005628e9d4f5c0 }, desc__pKernelName_val: "_Z14genScoreKerneliPfPKiPKf" }
17:15:26.539988726 - chiatta00 - vpid: 125926, vtid: 125926 - lttng_ust_ze:zeKernelCreate_exit: { zeResult: ZE_RESULT_ERROR_INVALID_MODULE_UNLINKED, phKernel_val: 0x00005628e99e8a50 }

After digging in, the error seems to be coming from the powf call in the program (please see the reproducer at the end of this issue for reference), and the builtin_powf in one of the two powf definitions seems to be causing the error: https://github.com/CHIP-SPV/chipStar/blob/4edbcb68a0a647493a27490c9c87ccaa896dafbc/include/hip/devicelib/single_precision/sp_math.hh#L439-L449 When commented out the __builtin_powf and forced using the OpenCL pow function as powf definition, the error disappeared. However, if we only call powf in a reproducer, no errors are observed, so it seems like the `builtin_powfis not the only source of the error. Also, the program has to be compiled with an optimization flag for the error to show up (tested-O, -O1, -O2, and -O3`, all of which produce the error).

[Reproducer]

  1. Clone and build chipStar
  2. Create a reproducer.cu file and paste the following code:

    
    __global__ void kernel() {
    float lsinblock[10000] = { 0 };
    int t = 0;
    //int a = 0; // used for following testing
    
    for (int i=0; i<10; i++) {
    t = (int)lsinblock[(int)powf(2.0, i)+t]; // error
    //powf(2.0, i); // works
    //a = (int)powf(2.0, i); // works
    //a = (int)lsinblock[(int)powf(2.0,i)+t]; // works
    //a = (int)lsinblock[(int)powf(2.0,i)+0]; // works
    //t = (int)lsinblock[(int)powf(2.0,i)+0]; // works
    //t = (int)lsinblock[(int)powf(2.0,i)+5]; // works
    }
    }

int main(int argc, char** argv) { int N = 1<<20;

kernel<<<(N+255)/256, 256, 256 * sizeof(float)>>>();

printf("done\n"); }


3. Compile the code with `nvcc -O3 reproducer.cu`
4. Run the program with `./a.out`

The error shown above should pop up.

[Notes on the reproducer]
The lines with // works were individually tested to run without errors.