CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

hipConstantTestDeviceSymbol fails on Intel CPU #215

Open pjaaskel opened 1 year ago

pjaaskel commented 1 year ago

This test case uses device-side initialized global variables. It produces random output with the Intel CPU driver (#142).

@linehill suspected the cause is that we never call clSetKernelExecInfo() with CL_KERNEL_EXEC_INFO_SVM_PTRS to declare the (generated) global pointers as SVM pointers. These pointers are not referenced in the kernel argument list but are still used by the kernel.
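For reference, a minimal sketch of what such a call could look like. This is not chipStar code: the helper name declareIndirectSvmPtrs and its parameters are hypothetical, and the sketch takes the setter as a callable so it compiles without OpenCL headers. It assumes the callable behaves like clSetKernelExecInfo(kernel, param_name, param_value_size, param_value) from OpenCL 2.0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of chipStar): declares SVM allocations that a
// kernel reaches indirectly, i.e. not through its argument list.
// `setExecInfo` is assumed to behave like OpenCL's
// clSetKernelExecInfo(kernel, param_name, param_value_size, param_value).
template <typename Kernel, typename ParamName, typename SetExecInfo>
int declareIndirectSvmPtrs(Kernel kernel, ParamName svmPtrsParam,
                           const std::vector<void *> &ptrs,
                           SetExecInfo setExecInfo) {
  // Nothing to declare: skip the call and return 0 (the value of CL_SUCCESS).
  if (ptrs.empty())
    return 0;

  // Debug-only sanity check: every entry should be a real allocation.
  for (void *p : ptrs)
    assert(p != nullptr);

  // The parameter value is an array of pointers; its size is given in bytes.
  return setExecInfo(kernel, svmPtrsParam, ptrs.size() * sizeof(void *),
                     static_cast<const void *>(ptrs.data()));
}
```

With real OpenCL 2.0+ headers, the call would presumably be declareIndirectSvmPtrs(kernel, CL_KERNEL_EXEC_INFO_SVM_PTRS, ptrs, clSetKernelExecInfo), issued before the kernel is enqueued, where ptrs holds the SVM addresses backing the generated globals.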

The problem goes away when I print the variables in the kernel. That suggests an overly eager optimizer is converting the SVM-converted globals to program-scope globals or something similar, so @linehill might be right. It also works with the GPU driver, possibly only by luck, because that driver does not happen to optimize the "SVM globals" away.

__constant__ __device__ int ConstOut = 123;
__constant__ __device__ int ConstIn = 321;

__global__ void Assign(int* Out) {
  int tid = threadIdx.x + blockIdx.x * blockDim.x;
  printf("ConstOut: %d\n", ConstOut); // I added this
  printf("ConstIn: %d\n", ConstIn); // ...and this
  if (tid == 0)
    Out[tid] = -ConstIn;
  printf("Out[tid]: %d\n", Out[tid]); // ...and this
}

pjaaskel commented 1 year ago

It seems we lack all kinds of SVM pointer info passing in the code base?

pvelesko commented 1 year ago

Is this still open?

pjaaskel commented 1 year ago

Does it still fail? Can we enable the test on the Intel and PoCL CPU drivers?

pvelesko commented 1 year ago

Yes, still failing:

test 69
    Start 69: hipConstantTestDeviceSymbol

69: Test command: /home/pvelesko/hipstar/hipstar/build/samples/hipSymbol/hipTestConstantDeviceSymbol
69: Test timeout computed to be: 10000000
69: hipTestConstantDeviceSymbol: /home/pvelesko/hipstar/hipstar/samples/hipSymbol/hipTestConstantDeviceSymbol.cpp:46: int main(): Assertion `Ch == -654' failed.
1/1 Test #69: hipConstantTestDeviceSymbol ......Subprocess aborted***Exception:   0.20 sec

0% tests passed, 1 tests failed out of 1

Label Time Summary:
internal    =   0.20 sec*proc (1 test)

Total Test time (real) =   0.57 sec

The following tests FAILED:
     69 - hipConstantTestDeviceSymbol (Subprocess aborted)

pjaaskel commented 1 year ago

Still fails on Intel CPU. PoCL CPU works. Do we set the SVM markups for the global initializers correctly, @linehill? It might be a driver bug too.

linehill commented 1 year ago

Dropped from Milestone 1.0: resolution depends on external issue.