CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
Other
219 stars 32 forks source link

Changes to reduce kernel launch overheads #794

Closed linehill closed 6 months ago

linehill commented 7 months ago

A patch set for reducing kernel launch overheads. These improved HeCBench's mrc, floydwarshall and overlay benchmarks by 41-57% on PVC.

pvelesko commented 6 months ago

Unit test failures for iGPU OpenCL

igpu_opencl_make_check_result.txt: FAIL
    377 - ABM_AddKernel_MultiTypeMultiSize - int (Failed)
    378 - ABM_AddKernel_MultiTypeMultiSize - long (SEGFAULT)
    380 - ABM_AddKernel_MultiTypeMultiSize - long long (Failed)

among others

linehill commented 6 months ago
igpu_opencl_make_check_result.txt: FAIL
  377 - ABM_AddKernel_MultiTypeMultiSize - int (Failed)
  378 - ABM_AddKernel_MultiTypeMultiSize - long (SEGFAULT)
  380 - ABM_AddKernel_MultiTypeMultiSize - long long (Failed)

What’s going on with those tests? They should be failing expectedly.

2024-03-08T03:28:15.1003549Z Name:                          Intel(R) UHD Graphics 730
(...)
2024-03-08T03:28:15.1008640Z maxThreadsPerBlock:            512
2024-03-08T03:28:15.1008861Z maxThreadsDim.x:               512

Is this the iGPU on which the tests fail? The tests should be failing as they try to launch kernels over the supported thread block size (1000 vs. 512):

TEMPLATE_TEST_CASE("ABM_AddKernel_MultiTypeMultiSize", "", int, long, float, long long, double) {
  auto size = GENERATE(as<size_t>{}, 100, 500, 1000);
  // (...)
  hipLaunchKernelGGL(add<TestType>, 1, size, 0, 0, d_a, d_b, d_c, size);
  HIP_CHECK(hipGetLastError());

I see the tests failing on the main branch on my iGPU where thread block sizes limited to 256. If these tests are passing on the main branch on the CI’s iGPU - why are they passing in the first place?

pvelesko commented 6 months ago

@linehill tests are limited to 500 max so they don't fail on main for me not should they be failing. The test in question:

TEMPLATE_TEST_CASE("ABM_AddKernel_MultiTypeMultiSize", "", int, long, float, long long, double) {
  auto size = GENERATE(as<size_t>{}, 100, 500);
  TestType *d_a, *d_b, *d_c;
  auto res = hipMalloc(&d_a, sizeof(TestType) * size);
  REQUIRE(res == hipSuccess);
  res = hipMalloc(&d_b, sizeof(TestType) * size);
  REQUIRE(res == hipSuccess);
  res = hipMalloc(&d_c, sizeof(TestType) * size);
  REQUIRE(res == hipSuccess);

  std::vector<TestType> a, b, c;
  for (size_t i = 0; i < size; i++) {
    a.push_back(i + 1);
    b.push_back(i + 1);
    c.push_back(2 * (i + 1));
  }

  res = hipMemcpy(d_a, a.data(), sizeof(TestType) * size, hipMemcpyHostToDevice);
  REQUIRE(res == hipSuccess);
  res = hipMemcpy(d_b, b.data(), sizeof(TestType) * size, hipMemcpyHostToDevice);
  REQUIRE(res == hipSuccess);

  hipLaunchKernelGGL(add<TestType>, 1, size, 0, 0, d_a, d_b, d_c, size);
  HIP_CHECK(hipGetLastError());

  res = hipMemcpy(a.data(), d_c, sizeof(TestType) * size, hipMemcpyDeviceToHost);
  REQUIRE(res == hipSuccess);

  HIP_CHECK(hipFree(d_a));
  HIP_CHECK(hipFree(d_b));
  HIP_CHECK(hipFree(d_c));
  REQUIRE(a == c);
}

You probably looked in hip-tests which is not enabled by default.

pvelesko commented 6 months ago

iGPU where thread block sizes limited to 256

you just pasted a snippet where it says 512, not 256.

pvelesko commented 6 months ago

Is this the iGPU on which the tests fail?

yes, all tests that fail, fail on the iGPU OpenCL backend.

linehill commented 6 months ago

iGPU where thread block sizes limited to 256

you just pasted a snippet where it says 512, not 256.

512 is figure from CI's test log.

You probably looked in hip-tests which is not enabled by default.

I see, I thought chipStar switched to hip-tests.

linehill commented 6 months ago

No new changes, just rebase.