Closed linehill closed 6 months ago
Unit test failures for iGPU OpenCL
igpu_opencl_make_check_result.txt: FAIL
377 - ABM_AddKernel_MultiTypeMultiSize - int (Failed)
378 - ABM_AddKernel_MultiTypeMultiSize - long (SEGFAULT)
380 - ABM_AddKernel_MultiTypeMultiSize - long long (Failed)
among others
igpu_opencl_make_check_result.txt: FAIL 377 - ABM_AddKernel_MultiTypeMultiSize - int (Failed) 378 - ABM_AddKernel_MultiTypeMultiSize - long (SEGFAULT) 380 - ABM_AddKernel_MultiTypeMultiSize - long long (Failed)
What’s going on with those tests? They should be failing expectedly.
2024-03-08T03:28:15.1003549Z Name: Intel(R) UHD Graphics 730
(...)
2024-03-08T03:28:15.1008640Z maxThreadsPerBlock: 512
2024-03-08T03:28:15.1008861Z maxThreadsDim.x: 512
Is this the iGPU on which the tests fail? The tests should be failing as they try to launch kernels over the supported thread block size (1000 vs. 512):
TEMPLATE_TEST_CASE("ABM_AddKernel_MultiTypeMultiSize", "", int, long, float, long long, double) {
auto size = GENERATE(as<size_t>{}, 100, 500, 1000);
// (...)
hipLaunchKernelGGL(add<TestType>, 1, size, 0, 0, d_a, d_b, d_c, size);
HIP_CHECK(hipGetLastError());
I see the tests failing on the main branch on my iGPU where thread block sizes limited to 256. If these tests are passing on the main branch on the CI’s iGPU - why are they passing in the first place?
@linehill tests are limited to 500 max so they don't fail on main for me not should they be failing. The test in question:
TEMPLATE_TEST_CASE("ABM_AddKernel_MultiTypeMultiSize", "", int, long, float, long long, double) {
auto size = GENERATE(as<size_t>{}, 100, 500);
TestType *d_a, *d_b, *d_c;
auto res = hipMalloc(&d_a, sizeof(TestType) * size);
REQUIRE(res == hipSuccess);
res = hipMalloc(&d_b, sizeof(TestType) * size);
REQUIRE(res == hipSuccess);
res = hipMalloc(&d_c, sizeof(TestType) * size);
REQUIRE(res == hipSuccess);
std::vector<TestType> a, b, c;
for (size_t i = 0; i < size; i++) {
a.push_back(i + 1);
b.push_back(i + 1);
c.push_back(2 * (i + 1));
}
res = hipMemcpy(d_a, a.data(), sizeof(TestType) * size, hipMemcpyHostToDevice);
REQUIRE(res == hipSuccess);
res = hipMemcpy(d_b, b.data(), sizeof(TestType) * size, hipMemcpyHostToDevice);
REQUIRE(res == hipSuccess);
hipLaunchKernelGGL(add<TestType>, 1, size, 0, 0, d_a, d_b, d_c, size);
HIP_CHECK(hipGetLastError());
res = hipMemcpy(a.data(), d_c, sizeof(TestType) * size, hipMemcpyDeviceToHost);
REQUIRE(res == hipSuccess);
HIP_CHECK(hipFree(d_a));
HIP_CHECK(hipFree(d_b));
HIP_CHECK(hipFree(d_c));
REQUIRE(a == c);
}
You probably looked in hip-tests which is not enabled by default.
iGPU where thread block sizes limited to 256
you just pasted a snippet where it says 512, not 256.
Is this the iGPU on which the tests fail?
yes, all tests that fail, fail on the iGPU OpenCL backend.
iGPU where thread block sizes limited to 256
you just pasted a snippet where it says 512, not 256.
512 is figure from CI's test log.
You probably looked in hip-tests which is not enabled by default.
I see, I thought chipStar switched to hip-tests.
No new changes, just rebase.
A patch set for reducing kernel launch overheads. These improved HeCBench's mrc, floydwarshall and overlay benchmarks by 41-57% on PVC.
__hipPushCallConfiguration()
and__hipPopCallConfiguration()
functions:hipLaunchKernel()
).chipStar::ExecItem
.prepareDeviceVariablesNoLock()
.