Open mirenradia opened 9 months ago
Unfortunately switching to doctest (#47) does not seem to completely solve this. On b340cd1edaa5ba345b2100a4e3384e8a705e458b, the tests still prematurely abort with the same error code. However, they pass with DEBUG=TRUE
.
Playing around with the optimization flags set in amrex/Tools/GNUMake/comps/dpcpp.make:L35 gives the change from -O1
to -O2
as the culprit (i.e. tests pass with -O1
and prematurely abort with -O2
). I believe -O2
also implies -fvectorize -fslp-vectorize -fsycl-dead-args-optimization
(using this stackoverflow answer) but the tests pass with -O1 -fvectorize -fslp-vectorize -fsycl-dead-args-optimization
and fail with -O2
suggesting it must be something else enabled with -O2
...
Running this through ze_tracer -c --demangle on b340cd1edaa5ba345b2100a4e3384e8a705e458b gives the following Level Zero call that fails:
>>>> [17719443688] zeKernelCreate: hModule = 0x7353da0 desc = 0x7ffee80ad960 {ZE_STRUCTURE_TYPE_KERNEL_DESC(0x1d) 0 0 "_ZTSZZN5amrex11ParallelForILi256EZ17run_ccz4_rhs_testvEUliiiE0_EEvRKNS_3Gpu10KernelInfoERKNS_3BoxEOT0_ENKUlRN4sycl3_V17handlerEE0_clESE_EUlNSC_7nd_itemILi1EEEE_" (amrex::ParallelFor<256, run_ccz4_rhs_test()::{lambda(int, int, int)#2}>(amrex::Gpu::KernelInfo const&, amrex::Box const&, run_ccz4_rhs_test()::{lambda(int, int, int)#2}&&)::{lambda(sycl::_V1::handler&)#2}::operator()(sycl::_V1::handler&) const::{lambda(sycl::_V1::nd_item<1>)#1})} phKernel = 0x7ffee80ad958 (hKernel = 0x1545e1c8f0d0)
<<<< [17719467271] zeKernelCreate [2270 ns] -> ZE_RESULT_ERROR_INVALID_KERNEL_NAME(0x2013265937)
Using the -save-temps
option and inspecting the integration headers and preprocessed *.ii
files shows no difference between -O1
and -O2
other than the short hash.
Enabling shader dumps shows that 10 different *.spv
files (SPIR-V binaries) are created with -O1
but only 3 are created for -O2
so it seems some device kernels are not being compiled in the latter case.
Since there is a new software stack on Dawn (dawn-env/2024-04-15
) with oneAPI 2024.1, I thought I would try this again but I still run into the same problem.
Summary
The unit tests do not work on Intel GPUs. I think there seems to be some incompatibility between Catch2 and the Intel DPC++ compiler's SYCL implementation.
Steps to reproduce
Here are some steps to reproduce on Dawn at its state on 2024-02-26 (likely to change soon):
USE_SYCL=TRUE
:Observed outcome
The tests abort with the following error:
Expected outcome
The tests should work without issues.
Additional information
I think there are several compounding issues here:
The Intel DPC++ compiler has trouble linking unnamed device kernels in Catch2 test cases. See intel/llvm#10659.
Even if I remove all but one of the test cases with
amrex::ParallelFor
s such as the "CCZ4 RHS" test case by modifyingtest_dirs
in Tests/GNUMakefile to just(this test contains multiple
amrex::ParallelFor
s), I get the following errorInterestingly, building with
DEBUG=TRUE
does allow this single test to pass.Intel's Level Zero runtime uses
SIGSEGV
(i.e. a segfault) to trigger migration of memory between host and device when using USM shared allocations. We need to disable Catch2's POSIX signal handling to get around this (note that this is automatically done for AMReX'sSIGSEGV
handling when running on Intel GPU's - see here). It should be sufficient to simply define the macroCATCH_CONFIG_NO_POSIX_SIGNALS
.