Closed krasznaa closed 9 months ago
Unfortunately additional testing on a discreet Intel GPU with both its Level-0 and OpenCL drivers is producing a whole bunch of failures from the tests. 😦 For some reason vecmem::sycl::async_copy
is just not working correctly with that backend. 😕
Continuing with the debugging for now...
Turns out, it was our code that was working incorrectly all along. 😦 Had to introduce a bunch more explicit synchronization points to make vecmem::sycl::async_copy
work reliably. For things that SYCL can do asynchronously on an Intel backend, but not on NVIDIA and AMD ones. 🤔
In the process I also found one more type of issue with our current tests on Intel GPUs. But that will be fixed in a separate PR.
While debugging the issue that lead to #264, it became clear that the unit testing of the different
vecmem::copy
implementations was far from perfect in the project. So I set out to re-write the whole thing.Previously only
vecmem::copy
was tested to a reasonable level, as part ofvecmem_test_core
. Now I turned all of those same tests, with some modifications, into "parametized" tests. Such that they could be executed conveniently on the rest of the "copy implementations" (vecmem::cuda::copy
,vecmem::sycl::async_copy
, etc.) as well, with various "memory type" combinations.This turned out to be fairly useful. It highlighted yet another issue in our code, which I now fixed in
copy.ipp
.Note that
test_cuda_copy.cpp
came out a bit more complicated than the rest of them. :frowning: Unfortunately one cannot declare a "global" variable like:At the end of the application this
foo
object would be destructed after the CUDA runtime library would've already "shut itself off". Leading to a crash/exception at the very end of the test. To avoid this, I went with:https://google.github.io/googletest/advanced.html#global-set-up-and-tear-down
Since for parametrized tests one can't easily put the used objects into a scope. :thinking:
Note that in future PRs this technique should allow us to extend for instance tests/cuda/test_cuda_edm_copy.cpp to exercise
vecmem::cuda::async_copy
as well. :thinking: (I didn't want to make this PR even larger with that change...)