Work around segfaults on Intel GPUs

I encountered segfaults inside the oneAPI runtime when running Octo-Tiger on an Intel Data Center GPU Max 1100. They seem to occur when many kernels are launched asynchronously (or in parallel) before the first kernel has finished, which is normal behavior for Octo-Tiger since it launches a very large number of compute kernels. My best guess is that the runtime initializes something during the first kernel launch, and that this initialization has to complete before further kernels are launched.

An easy workaround is to launch empty, synchronous dummy kernels right at the start of Octo-Tiger. Curiously, this has to be done once per library (hydrolib and octolib). With that in place, the issue is resolved entirely, and we can finally run Octo-Tiger properly on Intel GPUs! Other HPX applications that use oneAPI might need a similar workaround, i.e. one synchronous kernel at startup.
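For readers who want to try the same idea elsewhere, here is a minimal sketch of what such a dummy kernel could look like. It is not the code from this PR: `run_dummy_kernel`, `HostQueue`, and `HostEvent` are hypothetical names, and the helper is written as a template so that it compiles without a SYCL toolchain. The assumption is that the queue type behaves like `sycl::queue`, whose `single_task()` shortcut returns an event with a `wait()` member.

```cpp
#include <cstddef>

// Hypothetical helper, not Octo-Tiger's actual code. Queue is assumed to
// behave like sycl::queue: single_task() submits a kernel and returns an
// event whose wait() blocks until that kernel has finished.
template <typename Queue>
void run_dummy_kernel(Queue& queue) {
    // Empty kernel, waited on immediately so that it completes before
    // any asynchronous kernels are submitted afterwards.
    queue.single_task([] {}).wait();
}

// Stand-in types so the sketch builds and runs on the host without SYCL.
// HostQueue executes the "kernel" right away and counts submissions.
struct HostEvent {
    void wait() const {}  // nothing to wait for: the work already ran
};

struct HostQueue {
    std::size_t completed = 0;

    template <typename Kernel>
    HostEvent single_task(const Kernel& kernel) {
        kernel();     // run on the host immediately
        ++completed;  // record that one kernel finished
        return HostEvent{};
    }
};
```

With a oneAPI toolchain, the same helper should accept a `sycl::queue` once `<sycl/sycl.hpp>` is included. Following the observation above, it would be called once from each library (hydrolib and octolib) before any asynchronous kernel launches.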

STEllAR-GROUP/octotiger #486