Port code in the PixelVertexFinding plugin to use OpenMP offload.
What remains to be done:
Many of the loops are in individual `target` regions, which makes them separate kernels. This is likely to lead to large launch overheads in aggregate. The loop in `gpuSplitVertices` is different: the outer loop is `target teams distribute`, and the inner loops are `parallel for`. The other kernels may need to be modified to do something similar in order to combine multiple loops in a single kernel invocation.
Each kernel has the data movement pulled out of individual loops into a `target enter/exit data` region. These regions should be expanded to encompass all the kernels.
Sorting in `gpuSortByPt2` has not been ported.
The `atomicInc` function, which performs a clipped (bounded) increment, doesn't have a directly compatible version in OpenMP. The workaround (which I didn't use here) is to perform the increment unconditionally, use a bounds check on the captured old value to guard whatever action uses it, and fix up the final value of the variable at the end of the loop.