This section is meant mostly as an overview of what can be done with the advanced features, rather than a detailed treatment of each.
No base material is available.
Custom mappers ( #1 )
mapping of a class containing pointers
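A minimal sketch of what the custom-mapper item could look like, assuming a compiler with OpenMP 5.0 `declare mapper` support. The `Field` struct and the `scale_and_sum` function are hypothetical names introduced only for illustration. The mapper tells the runtime to map both the struct and the buffer its pointer refers to, so a plain `map(tofrom: f)` is enough at the call site.

```cpp
#include <cstddef>

// Hypothetical struct that owns a buffer through a raw pointer.
struct Field {
    std::size_t n;
    double *data;
};

// Custom mapper: mapping a Field also maps the n doubles that data points to.
#pragma omp declare mapper(Field f) map(f, f.data[0:f.n])

// Scales every element on the device and returns the sum of the scaled values.
double scale_and_sum(Field &f, double factor) {
    double sum = 0.0;
    #pragma omp target teams distribute parallel for \
        map(tofrom: f) map(tofrom: sum) reduction(+: sum)
    for (std::size_t i = 0; i < f.n; ++i) {
        f.data[i] *= factor;
        sum += f.data[i];
    }
    return sum;
}
```

Without the mapper, only the struct itself (including the raw pointer value) would be mapped, and the buffer would have to be mapped by hand at every target region.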
Custom memory allocators ( #4 )
Memory allocation per thread (firstprivate) vs. shared variables (may live in shared memory, global memory, or local memory vs. being passed as a kernel argument)
Pinned memory allocation on CPU
shared memory allocations
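The firstprivate vs. mapped distinction above can be sketched as follows. This is an illustration only: the function name `axpy` is hypothetical, and where each variable ends up (kernel argument, global memory, and so on) is an implementation detail that differs between compilers. The scalar `a` is listed as `firstprivate`, so its value is copied into the region, while the arrays are explicitly mapped and live in device memory for the duration of the region.

```cpp
// Computes y[i] += a * x[i] on the device.
// a: firstprivate scalar, copied by value into the target region.
// x, y: mapped arrays, allocated and transferred by the OpenMP runtime.
void axpy(double a, const double *x, double *y, int n) {
    #pragma omp target teams distribute parallel for \
        firstprivate(a) map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Scalars that are not listed in any clause are firstprivate by default on a target construct, so the explicit clause here is for documentation more than necessity.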
Concurrency
submit kernels from multiple threads (#3). Demonstrates using different cudaStreams/hipStreams.
use OpenMP asynchronously (stretch) (#7). Demonstrates overlapping memory transfers with execution, or running multiple small kernels that would be difficult to merge into one larger kernel.
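As a rough sketch of the asynchronous item, the following uses `nowait` and `depend` on target constructs so that two independent small kernels may run concurrently, with a third kernel waiting for both. The function name `pipeline` is hypothetical, and whether the runtime actually overlaps the kernels or maps them to separate streams is implementation dependent.

```cpp
// Runs three small kernels as deferred target tasks:
//   1) a[i] = 2 * a[i]      (independent of 2)
//   2) b[i] = b[i] + 1      (independent of 1)
//   3) b[i] = b[i] + a[i]   (waits for 1 and 2)
void pipeline(double *a, double *b, int n) {
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp target teams distribute parallel for nowait \
            map(tofrom: a[0:n]) depend(out: a[0])
        for (int i = 0; i < n; ++i) a[i] = 2.0 * a[i];

        #pragma omp target teams distribute parallel for nowait \
            map(tofrom: b[0:n]) depend(out: b[0])
        for (int i = 0; i < n; ++i) b[i] = b[i] + 1.0;

        #pragma omp target teams distribute parallel for nowait \
            map(to: a[0:n]) map(tofrom: b[0:n]) \
            depend(in: a[0]) depend(inout: b[0])
        for (int i = 0; i < n; ++i) b[i] = b[i] + a[i];

        // Wait for all three target tasks before leaving the region.
        #pragma omp taskwait
    }
}
```

The dependences are expressed on the first element of each array, which is a common convention for ordering tasks that touch a whole array.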
Interoperability ( #2 )
Demonstrate how to use CUDA with a variable mapped from OpenMP, and how to use a variable allocated from CUDA in OpenMP.
example of using cuFFT (or any CUDA/ROCm numerical library) together with OpenMP
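One possible shape for the interoperability item is sketched below, covering only the first direction (handing an OpenMP-mapped variable to code that expects a device pointer). To keep the sketch free of vendor dependencies, `device_library_scale` is a hypothetical stand-in for a real library routine such as a cuFFT or rocFFT call; it is written with `is_device_ptr` so that it consumes a device pointer the same way such a library would. `scale_with_library` is likewise a hypothetical name.

```cpp
// Hypothetical stand-in for a vendor library routine that expects a device pointer.
void device_library_scale(double *dev_ptr, int n, double factor) {
    #pragma omp target teams distribute parallel for is_device_ptr(dev_ptr)
    for (int i = 0; i < n; ++i) {
        dev_ptr[i] *= factor;
    }
}

// Maps x to the device, then passes its device address to the "library".
void scale_with_library(double *x, int n, double factor) {
    #pragma omp target data map(tofrom: x[0:n]) use_device_ptr(x)
    {
        // Inside this block, x refers to the device copy of the array.
        device_library_scale(x, n, factor);
    }
}
```

In a real example the body of the `target data` region would call the vendor API directly with `x`. The reverse direction, using memory allocated by CUDA inside an OpenMP target region, would rely on `is_device_ptr` in the same way as the stand-in does here.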