lucaparisi91 opened 4 months ago
NVIDIA GPUs support vectorized instructions of width 2 or 4 (e.g. `float2`/`float4` loads and stores). This is different from SIMT parallelism: each thread in a warp operates on 2 (or 4) elements, so a single instruction covers 2 × 32 (or 4 × 32) elements per warp.
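To make the distinction concrete, here is a small host-side C model of one warp. It is only a sketch: `warp_vector_copy` is a hypothetical helper, not a CUDA API, and it simply loops over 32 simulated lanes (the SIMT dimension). Each lane then copies `vec` consecutive floats (the vector dimension), which is the access pattern a per-thread `float2`/`float4` access typically compiles to (PTX `ld.global.v2`/`ld.global.v4`).

```c
#include <assert.h>
#include <stddef.h>

#define WARP_SIZE 32 /* threads per warp on NVIDIA GPUs */

/* Hypothetical model, not a CUDA API.
   SIMT: the outer loop stands in for the 32 lanes of a warp.
   Vector width: the inner loop stands in for one width-`vec` instruction
   issued by a single lane, covering `vec` consecutive elements.
   owner[i] records which lane touched element i.
   Returns the number of elements the whole warp covered. */
static size_t warp_vector_copy(const float *src, float *dst, int vec, int *owner) {
    assert(vec == 1 || vec == 2 || vec == 4);
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        for (int k = 0; k < vec; ++k) {
            size_t i = (size_t)lane * (size_t)vec + (size_t)k;
            dst[i] = src[i];
            owner[i] = lane;
        }
    }
    return (size_t)WARP_SIZE * (size_t)vec;
}
```

With `vec = 4` the warp covers 128 elements, and lane 1 owns elements 4 to 7. The number of lanes is unchanged; only the amount of work per lane grows.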
It looks like some older versions of CCE interpret `simd` as SIMT, and the `simd` directive is required to use all threads in a warp.
That is not the case for newer versions of CCE. At the moment, most compilers seem to ignore the `simd` directive.
What does `#pragma omp simd` actually do on the GPU?
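One way to investigate this is a minimal reproducer compiled with each compiler, comparing the device code generated with and without `simd`. The sketch below assumes an OpenMP offload-capable C compiler. The function names are hypothetical, and how `simd` lanes map onto hardware (warp lanes vs. per-thread vector instructions) is left to the implementation. If no device is available, the `target` region falls back to the host. If OpenMP is disabled entirely, the pragmas are ignored and the loops run serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reproducer: the same SAXPY loop with and without `simd`.
   Diffing the generated device code (e.g. PTX) for the two functions
   shows what a given compiler does with the `simd` part of the construct. */
void saxpy_simd(size_t n, float a, const float *x, float *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    #pragma omp target teams distribute parallel for simd map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

void saxpy_nosimd(size_t n, float a, const float *x, float *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions must produce identical results. Only the generated code, and possibly the performance, should differ.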