Flattening direct kernel code gen for cuda

OP-DSL / OP2-Common

OP2: open-source framework for the execution of unstructured grid applications on clusters of GPUs or multi-core CPUs

https://op-dsl.github.io

Flattening direct kernel code gen for cuda #246

Closed. TobyFlynn closed this 1 year ago

TobyFlynn commented 1 year ago

Previously, direct kernels were launched with 200 blocks and a for loop inside the CUDA kernel. With this change, a direct kernel with no reductions launches one thread per element in the set, and a direct kernel with a reduction launches 400 blocks.
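
For readers unfamiliar with the two launch styles, here is a minimal hand-written sketch of the difference. It is not the code OP2 generates: the names `user_kernel`, `direct_strided`, `direct_flat` and `num_blocks` are hypothetical, the block size of 128 threads is an assumption, and the sketch assumes the reduction case keeps the strided loop, which the description above does not state explicitly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-element user function (stands in for the real kernel body).
__device__ void user_kernel(const double *in, double *out) {
  *out = 2.0 * (*in);
}

// Looped style: a fixed-size grid where each thread strides over the set.
// Assumed here to remain in use when the kernel contains a reduction.
__global__ void direct_strided(const double *in, double *out, int set_size) {
  for (int n = threadIdx.x + blockIdx.x * blockDim.x; n < set_size;
       n += blockDim.x * gridDim.x) {
    user_kernel(&in[n], &out[n]);
  }
}

// Flattened style: one thread per set element, guarded against overrun
// because the last block may extend past the end of the set.
__global__ void direct_flat(const double *in, double *out, int set_size) {
  int n = threadIdx.x + blockIdx.x * blockDim.x;
  if (n < set_size) {
    user_kernel(&in[n], &out[n]);
  }
}

// Host-side choice of grid size: 400 blocks with a reduction, otherwise
// enough blocks to cover the set (rounded up, and at least one block).
int num_blocks(int set_size, int nthread, bool has_reduction) {
  if (has_reduction) return 400;
  int blocks = (set_size + nthread - 1) / nthread;
  return blocks > 0 ? blocks : 1;
}

int main() {
  const int set_size = 1000;
  const int nthread = 128;  // assumed block size, not taken from OP2

  double *in = nullptr, *out = nullptr;
  cudaMallocManaged(&in, set_size * sizeof(double));
  cudaMallocManaged(&out, set_size * sizeof(double));
  for (int i = 0; i < set_size; ++i) in[i] = static_cast<double>(i);

  // No reduction: flattened launch, one thread per element.
  direct_flat<<<num_blocks(set_size, nthread, false), nthread>>>(in, out, set_size);
  cudaDeviceSynchronize();

  // Count elements that differ from the expected doubled value.
  int mismatches = 0;
  for (int i = 0; i < set_size; ++i) {
    if (out[i] != 2.0 * in[i]) ++mismatches;
  }
  printf("flattened launch mismatches: %d\n", mismatches);

  // Fixed-grid launch with the strided loop, as assumed for reductions.
  direct_strided<<<num_blocks(set_size, nthread, true), nthread>>>(in, out, set_size);
  cudaDeviceSynchronize();

  cudaFree(in);
  cudaFree(out);
  return 0;
}
```

The flattened form drops the loop from the kernel body and lets the grid size follow the set size. A fixed block count is a natural fit for reductions because it bounds the number of per-block partial results that must be combined afterwards.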