Closed jeffhammond closed 3 years ago
This fixes performance differences between CUDA and DPC++ due to a default block size of 32.