Closed tkoskela closed 2 years ago
The code currently uses a global size `VOLUME/2` and a local size `128`. These are passed to the CUDA kernels. The SYCL code generated by dpct mimics this behaviour by passing a local and a global size to the `nd_range`. The simplest approach is to reduce the local and global ranges to 1D. It could also be interesting to pass simple 1D ranges directly to the lambda functions in SYCL.
Investigate `nd_range<3>` and `nd_item<3>` produced by `dpct`. Can these be reduced to 1 dimension?
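
To make the question concrete, here is a minimal host-side sketch in plain C++ (no SYCL runtime needed) of the index arithmetic involved. It assumes the usual dpct pattern, where only the last of the three dimensions is non-trivial and the work-item index is formed as `get_group(2) * get_local_range(2) + get_local_id(2)`. In a 1D `nd_range<1>` the same value would be `get_global_id(0)`. The value of `VOLUME` and the helper names `padded_global_size`, `index_3d_style` and `indices_agree` are hypothetical and only for illustration. The sketch also treats `VOLUME/2` as the total number of work-items, which is an assumption about how the kernels are launched.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical lattice volume, for illustration only (not taken from the code base).
constexpr std::size_t VOLUME = 16 * 16 * 16 * 16;
// Local (work-group) size quoted in the issue.
constexpr std::size_t LOCAL_SIZE = 128;

// SYCL requires the global size of an nd_range to be a multiple of the
// local size, so round n up to the next multiple of `local`.
constexpr std::size_t padded_global_size(std::size_t n, std::size_t local) {
    return ((n + local - 1) / local) * local;
}

// Index as dpct-style code typically computes it in the last dimension:
// get_group(2) * get_local_range(2) + get_local_id(2).
constexpr std::size_t index_3d_style(std::size_t group,
                                     std::size_t local_range,
                                     std::size_t local_id) {
    return group * local_range + local_id;
}

// Check that, for every work-item of a 1D launch, the 3D-style index
// equals the plain 1D global id (what get_global_id(0) would return).
inline bool indices_agree(std::size_t n, std::size_t local) {
    const std::size_t global = padded_global_size(n, local);
    for (std::size_t g = 0; g < global; ++g) {
        const std::size_t group = g / local;     // work-group number
        const std::size_t local_id = g % local;  // position inside the group
        if (index_3d_style(group, local, local_id) != g) {
            return false;
        }
    }
    return true;
}
```

Calling `indices_agree(VOLUME / 2, LOCAL_SIZE)` returns `true`, which suggests the 3D ranges carry no extra information when the first two dimensions are 1. If `VOLUME/2` is not a multiple of the local size, the padded global size is larger than the problem size, so a 1D kernel would still need a bounds check on the index.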