An example: A 2D Convolutional filter applied to a matrix. The values of filter-matrix were initially kept in constant memory at the first commit. But due to Gitlab pipeline error "The SYCL backend does not support global device constants"; in the second commit, constant memory usage has been removed.
Kernel1: Global memory is used, without tiling.
Kernel2: Uses tiling. Block size is assumed to be equal to tile size. First, the tile is copied to shared memory, since an element in the tile would be accessed many times. Each block works on the domain of one tile. But at the border of the tile, some external matrix values are needed ( at the border with another tile) then those matrix values are taken from the global memory.
An example: A 2D Convolutional filter applied to a matrix. The values of filter-matrix were initially kept in constant memory at the first commit. But due to Gitlab pipeline error "The SYCL backend does not support global device constants"; in the second commit, constant memory usage has been removed.