JuliaORNL / JACC.jl

CPU/GPU parallel performance-portable layer in Julia via functions as arguments
MIT License

2D (CUDA) parallel_for gets indices wrong if (M,N) are neither less than 16 nor multiples of 16 #57

Open PhilipFackler opened 7 months ago

PhilipFackler commented 7 months ago

mwe_2d.txt

See the attached example. When the lengths are less than 16, it works. When they are multiples of 16, it works. Otherwise, the i or j indices provided inside the kernel go past the lengths (i > M or j > N).
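One launch shape that would produce exactly these three observations is 16x16 thread blocks, with the grid rounded up to whole blocks and no bounds check inside the kernel. The sketch below is a CPU emulation of that arithmetic, not JACC's actual code. The launch shape (threads = min(length, 16) per dimension, blocks = ceil(length / threads)) is an assumption, and `indices_seen`, `out_of_range`, and the `guard` flag are hypothetical names for illustration. Indices are 1-based, as in CUDA.jl's `(blockIdx().x - 1) * blockDim().x + threadIdx().x`.

```python
import math

# Assumed block edge; the threshold of 16 in the report suggests 16x16 blocks.
THREADS_PER_DIM = 16


def indices_seen(M, N, guard=False):
    """Emulate a 2D CUDA-style launch on the CPU and return every (i, j)
    the kernel body would receive (1-based, as in CUDA.jl).

    Hypothetical launch shape, not taken from JACC's source:
    threads = min(length, 16) per dimension, blocks = ceil(length / threads).
    """
    tx, ty = min(M, THREADS_PER_DIM), min(N, THREADS_PER_DIM)
    bx, by = math.ceil(M / tx), math.ceil(N / ty)
    seen = []
    for block_x in range(1, bx + 1):
        for block_y in range(1, by + 1):
            for thread_x in range(1, tx + 1):
                for thread_y in range(1, ty + 1):
                    # Mirrors (blockIdx().x - 1) * blockDim().x + threadIdx().x
                    i = (block_x - 1) * tx + thread_x
                    j = (block_y - 1) * ty + thread_y
                    # With the guard, threads past the iteration space do nothing.
                    if guard and (i > M or j > N):
                        continue
                    seen.append((i, j))
    return seen


def out_of_range(M, N, guard=False):
    """Return the (i, j) pairs handed to the kernel that fall outside 1..M x 1..N."""
    return [(i, j) for i, j in indices_seen(M, N, guard) if i > M or j > N]


for M, N in [(10, 12), (32, 48), (20, 16)]:
    print((M, N), "unguarded:", len(out_of_range(M, N)),
          "guarded:", len(out_of_range(M, N, guard=True)))
```

Under this assumed shape, lengths below 16 get a single block exactly that wide, and multiples of 16 divide evenly. For (20, 16), the two 16-wide blocks in the first dimension cover i = 1..32, so 12 x 16 = 192 calls see i > 20. In the Julia kernel, the corresponding guard would wrap the user function in something like `if i <= M && j <= N` before calling `f(i, j, x...)`. That would require passing M and N into the kernel.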