chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

[Bug]: multiple kernels with a 2D domain on remote variables results in internal error #25665

Open jabraham17 opened 1 month ago

jabraham17 commented 1 month ago

Summary of Problem

The following code produces the error "gpu-nvidia.c:292: Error calling CUDA function: an illegal memory access was encountered".

const D = {0..<10, 0..<10};
on here.gpus[0] var A: [D] bool;
on here.gpus[0] var B: [D] bool;
on here.gpus[0] {
  const DD = D; // localize domain
  forall idx in DD do B = A[idx];
  var neq: [DD] bool;
  foreach idx in DD do neq[idx] = A[idx] != B[idx];
}

There are two kernels in this code, the forall and the foreach. Commenting out either one makes the error go away. Also note that D is a 2D domain; if it is 1D, the error does not occur. Lastly, declaring A and B inside the on block (instead of as remote variable declarations) also makes the error go away.
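
For reference, the last variation described above (A and B declared inside the on block) would look roughly like the sketch below. It is the reproducer with only the two declarations moved. It has not been re-run here, and it is meant to pin down which change is being described, not to suggest a fix.

```chapel
const D = {0..<10, 0..<10};
on here.gpus[0] {
  // A and B are ordinary locals of the on block here,
  // not remote variable declarations as in the reproducer.
  var A: [D] bool;
  var B: [D] bool;

  const DD = D; // localize domain
  forall idx in DD do B = A[idx];   // first kernel, unchanged
  var neq: [DD] bool;
  foreach idx in DD do neq[idx] = A[idx] != B[idx];   // second kernel, unchanged
}
```

Both loops and the 2D domain are left exactly as they were, so the only difference from the failing program is where A and B are declared.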

Configuration Information

e-kayrakli commented 1 month ago

Some other data points/thoughts:


General info on passing arrays as a whole (the array record) to kernels:

jabraham17 commented 1 month ago

> On newer CUDAs, I actually see misaligned address, which is much harder to debug. I wonder if we should try to debug this on an older CUDA with cuda-gdb to understand what's wrong.

Just noting that I saw this as well. Sometimes a run would report "illegal memory access" and sometimes "misaligned address".

Iainmon commented 1 month ago

Are N dim domains still only parallel over the first dimension on GPUs?

e-kayrakli commented 1 month ago

Yes. See https://github.com/chapel-lang/chapel/issues/22152 and https://github.com/chapel-lang/chapel/issues/24331
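
To illustrate what that limitation means in practice, here is a minimal sketch of one way to expose every element of a 2D array as its own iteration: loop over a flat 1D range and recover the 2D index by hand. This is an assumption-laden illustration, not something taken from the linked issues. The names n, m, and k are hypothetical, the index mapping assumes row-major order, and the loop is written without a with-clause to match the style of the reproducer above.

```chapel
config const n = 10, m = 10;   // hypothetical sizes for illustration

on here.gpus[0] {
  var A: [0..<n, 0..<m] int;

  // One iteration per element: the loop itself is 1D, so it does not
  // depend on how a multidimensional domain is parallelized.
  foreach k in 0..<n*m {
    const i = k / m;   // integer division gives the row
    const j = k % m;   // remainder gives the column
    A[i, j] = k;
  }
}
```

Whether this is worth the extra index arithmetic depends on the shape of the domain; when the first dimension is already large, iterating over the domain directly may be just as good.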

e-kayrakli commented 3 weeks ago

This might get lost in a previous comment I made, but based on your recollection (not asking you to rerun anything) @jabraham17 would it be correct to say that using foreach for both loops is the acceptable workaround for the scenario in the OP?

jabraham17 commented 3 weeks ago

> This might get lost in a previous comment I made, but based on your recollection (not asking you to rerun anything) @jabraham17 would it be correct to say that using foreach for both loops is the acceptable workaround for the scenario in the OP?

Yes, using only foreach for both loops is a good workaround for this issue.
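
As a sketch of that workaround, the reproducer from the original post with the forall replaced by a foreach would look like the following. Only the loop keyword on the first loop changes; A and B are still remote variable declarations and D is still 2D. It has not been re-run here.

```chapel
const D = {0..<10, 0..<10};
on here.gpus[0] var A: [D] bool;
on here.gpus[0] var B: [D] bool;
on here.gpus[0] {
  const DD = D; // localize domain
  foreach idx in DD do B = A[idx];   // was a forall in the original post
  var neq: [DD] bool;
  foreach idx in DD do neq[idx] = A[idx] != B[idx];
}
```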