Status: Open. jabraham17 opened this issue 3 months ago.
Some other data points/thoughts:
Using foreach for the first loop also works.
Between forall-as-the-first-loop and foreach-as-the-first-loop, there's an LICM difference in the second loop. The former causes A and B not to be LICM'ed, leaving metadata inside the kernel. In the latter case, all we have is ddatas, and that works fine.
A and B being remote-declared variables vs. local ones in scope is also impacting LICM, rather than being the root cause of the issue itself. IOW, the remote-declared-ness of these variables only impacts things because it results in a different AST structure.
On newer CUDAs, I actually see "misaligned address", which is much harder to debug. I wonder if we should try to debug this on an older CUDA with cuda-gdb to understand what's wrong.
General info on passing arrays as a whole (the array record) to kernels:
> On newer CUDAs, I actually see "misaligned address", which is much harder to debug. I wonder if we should try to debug this on an older CUDA with cuda-gdb to understand what's wrong.

Just noting that I saw this as well. Sometimes the runs failed with "illegal memory access" and sometimes with "misaligned address".
Are N-dimensional domains still parallelized only over the first dimension on GPUs?
This might have gotten lost in a previous comment I made, but based on your recollection (not asking you to rerun anything) @jabraham17, would it be correct to say that using foreach for both loops is the acceptable workaround for the scenario in the OP?
> This might have gotten lost in a previous comment I made, but based on your recollection (not asking you to rerun anything) @jabraham17, would it be correct to say that using foreach for both loops is the acceptable workaround for the scenario in the OP?

Yes, using only foreach for both loops is a good workaround for this issue.
Summary of Problem
The following code produces the error "gpu-nvidia.c:292: Error calling CUDA function: an illegal memory access was encountered".
There are two kernels in this code: the forall and the foreach. Commenting out either one makes the error go away. Also note that D is a 2D domain; if it's 1D, the error does not occur. Lastly, changing A and B to be declared inside the on block (instead of being remote variable declarations) makes the error go away.
Configuration Information
chpl --version: 2.2.0 pre-release
$CHPL_HOME/util/printchplenv --anonymize:
gcc --version or clang --version: LLVM 18