Closed milthorpe closed 11 months ago
Thanks for taking the time to file this, Josh!
I am trying to figure out what the right course of action is here. With the GPU locale model, I would expect such reductions to work if the forall executes on the host, so a compiler error might be too big of a hammer. We could instead issue a compiler warning, plus an execution-time error if the loop ends up running on a GPU sublocale. Would that be too subtle, or potentially frustrating if you ignore the compiler warning only to hit an execution-time error?
Or would we want to always run forall loops with reduce intents on the CPU until such time as they are working properly on GPUs?
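For context, a reduce intent gives each task in the forall its own private copy of the variable and combines the per-task copies when the loop finishes. A minimal host-side sketch (variable names are illustrative, not from the issue's reproducer):

```chapel
// Each task accumulates into a private copy of `total`; the copies are
// combined with `+` when the forall completes.
var total = 0;
forall i in 1..10 with (+ reduce total) do
  total += i;
writeln(total);  // 55
```

The question above is whether a loop like this should silently drop back to CPU execution when it appears inside a GPU-eligible region, rather than erroring out.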
Feels like an obvious thing to do that I apparently couldn't think of...
I didn't either, until you reminded me on another issue yesterday of GPU code falling back to the CPU when it isn't possible to GPUize it.
Or would we want to always run forall loops with reduce intents on the CPU until such time as they are working properly on GPUs?
I've merged a quick PR that changes things so that this is now what we'll do: https://github.com/chapel-lang/chapel/pull/23931
Of course, long term we'll want to support this on the GPU.
Summary of Problem
As documented in #23324, reduce intents are not currently supported for GPU loops. However, rather than producing a compile-time error, they generate code that behaves incorrectly: in the example below, a sum reduction returns 0 instead of the expected result.
Steps to Reproduce
Source Code: gpusum.chpl
Compile command:
for CPU:
chpl gpusum.chpl
for GPU on NVIDIA: CHPL_LOCALE_MODEL=gpu CHPL_GPU=nvidia chpl gpusum.chpl
Execution command:
./gpusum
On CPU, this prints "45"; on GPU, it prints "0".
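The source of gpusum.chpl is not inlined in this issue. A hypothetical reconstruction consistent with the description (a sum reduction that yields 45 on the CPU, so summing 0 through 9) might look like:

```chapel
// Hypothetical reproducer; the actual gpusum.chpl is not shown in the issue.
var sum = 0;
on here.gpus[0] {                      // run on a GPU sublocale
  forall i in 0..9 with (+ reduce sum) do
    sum += i;
}
writeln(sum);  // prints 45 on CPU, but 0 on GPU before the fix
```

With CHPL_LOCALE_MODEL=gpu, the forall body is eligible for GPU execution, which is the path on which the reduce intent was being silently miscompiled.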
Configuration Information
Output of chpl --version:
Output of $CHPL_HOME/util/printchplenv --anonymize: