chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 421 forks source link

forall reductions fail silently for GPU #23827

Closed milthorpe closed 11 months ago

milthorpe commented 1 year ago

Summary of Problem

As documented in #23324, reduce intents are not currently supported for GPU loops, however, rather than generating a compile error, they generate code which works incorrectly. In the example below, a sum reduction returns a result of 0 instead of the expected result.

Steps to Reproduce

Source Code: gpusum.chpl

use GPU;

proc computeSum(loc) {
    on loc {
        var sum: int = 0;
        forall a in 0..#10 with (+ reduce sum) {
            sum += a;
        }
        writeln(sum);
    }
}

if here.gpus.size > 0 then computeSum(here.gpus[0]);
else computeSum(here);

Compile command:

for CPU: chpl gpusum.chpl for GPU on NVIDIA: CHPL_LOCALE_MODEL=gpu CHPL_GPU=nvidia chpl gpusum.chpl

Execution command:

./gpusum

On CPU, this prints "45"; on GPU, it prints "0".

Configuration Information

bradcray commented 12 months ago

Thanks for taking the time to file this, Josh!

e-kayrakli commented 11 months ago

I am trying to figure out what the right course of action is here. With the GPU locale model, I expect such reductions to work if the forall executes on the host. So, a compiler error might be too big of a hammer. We could consider issuing a compiler warning and an execution time error if the loop ends up running in a GPU sublocale. Would that be too subtle, or potentially frustrating if you ignore the compiler warning only to see an execution time error?

bradcray commented 11 months ago

Or would we want to always run forall loops with reduce intents on the CPU until such time as they are working properly on GPUs?

e-kayrakli commented 11 months ago

Or would we want to always run forall loops with reduce intents on the CPU until such time as they are working properly on GPUs?

Feels like an obvious thing to do that I apparently couldn't think of..

bradcray commented 11 months ago

I didn't either until you reminded me of GPU code falling back to the CPU when not possible to GPUize on another issue yesterday.

stonea commented 11 months ago

Or would we want to always run forall loops with reduce intents on the CPU until such time as they are working properly on GPUs?

I've merged a quick PR that changes things so that this is now what we'll do: https://github.com/chapel-lang/chapel/pull/23931

Of course long term we'll want to support this on the gpu.