[Bug]: internal error: RES-LOW-ORS-1637 chpl version 2.0.0, foreach and task-private variable in with-statement

etscheelk commented 3 months ago

Summary of Problem

Description:

Variable creation in foreach with (var y = 1) leads to compiler error. It seems to be specifically with variable creation within the with task intent clause. Other variable inclusions, such as ref, const, const ref, etc. seem to compile correctly.

I first noticed it when rebuilding for GPU and trying it out, but I checked it again rebuilt without GPU and it still occurs. No error on forall. Intentional error on coforall that indicates task-private variables not supported for coforall, begin, cobegin, which makes sense.

The following is the error message I received at compile-time:

internal error: RES-LOW-ORS-1637 chpl version 2.0.0

Internal errors indicate a bug in the Chapel compiler, and we're sorry for the hassle. We would appreciate your reporting this bug -- please see https://chapel-lang.org/bugs.html for instructions.

Is this a blocking issue with no known work-arounds?

yes, seems so

Steps to Reproduce

Use a foreach loop with a with-statement and a task-private variable.

Source Code:

The minimum required to cause the issue

foreach i in 0..#1 
with (
    var y = 1
)
{

}

Additional more-complete little test

// This is a distillation of something else I was writing
config const len = 100_000;

var x : atomic int = 0;

foreach i in 0..#len 
with (
    ref x,
    var y = 1
)
{
    x.add(y);
}

writeln(x.read());

Compile command:

chpl test.chpl

Execution command:

Not applicable, compilation error.

Associated Future Test(s):

I don't know?

Configuration Information

Output of chpl --version:

chpl version 2.0.0
built with LLVM version 15.0.7
available LLVM targets: m68k, xcore, x86-64, x86, wasm64, wasm32, ve, systemz, sparcel, sparcv9, sparc, riscv64, riscv32, ppc64le, ppc64, ppc32le, ppc32, nvptx64, nvptx, msp430, mips64el, mips64, mipsel, mips, lanai, hexagon, bpfeb, bpfel, bpf, avr, thumbeb, thumb, armeb, arm, amdgcn, r600, aarch64_32, aarch64_be, aarch64, arm64_32, arm64
Copyright 2020-2024 Hewlett Packard Enterprise Development LP
Copyright 2004-2019 Cray Inc.
(See LICENSE file for more details)

Output of $CHPL_HOME/util/printchplenv --anonymize:

CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: bundled
CHPL_LLVM: system *
CHPL_AUX_FILESYS: none

Since it may be applicable, Windows 10 WSL2.

Back-end compiler and version, e.g. gcc --version or clang --version:


gcc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Ubuntu clang version 15.0.7 Target: x86_64-pc-linux-gnu Thread model: posix InstalledDir: /usr/bin



- (For Cray systems only) Output of `module list`: N/A

stonea commented 3 months ago

We recently added support for ref, const, and in intents on CPU-bound foreach loops; reduce and var intents remain as future work.

Definitely, having it assert out rather than give a more friendly error message is an oversight and bug on our part, so apologies for that.

Since, you're looking at it, I'll also mention intent support for GPUs is also a work-in-progress. Currently, on a gpu-bound loop const intents should work as you expect, in intents will not, and ref intents will exhibit the "old behavior" we had before introducing foreach intents (basically arrays and objects will be passed by reference to the kernel, scalars will be passed by value). This is something we're hoping to address soon.

e-kayrakli commented 3 months ago

@etscheelk -- I am curious about the "more-complete" reproducer and wanted to mention some other limitations, some due to inherent challenges with GPUs, some because we haven't prioritized them yet.

System-wide atomics are not supported yet. Supporting them is a bit tricky, but vendors have started to roll out some library support in the recent years. We hope to be able to use them in the future. What this means is:

var x: atomic int; // this is on CPU

on here.gpus[0] {
  foreach i in 1..10 with (ref x) {
    x.add(i);  // this is executing on GPU
  }
}

is not supported.

Per-gpu atomics are a bit easier to achieve, but we haven't prioritized them. If your case needs them let us know. What this means is:

on here.gpus[0] {
  var x: atomic int; // this is on GPU, now
  foreach i in 1..10 with (ref x) {
    x.add(i);  // this is executing on GPU
  }
}

is not supported either.

What we support, however, is more "conventional" means of doing atomics on the GPU:

on here.gpus[0] {
  var Arr: [1..n] int;  // regular, non-atomic ints, allocated on GPU memory
  foreach i in 1..10 {
    gpuAtomicAdd(Arr[foo(i)], i);
  }
}

this could allow you to do things like histogramming and random-access atomics.

If you have a particular direction that may require any of those idioms let us know. Chapel has a lot of parallel programming features that we want to support on GPUs as well. Hearing from users is always helpful for prioritization.

stonea commented 3 months ago

Here's a PR to patch in a slightly better error: https://github.com/chapel-lang/chapel/pull/24769

Of course, longer term our goal is to add actual support for 'var' intents.

etscheelk commented 3 months ago

That's exactly what I figured was happening, that var intent was unimplemented or hypothetical currently. The intention of it could also be a little dubious on the GPU based on block, block size, and however Chapel abstracts this issue.

Originally, I was using a forall and the with-intent was creating a few task-private variables, such as a random stream for each thread, initialized positions local to each thread, and a reference to a global array which could be accessed randomly.

Regarding the atomics, my particular issue requires the random-access atomics as you pointed out, so thanks for the suggestion. Ultimately they're not too necessary, especially on larger dimensional sizes, as there will infrequently be any collisions and if there are it isn't highly problematic.

The plan is roughly a billion points repeatedly transformed to create a fractal and the grid is a density map, perhaps 8192x8192. Got a few things I still need to figure out:

How to visualize this (I'm considering compilation to a library, visualized elsewhere)
GPU install doesn't recognize a device in the here.gpus array
Parallel random number generation (should I pre-create a stream with fill? I'm also used to the TRNG C++ library for parallel random number generation) (random numbers on gpu are also a little more complicated matter)

Project is inspired by this code: https://github.com/pcantrell/density-fractals/tree/main/Source, fractals created with transformations of rectangular and polar coordinates.

Sorry to turn this into stack overflow, I'll likely bring these questions and considerations there.

e-kayrakli commented 3 months ago

Sorry to turn this into stack overflow, I'll likely bring these questions and considerations there.

We're happy to help. Our gitter channel is also suitable for more interactive conversations: https://gitter.im/chapel-lang/chapel. But quick answers to your questions in case they can help:

How to visualize this (I'm considering compilation to a library, visualized elsewhere)

You could also look into using a C library and interoperating from Chapel. See C Interoperability

GPU install doesn't recognize a device in the here.gpus array

This one could be a separate issue or a gitter conversation we can help with. Chapel built with CHPL_LOCALE_MODEL=gpu should be able to handle that. Maybe the runtime was built with the default CHPL_LOCALE_MODEL=flat.

Parallel random number generation (should I pre-create a stream with fill? I'm also used to the TRNG C++ library for parallel random number generation) (random numbers on gpu are also a little more complicated matter)

In some test codes that we have, we do fillRandom to prepopulate an array on the host and then copy it to the device. As you alluded to, random number generation on GPU is a complicated matter. An idiomatic way of doing that is something along the lines of:

import Random;
var CpuArr: [1..10] real;

Random.fillRandom(CpuArr);

writeln(CpuArr);

on here.gpus[0] {
  var GpuArr = CpuArr;
  GpuArr += 1; // this is a kernel launch using random data, for example

  writeln(GpuArr);
}

e-kayrakli commented 3 months ago

How to visualize this (I'm considering compilation to a library, visualized elsewhere)

As an addendum, @mppf pointed out that https://github.com/chapel-lang/chapel/blob/main/test/exercises/c-ray/Image.chpl has an implementation for PPM and BMP output from a Chapel array. You might want to check it out.

bradcray commented 3 months ago

@etscheelk : One other potential for reading/writing images: Quite awhile ago, we had a user write an introduction to Chapel through image processing, which you should be able to access here: http://primachvis.com/html/imgproc_chapel.html It mentions reading/writing PNG files via interoperability with C, so that may be something to leverage.

As that page notes, the significant evolution of the language caused the code used at that time to break, but @Guillaume-Helbecque has recently (and generously) undertaken an effort to modernize it in https://github.com/chapel-lang/chapel/pull/24245

etscheelk commented 3 months ago

This one could be a separate issue or a gitter conversation we can help with. Chapel built with CHPL_LOCALE_MODEL=gpu should be able to handle that. Maybe the runtime was built with the default CHPL_LOCALE_MODEL=flat.

I'll take another stab when I'm back home on my desktop, but I'll take a look at gitter if I continue to have similar problems.

image processing, which you should be able to access here: http://primachvis.com/html/imgproc_chapel.htm

PPM and BMP output from a Chapel array

Thanks for the output suggestions!

etscheelk commented 3 months ago

foreach with (var) not implemented yet, but better message prints to user indicating it is not implemented yet #24769

e-kayrakli commented 3 months ago

I'll take another stab when I'm back home on my desktop, but I'll take a look at gitter if I continue to have similar problems.

Sounds good, thanks @etscheelk!

Suggestions for image IO keeps coming from my team, so, I'll drop another one here by @mstrout. This one's with PNG: https://github.com/mstrout/ChapelForPythonProgrammersMay2023/tree/main/image_analysis_example

Also, ChapelCon could be a good opportunity to share your work (deadline is next week Friday) or learn more about Chapel and interact with the community. In case you missed it: https://chapel-lang.org/ChapelCon24.html

chapel-lang / chapel