Closed etscheelk closed 3 months ago
We recently added support for `ref`, `const`, and `in` intents on CPU-bound `foreach` loops; `reduce` and `var` intents remain as future work.
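For concreteness, a minimal sketch of how the supported intents look on a CPU-bound loop (the array, bounds, and names here are illustrative):

```chapel
var A: [1..10] int;
var scale = 2;

// `ref` gives the loop body access to the outer array; `in` copies the
// outer value into the loop; `const` would make it read-only instead
foreach i in 1..10 with (ref A, in scale) {
  A[i] = i * scale;
}
writeln(A);
```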
Definitely, having it assert out rather than give a more friendly error message is an oversight and bug on our part, so apologies for that.
Since you're looking at it, I'll also mention that intent support for GPUs is a work-in-progress. Currently, on a GPU-bound loop, `const` intents should work as you expect, `in` intents will not, and `ref` intents will exhibit the "old behavior" we had before introducing `foreach` intents (basically, arrays and objects will be passed by reference to the kernel, while scalars will be passed by value). This is something we're hoping to address soon.
@etscheelk -- I am curious about the "more-complete" reproducer and wanted to mention some other limitations, some due to inherent challenges with GPUs, some because we haven't prioritized them yet.
System-wide atomics are not supported yet. Supporting them is a bit tricky, but vendors have started to roll out some library support in recent years. We hope to be able to use them in the future. What this means is:
```chapel
var x: atomic int; // this is on the CPU
on here.gpus[0] {
  foreach i in 1..10 with (ref x) {
    x.add(i); // this is executing on the GPU
  }
}
```
is not supported.
Per-GPU atomics are a bit easier to achieve, but we haven't prioritized them. If your case needs them, let us know. What this means is:
```chapel
on here.gpus[0] {
  var x: atomic int; // this is on the GPU, now
  foreach i in 1..10 with (ref x) {
    x.add(i); // this is executing on the GPU
  }
}
```
is not supported either.
What we support, however, is more "conventional" means of doing atomics on the GPU:
```chapel
on here.gpus[0] {
  var Arr: [1..n] int; // regular, non-atomic ints, allocated in GPU memory
  foreach i in 1..10 {
    gpuAtomicAdd(Arr[foo(i)], i); // foo maps i into Arr's domain
  }
}
```
This could allow you to do things like histogramming and random-access atomics.
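For instance, a histogramming sketch along those lines (the data, bin count, and names here are illustrative; `gpuAtomicAdd` comes from the `GPU` module):

```chapel
use GPU;

config const n = 1_000_000, nBins = 64;

on here.gpus[0] {
  const Data = [i in 1..n] i % nBins;  // sample data, resident in GPU memory
  var Hist: [0..<nBins] int;           // regular, non-atomic ints
  foreach i in 1..n {
    gpuAtomicAdd(Hist[Data[i]], 1);    // concurrent hits on the same bin are safe
  }
  writeln(Hist);
}
```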
If you have a particular direction that may require any of those idioms let us know. Chapel has a lot of parallel programming features that we want to support on GPUs as well. Hearing from users is always helpful for prioritization.
Here's a PR to patch in a slightly better error: https://github.com/chapel-lang/chapel/pull/24769
Of course, longer term our goal is to add actual support for 'var' intents.
That's exactly what I figured was happening: that the `var` intent is currently unimplemented or hypothetical. Its intended semantics could also be a little dubious on the GPU, depending on block, block size, and however Chapel abstracts this issue.
Originally, I was using a `forall`, and the with-intent was creating a few task-private variables, such as a random stream for each thread, initialized positions local to each thread, and a reference to a global array which could be accessed randomly.
Regarding the atomics, my particular issue requires the random-access atomics you pointed out, so thanks for the suggestion. Ultimately they're not strictly necessary, especially at larger grid sizes, since collisions will be infrequent and not highly problematic when they do occur.
The plan is roughly a billion points repeatedly transformed to create a fractal and the grid is a density map, perhaps 8192x8192. Got a few things I still need to figure out:
- How to visualize this (I'm considering compilation to a library, visualized elsewhere)
- GPU install doesn't recognize a device in the `here.gpus` array
- Parallel random number generation (should I pre-create a stream with fill? I'm also used to the TRNG C++ library for parallel random number generation) (random numbers on GPU are also a little more complicated matter)

Project is inspired by this code: https://github.com/pcantrell/density-fractals/tree/main/Source, fractals created with transformations of rectangular and polar coordinates.
Sorry to turn this into stack overflow, I'll likely bring these questions and considerations there.
We're happy to help. Our gitter channel is also suitable for more interactive conversations: https://gitter.im/chapel-lang/chapel. But quick answers to your questions in case they can help:
How to visualize this (I'm considering compilation to a library, visualized elsewhere)
You could also look into using a C library and interoperating from Chapel; see C Interoperability.
GPU install doesn't recognize a device in the here.gpus array
This one could be a separate issue or a gitter conversation we can help with. Chapel built with `CHPL_LOCALE_MODEL=gpu` should be able to handle that. Maybe the runtime was built with the default `CHPL_LOCALE_MODEL=flat`.
Parallel random number generation (should I pre-create a stream with fill? I'm also used to the TRNG C++ library for parallel random number generation) (random numbers on gpu are also a little more complicated matter)
In some test codes that we have, we do `fillRandom` to prepopulate an array on the host and then copy it to the device. As you alluded to, random number generation on the GPU is a complicated matter. An idiomatic way of doing that is something along the lines of:
```chapel
import Random;

var CpuArr: [1..10] real;
Random.fillRandom(CpuArr);
writeln(CpuArr);

on here.gpus[0] {
  var GpuArr = CpuArr; // host-to-device copy
  GpuArr += 1;         // this is a kernel launch using random data, for example
  writeln(GpuArr);
}
```
How to visualize this (I'm considering compilation to a library, visualized elsewhere)
As an addendum, @mppf pointed out that https://github.com/chapel-lang/chapel/blob/main/test/exercises/c-ray/Image.chpl has an implementation for PPM and BMP output from a Chapel array. You might want to check it out.
@etscheelk : One other possibility for reading/writing images: quite a while ago, we had a user write an introduction to Chapel through image processing, which you should be able to access here: http://primachvis.com/html/imgproc_chapel.html It mentions reading/writing PNG files via interoperability with C, so that may be something to leverage.
As that page notes, the significant evolution of the language caused the code used at that time to break, but @Guillaume-Helbecque has recently (and generously) undertaken an effort to modernize it in https://github.com/chapel-lang/chapel/pull/24245
This one could be a separate issue or a gitter conversation we can help with. Chapel built with `CHPL_LOCALE_MODEL=gpu` should be able to handle that. Maybe the runtime was built with the default `CHPL_LOCALE_MODEL=flat`.
I'll take another stab when I'm back home on my desktop, but I'll take a look at gitter if I continue to have similar problems.
image processing, which you should be able to access here: http://primachvis.com/html/imgproc_chapel.htm
PPM and BMP output from a Chapel array
Thanks for the output suggestions!
`foreach with (var)` not implemented yet, but better message prints to user indicating it is not implemented yet #24769
I'll take another stab when I'm back home on my desktop, but I'll take a look at gitter if I continue to have similar problems.
Sounds good, thanks @etscheelk!
Suggestions for image IO keep coming from my team, so I'll drop another one here by @mstrout. This one's with PNG: https://github.com/mstrout/ChapelForPythonProgrammersMay2023/tree/main/image_analysis_example
Also, ChapelCon could be a good opportunity to share your work (deadline is next week Friday) or learn more about Chapel and interact with the community. In case you missed it: https://chapel-lang.org/ChapelCon24.html
Summary of Problem
Description:
Variable creation in `foreach with (var y = 1)` leads to a compiler error. It seems to be specifically with variable creation within the `with` task-intent clause. Other variable inclusions, such as `ref`, `const`, `const ref`, etc., seem to compile correctly.

I first noticed it when rebuilding for GPU and trying it out, but I checked it again rebuilt without GPU and it still occurs. No error on `forall`. Intentional error on `coforall` that indicates task-private variables are not supported for `coforall`, `begin`, `cobegin`, which makes sense.

The following is the error message I received at compile-time:
Is this a blocking issue with no known work-arounds?
Yes, it seems so.
Steps to Reproduce
Use a `foreach` loop with a `with` statement and a task-private variable.

Source Code:
The minimum required to cause the issue
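(The original snippet isn't included above, but based on the description, a minimal reproducer would look something like the following; the loop bounds and body are illustrative:)

```chapel
foreach i in 1..10 with (var y = 1) {
  // the task-private `var y = 1` in the with-clause triggers the error
}
```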
Additional more-complete little test
Compile command:
chpl test.chpl
Execution command:
Not applicable, compilation error.
Associated Future Test(s):
I don't know?
Configuration Information
Output of `chpl --version`:

Output of `$CHPL_HOME/util/printchplenv --anonymize`:

Since it may be applicable, Windows 10 WSL2.

Back-end compiler and version, e.g. `gcc --version` or `clang --version`:

Ubuntu clang version 15.0.7
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin