Open stonea opened 1 year ago
This relates to #22176 and #22177 when trying to implement the common GPU pattern of an intra-thread loop with a by
clause of gridDim*blockDim
.
for (int x = threadIdx.x + blockDim.x*blockIdx.x; x < foo; x+= gridDim.x*blockDim.x) {
//Do something with contiguous indices in the same warp
}
Important for keeping the threads in the warp working on contiguous indices of the iteration space for performance.
While #25855 adds an initial implementation of this feature, I am keeping this issue open because the comment https://github.com/chapel-lang/chapel/pull/25855#issuecomment-2327527618 requests a variant of this feature where the selection of the loop iterations to be executed by a given thread is cyclic / round-robin, rather than sequential, as in #25855. We also need to implement this feature for reduction kernels.
Thanks for working on this, @vasslitvinov !
Oops, I did not mean to close it, as we still need to implement it for reduce kernels. Reopening.
Tagging @stonea as we have been discussing var
intent support for GPU kernels. The reason is, to support reductions alongside itersPerThread
we need to have a way to create per-thread variables in the generated GPU kernel. And I think that requirement is common for both efforts.
I don't think that is a big effort, but I wanted both of you to be aware of the related work to avoid stepping on toes / repeating effort etc. As of today, neither effort is scheduled for the near term, but would be good to get done in a release or two.
So far, given a GPUizable loop like:
when this is run there will be a one-to-one mapping between loop iterations and GPU threads.
There may be various reasons for users to not one a one-to-one mapping; for example, the iteration space may exceed the maximum allowed number of threads. Also see: https://github.com/chapel-lang/chapel/issues/22152#issuecomment-1525828257
To change the mapping users could rewrite their loops to use an inner
for
, say something like:But maybe it would nicer if we had some language feature to do that. For example: