Open xianghao-wang opened 4 months ago
Thanks for filing this bug report @xianghao-wang!
The root cause is that we don't support virtual dispatch in GPU kernels today. Virtual dispatch is typically used for overridden methods to support polymorphism, but it is also used for first-class procedures, as in your test2. A virtual call looks up a function in a table at execution time, based on the class being used (here, a class the compiler introduces to wrap the first-class procedure). These calls are compiler primitives at the point where we do GPU transforms, and that specific primitive is not safe for GPU execution because we would need a symmetric copy of that table that contains __device__ versions of the functions. In the short term, we can provide a better error message here. The longer-term solution is to generate GPU code for all the functions in the virtual call table and make the codegen for the primitive aware of the GPU's virtual call table.
Meanwhile, for this case, there is a workaround that may be acceptable. It relies on record-wrapped functions, which we used before the first-class procedure implementation was significantly improved:
record incrementer {
  type t;
  proc this(x: t): t {
    return x+1;
  }
}
proc test3(ref A: [] real, n: int, f: incrementer(real)) {
  @assertOnGpu foreach i in 0..#n {
    A[i] = f(A[i]);
  }
}
which can then be called with
test3(A, n, new incrementer(real));
Clearly, less than ideal (more coding, abuse of this maybe?) but hopefully manageable.
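For completeness, the workaround above can be assembled into a small standalone program. This is a minimal sketch, not the reporter's code: the config constant n, the array bounds, and the driver block are assumptions added for illustration, and it assumes a GPU-enabled Chapel build with at least one GPU on the locale.

```chapel
// Callable record from the workaround above: defining `this` lets an
// instance be invoked with call syntax, e.g. f(x).
record incrementer {
  type t;
  proc this(x: t): t {
    return x + 1;
  }
}

proc test3(ref A: [] real, n: int, f: incrementer(real)) {
  @assertOnGpu foreach i in 0..#n {
    A[i] = f(A[i]);
  }
}

// Hypothetical driver, added for illustration only.
config const n = 8;

on here.gpus[0] {
  var A: [0..#n] real;   // declared on the GPU sublocale, default-initialized
  test3(A, n, new incrementer(real));
  writeln(A);
}
```

Because f has the concrete record type incrementer(real), the call f(A[i]) is resolved at compile time, so the kernel does not need a virtual call table.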
I'll also note that even once we support virtual calls from GPU kernels, I expect them to be significantly slower because of the dynamic nature of the call. Function calls are already costlier on the GPU, and I expect virtual calls to be costlier still.
Summary of Problem
Description: Calling a procedure passed as a parameter makes assertOnGpu() fail. In the following code, test1() calls the procedure increment() directly, while test2() calls it via the procedure parameter. However, assertOnGpu() in test2() fails, as shown in the compilation output below. The output also prints GPUFunctionCall.chpl:17: note: call to a primitive that is not fast and local. I was wondering how to make the procedure passed by parameter "fast and local".
Steps to Reproduce
Source Code:
Compile command:
chpl --fast GPUFunctionCall.chpl
Compilation output
Configuration Information
Chapel version
Chapel configuration