Open gmlueck opened 11 months ago
I agree that implementations should not be required to support this. The wording in the spec seems too narrow and all members should be treated as kernel arguments. I believe this would be more comparable to the lambda-as-kernel world, where I would compare a usage as in the example to an explicit capture of a variable.
In fact, DPC++ does diagnose an error for the code snippet above. My understanding is that AdaptiveCpp passes the entire object as a kernel parameter in cases like this, so I presume the code above would also be illegal in AdaptiveCpp, though I have not checked.
Generally, AdaptiveCpp is very laid back with diagnosing errors which is partly a result of supporting integration with other toolchains where we don't want to accidentally restrict functionality that would otherwise natively work, and partly because AdaptiveCpp has perhaps a more research-y background where experimentation and flexibility is central.
The precise behavior of what happens with this example depends on the compilation flow.
On the OpenMP host backend, it would work, just as in regular C++ code. That backend supports almost any C++ code as an extension.
For the clang CUDA/HIP compilation flows, we indeed pass the entire object as a parameter. That would by itself not trigger an error, however, the destructor of std::string
ending up in device code may cause either compilation failure, or potentially be UB if the free()
happens to compile by virtue of CUDA having some support for device-side free which then gets invoked with a host pointer.
In the SSCP compiler, the kernel lambda/function object is passed into the kernel by decomposing it into POD members which are then reassembled as the struct in device code. If an argument is unused, the compiler might optimize it away. The problem probably again would be the std::string
destructor. If that is not optimized away and remains in device code, I would expect JIT failure due to unsupported host functionality (like free()
) being invoked.
So tl;dr: Supported as an extension in the OpenMP host backend (either with compiler acceleration or as library-only), as soon as offload backends come into play problems are indeed expected.
Thanks, @illuhad. It seems like we are in agreement that the spec should be clarified to say that all member variables must have a type that is a valid kernel argument. If AdaptiveCpp allows other types on some backends, I think this would be an extension in your implementation.
I agree this is better.
I think we also have to clarify that the object doesn't have any static
member variables. We talk about non-static
member variables further down, but I don't think we ever say that static
member variables are not (required to be) supported. (That's probably too simple as well, as we can somewhat support static constexpr
member variables.)
Just to be clear, if people still want to have members who are not legal device type, they need to store raw pointer to them. AKA:
#include <sycl/sycl.hpp>
#include <iostream>
#include <sstream>
class MyKernel {
public:
MyKernel(sycl::queue q) {
prefix = new std::string("The answer:");
answer = sycl::malloc_shared<int>(1, q);
}
void operator()() const {
*answer = 42;
};
std::string get_answer() {
std::stringstream ss;
ss << *prefix << *answer << "\n";
return ss.str();
}
private:
int *answer;
std::string *prefix; // This member is not dereferenced from device code
};
int main() {
sycl::queue q;
MyKernel k{q};
q.single_task(k).wait();
std::cout << k.get_answer();
return 0;
}
A little tedious, but working at least
@TApplencourt I think that would be one way to handle it, but it's probably not the most efficient. That strategy will still pass prefix
as a kernel argument unless the implementation is smart enough to optimize the argument away. Another strategy would be to wrap the entire MyKernel
class as a member of a larger class, and move prefix
to the larger (wrapping) class. This way, MyKernel
would contain just the member variables that are needed to run the kernel and the wrapping class would contain the host logic that interprets the result.
Good point. By experience, I know that people don't like to untangle their big fat object when porting their code to GPU :(.
It may make me think that In OpenMP ( and the early days of dpcpp), we got a problem with people defining long double
member but not using them. Code like this now work with dpcpp:
#include <sycl/sycl.hpp>
#include <iostream>
#include <sstream>
class MyKernel {
public:
MyKernel(sycl::queue q) {
prefix = 21;
answer = sycl::malloc_shared<int>(1, q);
}
void operator()() const {
*answer = 42;
};
std::string get_answer() {
std::stringstream ss;
ss << prefix << " " << *answer << "\n";
return ss.str();
}
private:
int *answer;
long double prefix; // This member is not referenced from device code
};
int main() {
sycl::queue q;
MyKernel k{q};
q.single_task(k).wait();
std::cout << k.get_answer();
Does a pedantic implementation should refuse this kind of code? (My GPU doesn't support long doubles.) If not, by experience, it will be a significant usability pain point. I guess the support can just be a "vendor / backend extension"
@TApplencourt I don't think we can guarantee that this works. There can always be cases where the compiler cannot determine that some member is not used. In that case, long double
ends up in the parameter list of the kernel, and then it's quite possible that backend GPU compilers refuse the code.
I understand the issue, but that is still weird. I'm not even sure that a long double*
will work (we can't fake a forward declaration for it). Even a void*
may not work (if, for example, it is cast to a long double*
in device code that is never executed).
Yes, potentially related issue for the pointer types: https://github.com/KhronosGroup/SYCL-Docs/issues/526
I'm convinced. My dream of being lazy is at the mercy of compiler optimization!
We think this statement in section 4.12.4 "Rules for parameter passing to kernels" is not quite right:
This wording implies that it would be OK to define a member variable that is not referenced from
operator()
(or by any of the member functions that it calls), even if that member variable is not a legal kernel argument. This is the case, for example, in this sample program:The question is whether code like this is legal, even though the member variable
prefix
has a type that is not a legal kernel argument. We think our original intent was that the entire class of the named kernel object (MyKernel
in the example above) must be a legal kernel argument, and thus the code snippet above is not spec conformant.In fact, DPC++ does diagnose an error for the code snippet above. My understanding is that AdaptiveCpp passes the entire object as a kernel parameter in cases like this, so I presume the code above would also be illegal in AdaptiveCpp, though I have not checked.
If we agree that this is the intent, the spec statement I quote above should be clarified.