KhronosGroup / SYCL-Docs

SYCL Open Source Specification
Other
117 stars 68 forks source link

Spec should disallow named kernel object member variable types that cannot be kernel arguments #524

Open gmlueck opened 11 months ago

gmlueck commented 11 months ago

We think this statement in section 4.12.4 "Rules for parameter passing to kernels" is not quite right:

If the kernel is a named function object, the operator() member function (or other member functions that it calls) may reference member variables inside the same named function object. Any such member variables become parameters to the kernel

This wording implies that it would be OK to define a member variable that is not referenced from operator() (or by any of the member functions that it calls), even if that member variable is not a legal kernel argument. This is the case, for example, in this sample program:

#include <sycl/sycl.hpp>
#include <iostream>
#include <sstream>

class MyKernel {
 public:
  MyKernel(sycl::queue q) {
    prefix = "The answer: ";
    answer = sycl::malloc_shared<int>(1, q);
  }

  void operator()() const {
    *answer = 42;
  };

  std::string get_answer() {
    std::stringstream ss;
    ss << prefix << *answer << "\n";
    return ss.str();
  }

 private:
  int *answer;
  std::string prefix;   // This member is not referenced from device code
};

int main() {
  sycl::queue q;

  MyKernel k{q};
  q.single_task(k).wait();
  std::cout << k.get_answer();

  return 0;
}

The question is whether code like this is legal, even though the member variable prefix has a type that is not a legal kernel argument. We think our original intent was that the entire class of the named kernel object (MyKernel in the example above) must be a legal kernel argument, and thus the code snippet above is not spec conformant.

In fact, DPC++ does diagnose an error for the code snippet above. My understanding is that AdaptiveCpp passes the entire object as a kernel parameter in cases like this, so I presume the code above would also be illegal in AdaptiveCpp, though I have not checked.

If we agree that this is the intent, the spec statement I quote above should be clarified.

illuhad commented 11 months ago

I agree that implementations should not be required to support this. The wording in the spec seems too narrow and all members should be treated as kernel arguments. I believe this would be more comparable to the lambda-as-kernel world, where I would compare a usage as in the example to an explicit capture of a variable.

In fact, DPC++ does diagnose an error for the code snippet above. My understanding is that AdaptiveCpp passes the entire object as a kernel parameter in cases like this, so I presume the code above would also be illegal in AdaptiveCpp, though I have not checked.

Generally, AdaptiveCpp is very laid back with diagnosing errors which is partly a result of supporting integration with other toolchains where we don't want to accidentally restrict functionality that would otherwise natively work, and partly because AdaptiveCpp has perhaps a more research-y background where experimentation and flexibility is central.

The precise behavior of what happens with this example depends on the compilation flow.

On the OpenMP host backend, it would work, just as in regular C++ code. That backend supports almost any C++ code as an extension.

For the clang CUDA/HIP compilation flows, we indeed pass the entire object as a parameter. That would by itself not trigger an error, however, the destructor of std::string ending up in device code may cause either compilation failure, or potentially be UB if the free() happens to compile by virtue of CUDA having some support for device-side free which then gets invoked with a host pointer.

In the SSCP compiler, the kernel lambda/function object is passed into the kernel by decomposing it into POD members which are then reassembled as the struct in device code. If an argument is unused, the compiler might optimize it away. The problem probably again would be the std::string destructor. If that is not optimized away and remains in device code, I would expect JIT failure due to unsupported host functionality (like free()) being invoked.

So tl;dr: Supported as an extension in the OpenMP host backend (either with compiler acceleration or as library-only), as soon as offload backends come into play problems are indeed expected.

gmlueck commented 11 months ago

Thanks, @illuhad. It seems like we are in agreement that the spec should be clarified to say that all member variables must have a type that is a valid kernel argument. If AdaptiveCpp allows other types on some backends, I think this would be an extension in your implementation.

nliber commented 11 months ago

I agree this is better.

I think we also have to clarify that the object doesn't have any static member variables. We talk about non-static member variables further down, but I don't think we ever say that static member variables are not (required to be) supported. (That's probably too simple as well, as we can somewhat support static constexpr member variables.)

TApplencourt commented 11 months ago

Just to be clear, if people still want to have members who are not legal device type, they need to store raw pointer to them. AKA:

#include <sycl/sycl.hpp>
#include <iostream>
#include <sstream>

class MyKernel {
 public:
  MyKernel(sycl::queue q) {
    prefix = new std::string("The answer:");
    answer = sycl::malloc_shared<int>(1, q);
  }

  void operator()() const {
    *answer = 42;
  };

  std::string get_answer() {
    std::stringstream ss;
    ss << *prefix << *answer << "\n";
    return ss.str();
  }

 private:
  int *answer;
  std::string *prefix;   // This member is not dereferenced from device code
};

int main() {
  sycl::queue q;

  MyKernel k{q};
  q.single_task(k).wait();
  std::cout << k.get_answer();

  return 0;
}

A little tedious, but working at least

gmlueck commented 11 months ago

@TApplencourt I think that would be one way to handle it, but it's probably not the most efficient. That strategy will still pass prefix as a kernel argument unless the implementation is smart enough to optimize the argument away. Another strategy would be to wrap the entire MyKernel class as a member of a larger class, and move prefix to the larger (wrapping) class. This way, MyKernel would contain just the member variables that are needed to run the kernel and the wrapping class would contain the host logic that interprets the result.

TApplencourt commented 11 months ago

Good point. By experience, I know that people don't like to untangle their big fat object when porting their code to GPU :(.

It may make me think that In OpenMP ( and the early days of dpcpp), we got a problem with people defining long double member but not using them. Code like this now work with dpcpp:

#include <sycl/sycl.hpp>
#include <iostream>
#include <sstream>

class MyKernel {
 public:
  MyKernel(sycl::queue q) {
    prefix = 21;
    answer = sycl::malloc_shared<int>(1, q);
  }

  void operator()() const {
    *answer = 42;
  };

  std::string get_answer() {
    std::stringstream ss;
    ss << prefix << " " << *answer << "\n";
    return ss.str();
  }

 private:
  int *answer;
  long double prefix;   // This member is not referenced from device code
};

int main() {
  sycl::queue q;

  MyKernel k{q};
  q.single_task(k).wait();
  std::cout << k.get_answer();

Does a pedantic implementation should refuse this kind of code? (My GPU doesn't support long doubles.) If not, by experience, it will be a significant usability pain point. I guess the support can just be a "vendor / backend extension"

illuhad commented 11 months ago

@TApplencourt I don't think we can guarantee that this works. There can always be cases where the compiler cannot determine that some member is not used. In that case, long double ends up in the parameter list of the kernel, and then it's quite possible that backend GPU compilers refuse the code.

nliber commented 11 months ago

I understand the issue, but that is still weird. I'm not even sure that a long double* will work (we can't fake a forward declaration for it). Even a void* may not work (if, for example, it is cast to a long double* in device code that is never executed).

illuhad commented 11 months ago

Yes, potentially related issue for the pointer types: https://github.com/KhronosGroup/SYCL-Docs/issues/526

TApplencourt commented 11 months ago

I'm convinced. My dream of being lazy is at the mercy of compiler optimization!