KhronosGroup / SYCL-Docs

SYCL Open Source Specification
Other
110 stars 67 forks source link

Allow "forbidden" C++ features in device code constant expressions #379

Open gmlueck opened 1 year ago

gmlueck commented 1 year ago

This proposal is partly inspired by comments in #371.

The SYCL 2020 specification lists several C++ features that are not allowed in device code (section 5.4 "Language restrictions for device functions"). That spec also lists some C++ features that are optionally allowed in device code, so a kernel using these features may only be submitted to a device that advertises support for the feature (section 5.7 "Optional kernel features").

We should consider relaxing these restrictions somewhat, so that any kernel may use these features within constant expressions. For example, this would allow a kernel to call a recursive function (a forbidden feature) in a constant expression. It would also allow a kernel to use double (an optional feature) within a constant expression even if the kernel is submitted to a device that does not have aspect::fp64. (Of course, if the constant expression itself had type double, then the restriction would still apply.)

I believe these restrictions were added to the SYCL spec to avoid limitations in GPU hardware. However, constant expressions are required to be computed by the compiler, so GPU limitations are not relevant. I also think this proposal would solve some ambiguities about where these forbidden features are disallowed. Consider the following example that has several calls to a recursive function:

#include <iostream>
#include <sycl/sycl.hpp>

enum class where {
  host,
  device
};

// Recursive function that *can* be called in a constant expression.
constexpr where recurs(int i) {
  if (i == 0)
    return where::device;
  if (i == 1)
    return where::host;
  return recurs(i-2);
}

template<int N>
typename std::enable_if_t<recurs(N) == where::host, int>   // Recursive call 1A
host_or_device() {
  return 1;
}

template<int N>
typename std::enable_if_t<recurs(N) == where::device, int>  // Recursive call 1B
host_or_device() {
  return 2;
}

constexpr where w1 = recurs(4);       // Recursive call 2

int main() {
  sycl::queue q;
  int *ret = sycl::malloc_shared<int>(1, q);
  q.single_task([=]{
    *ret = host_or_device<4>();       // Compiler must evaluate recursive call
                                      // to resolve template reference

    if (w1 == where::device)
      *ret = 3;

    constexpr where w2 = recurs(4);   // Recursive call 3
    if (w2 == where::device)
      *ret = 4;

    where w3 = recurs(4);             // Recursive call 4
    if (w3 == where::device)
      *ret = 5;
  }).wait();
  std::cout << *ret << "\n";
}

The two call sites (1A) and (1B) occur in a constant expression, but it's unclear if this is consider "device code". The call to host_or_device occurs in device code, so the compiler must execute the recursive function in order to determine which template to instantiate. It is therefore unclear if the SYCL spec's limitation about calling recursive functions applies here.

The call site (2) also occurs in a constant expression, and it's also unclear if this is considered "device code". It's defined outside of a device function scope, but it is referenced by a device function. Therefore, it's also unclear if the spec's limitation applies here.

The call site (3) is exactly the same constant expression, but the variable is defined in a device function scope. This seems more like device code, but why should this be illegal if (2) is allowed?

The call site (4) is not a constant expression, so this is clearly not allowed by the SYCL spec.

FWIW, DPC++ currently allows calls (1A), (1B), and (2), but it diagnoses an error for calls (3) and (4).

This proposal also offers a practical solution for a problem raised in #371. The core C++ specification requires the use of long double (a forbidden feature) in order to implement a user defined floating point literal (other than the "raw" form). Here is a typical example for an application that wants to create a user defined floating point literal of type float:

float operator""_udl(long double val) { // C++ requires use of "long double" here
  return static_cast<float>(val);
}

int main() {
  sycl::queue q;
  q.single_task([=]{
    float var = 3.14_udl;  // This calls operator ""_udl above
  });
}

This is not legal SYCL code because the kernel uses long double. However, if we relax the SYCL restriction to allow forbidden features in constant expressions, the user can change their code to use consteval like this:

consteval float operator""_udl(long double val) { // Calls to this function are now a constant expression.
  return static_cast<float>(val);
}

int main() {
  sycl::queue q;
  q.single_task([=]{
    float var = 3.14_udl;  // This calls operator ""_udl above
  });
}

And this change allows the application to use a user-defined floating point literal in a way that is conformant to the SYCL spec (assuming we relax the spec restriction).

TApplencourt commented 1 year ago

cpppreference tells me that: The constexpr specifier declares that it is possible to evaluate the value of the function or variable at compile time.. Note the it is possible and not it must. Doesn't that, unfortunately, forbids the kinds of "optimization" you describe here?

On the other side, I seems that the consteval is forced to return a compile-time constant, so we should be able to use them (but hey are C++20)

gmlueck commented 1 year ago

My proposal is to allow these features in a "constant expressions". This is not the same as saying they are allowed whenever the application uses the constexpr keyword. The application is still responsible for knowing which uses of constexpr result in a constant expression and which do not. I believe these rules are defined in the core C++ specification.

I agree that the C++20 consteval keyword could be a powerful tool that allows the application to tell the compiler that certain expressions must be evaluated at compile time. Thus, if the user's compiler understands this keyword, that user will have more opportunities to use "forbidden" or "optional" C++ features in device code. My example above with user-defined floating point literals is an example of this.

However, my proposal does not require use of consteval, and therefore this proposal does not require SYCL to adopt C++20 as its minimum required C++ version.

TApplencourt commented 1 year ago

To be clear, I think saying restriction of kernels only applies on the results of compile-time constant is a good proposal, and I'm 100% in favor of that!

I'm just afraid that people will have a hard time relying on them due to the complexity of the conversions rules from constexpr to compile-time constant`. But I guess this is a C++ problem and not an SYCL one. But I can already imagine GitHub issue with title "I use constexpr but It doesn't compile!". Maybe we just need to be super clear on the spec that we are talking about compile-time constant and not constexpr.

nliber commented 1 year ago

I agree we should do this.

[expr.const] only says (paraphrasing) that it may be done at translation time, not that it must be done at translation time, unless it is used at translation time (array dimension, NTTP instantiation, etc.).

I'm not sure how we say that if it is only available on the host and it is needed on the device it must be done at translation time. Maybe that is what is meant by manifestly constant-evaluated?

TApplencourt commented 1 year ago

More information on the manifestly constant-evaluated term: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0595r2.html

keryell commented 1 year ago

This reminded me https://github.com/KhronosGroup/SYCL-Docs/pull/388 for https://github.com/KhronosGroup/SYCL-Docs/issues/267.

keryell commented 1 year ago

More information on the manifestly constant-evaluated term: open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0595r2.html

This is now in the standard: http://eel.is/c++draft/expr.const#19

TApplencourt commented 1 year ago

Sorry I wasn't clear. The definition of manifestly constant evaluated was a little clearer in the proposal than in the spec. But indeed, it's now part of the spec.

gmlueck commented 1 year ago

I'm not sure how we say that if it is only available on the host and it is needed on the device it must be done at translation time. Maybe that is what is meant by manifestly constant-evaluated?

Yes, I think this is the right term. I think the SYCL spec should be changed to say that:

Should we add this more general language rather than a special case for recursion (#388)?

FWIW, I was thinking that we would relax these rules in SYCL-Next. Do we want to add it as a bug fix to SYCL 2020 instead?

keryell commented 1 year ago

It seems simple enough to add it as a bug fix since all the constant evaluation is handled by the front-end compiler.

nliber commented 1 year ago

I think we should add the more general language as a bug fix to SYCL 2020.