Open Michoumichmich opened 2 years ago
Another option would be to simply remove the kernel type-name (i.e class hello
in your example). If the kernel was not reachable and there is no name, the implementation could safely remove it. However, this means the dead-kernel optimization would not be available to implementations that required the kernel type-name.
Thinking more, though, maybe the implementation could just return a kernel ID for a pre-defined "null kernel" in cases where it was optimized away. It does not seem very useful for the application to query about a kernel that can never be submitted.
Another option would be to simply remove the kernel type-name (i.e
class hello
in your example). If the kernel was not reachable and there is no name, the implementation could safely remove it. However, this means the dead-kernel optimization would not be available to implementations that required the kernel type-name.
I guess that the unnamed lambdas should be already optimisable away unless std::vector<kernel_id> get_kernel_ids() const;
can return a kernel_id
corresponding to an unnamed lambda :/ in which case it would change the behaviour (when thinking about passing the vector across translation units).
Thinking more, though, maybe the implementation could just return a kernel ID for a pre-defined "null kernel" in cases where it was optimized away. It does not seem very useful for the application to query about a kernel that can never be submitted.
I agree with you that querying unreachable kernels in production can be useless, however during development it can happen. Wouldn't the change break the "as-if" rule of the compiler? As an opt-in
with an attribute I'm totally for it though as there would be a reasonable value to return for kernel queries without having to compile the kernel "just because you have a query". To me it's the free version of get_kernel_ids
that is the real blocker, but with the opt-in you could say that the kernel cannot be reached through get_kernel_ids()
and everything folds nicely.
Just thought about it, but with the attribute approach you could get some "nice" diagnostics when you query a kernel that was not defined in the current translation unit:
// In a header
[[sycl::inline_kernel]] class my_lambda_name;
// kernel.c/.so
q.parallel_for<my_lambda_name>()... // kernel not reachable, let's delete it
// queries.c/.so
get_kernel_id<my_lambda_name>(); // error, inline kernel not defined here, query can't work anyway
because actually this links but throws/won't work as one might expect (with dpcpp)
cries in hipSYCL at the prospect of introducing yet another attribute instead of something that can work well in library-only scenarios :(
cries in hipSYCL at the prospect of introducing yet another attribute instead of something that can work well in library-only scenarios :(
Haha, sorry about that, I must admit that I thought about hipSYCL too, but I don't know how to do that in a library-only friendly style^^. But as gmlueck suggested, it could also be solved without attributes :)
@Michoumichmich There are some ideas how attributes could be replaced as a whole, e.g. https://github.com/illuhad/hipSYCL/pull/543 or Intel's constexpr property lists which are even more general.. But if we don't even need to go down the attribute replacement route to achieve the functionality you want, it is easier of course :)
@Michoumichmich There are some ideas how attributes could be replaced as a whole, e.g. illuhad/hipSYCL#543 or Intel's constexpr property lists which are even more general.. But if we don't even need to go down the attribute replacement route to achieve the functionality you want, it is easier of course :)
hipSYCL is infinite! That is really interesting. For the attribute proposed here I guess you could also implement that as a type trait on the kernel name/type. However my point was originally to see if we can delete kernels. I will update the tile to remove the attribute detail :)
In general, I think it would be better to solve this without adding a new attribute (or a new compile-time property) because I suspect that few people would add these attributes anyway. It would be better to come up with some solution that allows an implementation to remove unreachable kernels even when users do not add any special attribute or property.
One possibility is that the implementation could still return a kernel_id
in such cases, even though the kernel itself is removed. If we go this route, we'd have to decide what the various query operations do in such a case.
Another option is to tweak the specification to make it easier for an implementation to remove unreachable kernels. For example, it may not be necessary for an implementation to include a kernel_id
for an unreachable kernel from the call to get_kernel_ids()
. If an implementation removes an unreachable kernel, it's not really "in the application" anymore, so this seems reasonable.
In general, I think it would be better to solve this without adding a new attribute (or a new compile-time property) because I suspect that few people would add these attributes anyway. It would be better to come up with some solution that allows an implementation to remove unreachable kernels even when users do not add any special attribute or property.
Okay, thanks, I get it now!
One possibility is that the implementation could still return a
kernel_id
in such cases, even though the kernel itself is removed. If we go this route, we'd have to decide what the various query operations do in such a case.
I see, we could throw an exception with errc::invalid
(or add a more precise error such as kernel_unavailable
?) if the user got the kernel using the name with: get_kernel<KernelName>()
? Or just throw while calling get_kernel<KernelName>()
? So if the developper is just testing and queries the kernel, but the compiler detects that the call site is unreachable, there is a meaningful error message/exception. (But returning some default values would be safer if the exceptions are uncaught)
Another option is to tweak the specification to make it easier for an implementation to remove unreachable kernels. For example, it may not be necessary for an implementation to include a
kernel_id
for an unreachable kernel from the call toget_kernel_ids()
. If an implementation removes an unreachable kernel, it's not really "in the application" anymore, so this seems reasonable.
Do you know if the specification requires to return the unnamed kernels through get_kernel_ids()/get_kernel_bundle()
? From the definition it seems that it could, but I unsure. If it doesn't it would be an easy entry point for optimizing away unused unnamed lambdas
P1
Actually the SYCL specification does not allow to remove dead/unreachable kernels at least because of free functions such as
get_kernel_bundle
(which could be called indirectly from another translation unit and use the kernel even if it appears unreachable in the current one). It's not only related to SYCL specifically. However removing unused kernels could have some advantages (binary size, JIT latency), especially in large projects where codepaths might be unreachable due to compile-time config (and with a lot of macro generated kernels ahah). Having an attribute such as[[sycl::inline_kernel]]
would be welcome. Applying it to a kernel would mean it can be discarded in dead code analysis unless its name is used within the current translation unit (or its call site is reachable, of course).And so something like
would compile to nothing (which is kind of what you would expect/want at first sight).
An optimisation that could be done as a result, in a case where online and offline compilation are available, would be to compile offline all the
[[sycl::inline_kernel]]
that were kept after dead code analysis (as they are necessarily used) and use online compilation for the other kernels. We get a tradeoff launch latency and binary size.