KhronosGroup / OpenCL-Registry

OpenCL API and Extension Registry.

Built-in kernel registry #131

Closed: pjaaskel closed this issue 9 months ago

pjaaskel commented 1 year ago

This issue collects comments, progress, and feedback related to the idea of a "built-in kernel registry": an index of semantically well-defined built-in kernels. Hardware vendors can freely choose a subset of built-in kernels (BiKs) to implement and accelerate as fixed-function "black box" functions. Clients can then pick up the vendor-advertised BiKs that match their application's acceleration needs. The idea has been presented in OpenCL working group meetings and has received at least some lukewarm interest and curiosity.

The extension specification could be named cl_khr_defined_biks or similar. It would simply add a sentence to the description of the CL_DEVICE_BUILT_IN_KERNELS query of clGetDeviceInfo(), declaring something along the lines of:

"Built-in kernels returned by the device whose names start with 'khr.' adhere to the semantics and behavior defined in the Khronos Built-in Kernel Registry located at http://.../KhronosGroup/BiK-Registry."
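A minimal sketch of how a client might pick out registry-defined kernels under the proposed naming rule. It assumes the string has already been obtained from clGetDeviceInfo() with CL_DEVICE_BUILT_IN_KERNELS, which the OpenCL specification defines as a semicolon-separated list of built-in kernel names. The 'khr.' prefix comes from the proposal; the function name count_khr_biks is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prefix proposed in this issue: names starting with "khr." would follow
 * the semantics defined in the (hypothetical) registry. */
#define KHR_BIK_PREFIX "khr."

/* Counts the "khr."-prefixed names in a semicolon-separated built-in kernel
 * list, i.e. the format returned for CL_DEVICE_BUILT_IN_KERNELS, and prints
 * each matching name. The input string is not modified. */
static int count_khr_biks(const char *list)
{
    assert(list != NULL);
    const size_t plen = strlen(KHR_BIK_PREFIX);
    int count = 0;
    const char *p = list;
    while (*p != '\0') {
        while (*p == ' ')
            ++p; /* tolerate "a; b" style spacing */
        const char *end = strchr(p, ';');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len >= plen && strncmp(p, KHR_BIK_PREFIX, plen) == 0) {
            printf("%.*s\n", (int)len, p);
            ++count;
        }
        if (end == NULL)
            break;
        p = end + 1; /* continue after the separator */
    }
    return count;
}
```

For a device advertising "khr.gemm;vendor.foo;khr.conv2d", the function prints the two khr.-prefixed names and returns 2; vendor-specific entries are left for the application to handle separately.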

For the concept to make sense as a Khronos entity and a cross-vendor portability feature, the built-in kernel abstraction would have to be used much more extensively (hundreds of entries in the registry, not just a couple), and the semantics of the BiKs should be somewhat more generic than exact matches to one vendor's underlying hardware. This improves the chances of matching application-level task graphs, instead of the registry becoming a collection of vendor-specific BiKs.

The "living" slide deck that tries to explain the concept and collects the open questions can be seen here.

All kinds of feedback and comments are appreciated!

bashbaug commented 1 year ago

One thought: it would be interesting to first write this up as if it were an extension, similar to cl_intel_motion_estimation. It's still not entirely clear to me what the benefit of a built-in kernel registry would be vs. reusing extension specifications for this type of documentation, and seeing an extension might help emphasize the differences.

pjaaskel commented 1 year ago

To match the flexibility of the proposal using extensions (the proposal lets both implementors and users freely choose individual built-in kernels), there would need to be a separate extension for each built-in kernel (not one per set, but one per single kernel), which vendors could implement and clients could call from their applications if the device advertises it. That works today, while there are only a few BiKs, but part of this idea is to have hundreds of potentially hardware-accelerated functions for key application areas. A separate, organized index, in which BiKs could refer to each other and have bit-accurate, machine-readable semantics, would therefore help automated graph lowering and other use cases. But as I said, it is not useful unless BiKs become widely used and vendors support more than a few of them, so it's a bit of a chicken-and-egg problem at this stage.
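To make "automated graph lowering" a little more concrete, here is a minimal sketch under stated assumptions: each application-level graph node is assumed to carry the registry name of its operation (e.g. a hypothetical "khr.gemm"), and the lowering pass checks that exact name against the device's semicolon-separated CL_DEVICE_BUILT_IN_KERNELS list, falling back to a software-defined kernel when it is absent. The names lower_node, bik_advertised and lowering_t are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a complete entry in the
 * semicolon-separated list `advertised` (the CL_DEVICE_BUILT_IN_KERNELS
 * format). Partial matches such as "khr.gem" vs "khr.gemm" are rejected. */
static bool bik_advertised(const char *advertised, const char *name)
{
    assert(advertised != NULL && name != NULL);
    size_t n = strlen(name);
    const char *p = advertised;
    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ';');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == n && strncmp(p, name, n) == 0)
            return true;
        p = end ? end + 1 : NULL; /* next entry, or stop at the last one */
    }
    return false;
}

typedef enum { LOWER_TO_BIK, LOWER_TO_SOFTWARE } lowering_t;

/* Hypothetical lowering rule: map a graph node to the built-in kernel when
 * the device advertises it, otherwise to a software-defined kernel. */
static lowering_t lower_node(const char *advertised, const char *registry_name)
{
    return bik_advertised(advertised, registry_name) ? LOWER_TO_BIK
                                                     : LOWER_TO_SOFTWARE;
}
```

The exact-name match is what makes well-defined, shared semantics matter: a lowering tool can only substitute a BiK blindly if the name guarantees the behavior across vendors.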

bashbaug commented 1 year ago

This is still a bit hand-wavy, I know, but I don't think we would necessarily need an extension for each built-in kernel; in the limit, we could be OK with a single extension for a living set of built-in kernels. Here's how it could work:

Here's how an application would use this extension:

Would this work?

pjaaskel commented 1 year ago

Thanks for the feedback. Yes, it would basically be fine to put the "BiK registry" into a single extension that includes the flexibility aspect and is expanded as new BiKs are added and old ones are updated.

The main motivations for a separate "registry" (basically a structured git repository) would be the semantics definitions and machine readability. If the semantics are written in OpenCL C or SPIR-V (needed whenever they are non-trivial to describe bit-accurately in words, unlike, say, "matrix multiplication"), the extension document becomes bloated. Machine readability calls for a structured format such as XML or JSON, from which the human-readable documentation could also be generated.
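As an illustration of why code can pin down semantics that prose leaves open, here is a reference definition of a hypothetical registry entry, "khr.add_sat_u8" (both the name and the operation are invented for this sketch). It is written in plain C; OpenCL C is derived from C99, so an OpenCL C version would look much the same. A one-line description such as "adds two byte arrays" does not say what happens on overflow, while the reference code states that results saturate at 255 instead of wrapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference semantics for the HYPOTHETICAL built-in kernel "khr.add_sat_u8":
 * out[i] = min(a[i] + b[i], 255) for every i in [0, n).
 * The sum is computed in unsigned int (at most 510), so it cannot overflow
 * before the clamp is applied. */
static void khr_add_sat_u8_ref(const uint8_t *a, const uint8_t *b,
                               uint8_t *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum); /* saturate, never wrap */
    }
}
```

In a registry, such a reference function could accompany the prose description and double as a conformance oracle: an accelerated implementation is correct if it matches the reference bit for bit.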

Anyhow, the first version could do without these fancy features; we can add them later, along with a possible separate git repository, if the idea receives support and many BiKs get added to the extension.

Another topic to consider: do we also want to make BiKs callable from software-defined kernels, like function calls? This would be useful in the many cases where a BiK is fine-grained, or simply to reduce the number of commands to launch (to support manual or automated operator/kernel fusion).
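A minimal sketch of the fusion argument, assuming a fine-grained BiK could be called per element from a software-defined kernel. Plain C loops stand in for kernel launches, and bik_add_sat_u8, threshold, run_unfused and run_fused are hypothetical names. The unfused path issues two commands and writes an intermediate buffer; the fused path issues one command and never materializes the intermediate result, yet both produce the same output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL fine-grained BiK modelled as a plain function:
 * saturating 8-bit addition. */
static uint8_t bik_add_sat_u8(uint8_t x, uint8_t y)
{
    unsigned s = (unsigned)x + (unsigned)y;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* Software-defined post-processing step: binary threshold. */
static uint8_t threshold(uint8_t v, uint8_t t) { return v >= t ? 255u : 0u; }

/* Unfused: the BiK and the software kernel run as two commands, and the
 * intermediate result round-trips through tmp. Returns the command count. */
static int run_unfused(const uint8_t *a, const uint8_t *b, uint8_t *tmp,
                       uint8_t *out, size_t n, uint8_t t)
{
    for (size_t i = 0; i < n; ++i)
        tmp[i] = bik_add_sat_u8(a[i], b[i]); /* command 1: the BiK */
    for (size_t i = 0; i < n; ++i)
        out[i] = threshold(tmp[i], t);       /* command 2: software kernel */
    return 2;
}

/* Fused: the software kernel calls the BiK per element, so one command
 * suffices and no intermediate buffer is needed. */
static int run_fused(const uint8_t *a, const uint8_t *b, uint8_t *out,
                     size_t n, uint8_t t)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i)
        out[i] = threshold(bik_add_sat_u8(a[i], b[i]), t);
    return 1;
}
```

On real hardware the fused form only pays off if the BiK's fixed-function unit can be invoked from within a kernel at that granularity, which is exactly the open question raised above.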

jansol commented 1 year ago

> Another topic to consider: Do we want to allow making the BiKs also callable from software-defined kernels like function calls? It would be useful in many cases where the BiK is fine grained or to simply reduce the number of commands to launch (to support manual or automated operator/kernel fusion).

That's an interesting question, also in light of recent graphics API features such as VK_NV_device_generated_commands and D3D12 ExecuteIndirect.

jansol commented 1 year ago

Another interesting possibility would be to make BiKs eligible for use with the enqueue_kernel(...) function from OpenCL C 2.0.

pjaaskel commented 1 year ago

@jansol yep, the same thought came to mind when I started writing this up as an extension yesterday. The first draft, rendered as HTML, is here. Please leave comments here.

bashbaug commented 1 year ago

> Another interesting possibility would be to make BiKs eligible for use with the enqueue_kernel(...) function from OpenCL C 2.0.

I had a similar reply half typed out also (great minds think alike?).

Can a built-in kernel be recorded into a command buffer (cl_khr_command_buffer)? This might "just work", but I don't recall discussing this use case, so it would be good to verify.

pjaaskel commented 1 year ago

@bashbaug yes, being able to use command buffers is one of the great benefits of defining all of the program's core functionality as commands in command queues (CQs). I mentioned this in the extension draft above. If that's currently not possible for some reason with the command buffer extensions, it should be changed, as that has been the master plan from our side all along (in the research group's activities, I mean).
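A toy model of the point above, explicitly not the cl_khr_command_buffer API: if a built-in kernel is just another kernel command, a recorder needs no special case for it, and a recorded sequence mixing built-in and software-defined kernels can be finalized once and replayed many times. All types and functions here (command, command_buffer, record, finalize, replay, bik_double, sw_increment) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A command only says which kernel to run; whether the kernel is built-in
 * or software-defined is recorded but never inspected by the recorder. */
typedef enum { KERNEL_BUILT_IN, KERNEL_SOFTWARE } kernel_kind;

typedef struct {
    kernel_kind kind;
    void (*run)(int *state); /* stand-in for the kernel's effect */
} command;

#define MAX_COMMANDS 8

typedef struct {
    command cmds[MAX_COMMANDS];
    size_t count;
    int finalized;
} command_buffer;

/* Appends a command; fails once the buffer is finalized or full. */
static int record(command_buffer *cb, command c)
{
    assert(cb != NULL);
    if (cb->finalized || cb->count == MAX_COMMANDS)
        return -1;
    cb->cmds[cb->count++] = c;
    return 0;
}

static void finalize(command_buffer *cb) { cb->finalized = 1; }

/* Replays all recorded commands in order; only allowed after finalize. */
static int replay(const command_buffer *cb, int *state)
{
    if (!cb->finalized)
        return -1;
    for (size_t i = 0; i < cb->count; ++i)
        cb->cmds[i].run(state);
    return 0;
}

static void bik_double(int *s) { *s *= 2; }   /* stands in for a built-in kernel */
static void sw_increment(int *s) { *s += 1; } /* stands in for a software kernel */
```

Because replay() treats every entry identically, supporting BiKs in command buffers amounts to making sure nothing in the recording path rejects kernels created from built-in kernel programs.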

pjaaskel commented 10 months ago

The next revision (led by Henry) is being discussed here: https://github.com/pjaaskel/OpenCL-Docs/pull/1

The HTML render of the latest WiP draft.

pjaaskel commented 9 months ago

This issue is now outdated. Let's continue the discussion in https://github.com/KhronosGroup/OpenCL-Docs/pull/867.