KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

[Feature request] Wayland compositor side extension #294

Closed. nyorain closed this issue 6 years ago

nyorain commented 8 years ago

Hey, I've got a feature request for an extension similar to a combination of EGL_WL_bind_wayland_display (provided by the Mesa implementation, not an official extension) and EGL_KHR_image_base.

It should enable Wayland compositors to retrieve Vulkan images from Wayland client buffer resources that are managed by the same driver's EGL or Vulkan client implementation.

If a Wayland client uses Vulkan (and the Wayland WSI), the only way for the compositor to get the contents of its buffers is the EGL extension. Therefore one cannot write an efficient Vulkan compositor backend, which is really a pity considering that Vulkan, with VK_KHR_display, has (at least in theory) a really nice and convenient API for direct rendering on Linux.

I wrote down some first ideas of how this extension could look in Vulkan. I am not that familiar with the Khronos extension naming scheme, so I am not sure which extension suffix it would get; KHR like the other WSI extensions?

To bind a Wayland display to a Vulkan instance (i.e. creating a wl_drm global) or to unbind it:

VkResult vkBindWaylandDisplay<Suffix>(VkInstance instance, struct wl_display *display);
VkResult vkUnbindWaylandDisplay<Suffix>(VkInstance instance, struct wl_display *display);

To check whether a Wayland client resource can be turned into a Vulkan image (i.e. whether the resource is managed by wl_drm):

VkResult vkQueryWaylandBuffer<Suffix>(
    VkInstance instance, 
    struct wl_resource *buffer, 
    VkWaylandBufferAttribute<Suffix> attribute, 
    uint32_t *value);

This kind of attribute-by-attribute querying does not seem to be used anywhere else in Vulkan, so it could also be done with a structure holding all the queryable information about a buffer (which would be something like size and format):

VkResult vkGetWaylandBufferProperties<Suffix>(
    VkInstance instance, 
    struct wl_resource *buffer,
    VkWaylandBufferProperties<Suffix> *pProperties);
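
A hypothetical sketch of such a properties structure (the name and fields are illustrative only, not part of any existing extension; <Suffix> is the same placeholder as above):

typedef struct VkWaylandBufferProperties<Suffix> {
    VkStructureType    sType;     /* would be a new VK_STRUCTURE_TYPE_* value */
    void*              pNext;
    uint32_t           width;     /* buffer width in pixels */
    uint32_t           height;    /* buffer height in pixels */
    VkFormat           format;    /* format the buffer contents can be interpreted as */
} VkWaylandBufferProperties<Suffix>;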

Then there would just be a function to create a Vulkan image from a valid client buffer. The VkWaylandBufferImageCreateInfo structure would contain the wl_resource* and some attributes, as in EGL.

VkResult vkCreateWaylandBufferImage<Suffix>(
    VkDevice device,
    const VkWaylandBufferImageCreateInfo<Suffix> *pCreateInfo,
    const VkAllocationCallbacks *pAllocator,
    VkImage *pImage);
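
A minimal sketch of what that create-info structure could look like (purely illustrative; the field names and the plane attribute are assumptions, loosely mirroring the attributes of EGL_WL_bind_wayland_display):

typedef struct VkWaylandBufferImageCreateInfo<Suffix> {
    VkStructureType        sType;
    const void*            pNext;
    struct wl_resource*    buffer;   /* the client's wl_buffer resource */
    VkImageUsageFlags      usage;    /* e.g. VK_IMAGE_USAGE_SAMPLED_BIT */
    uint32_t               plane;    /* plane index, like EGL_WAYLAND_PLANE_WL */
} VkWaylandBufferImageCreateInfo<Suffix>;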

Mesa's Intel driver already has a branch with a first implementation of something like this, but IMO it would really make sense to have a driver-independent extension.

If this is not the right place for a request like this, or it is totally inappropriate, please just let me know. Thank you.

cubanismo commented 8 years ago

I don't think trying to map the EGL constructs directly into Vulkan is the right way to handle this. Instead, there should be a way to implement a Wayland Vulkan swapchain on top of purely Vulkan objects using a Vulkan system layer, and then there wouldn't need to be any special visibility into the Vulkan objects. The Wayland compositor then wouldn't need any special extensions beyond sharing Vulkan images across processes. The need for the EGL extensions is just a side effect of the limitations of EGL's design; Vulkan has a far more flexible WSI and core design.

nyorain commented 8 years ago

I agree that shared Vulkan images would be of great use for writing Wayland compositors, but even then they must somehow be integrated with the Wayland server side. The compositor must have some way to query whether it can retrieve a Vulkan image from a client's buffer (i.e. whether the client uses EGL or Vulkan), which is not possible at the moment. You are right that there might be better ways to do this in Vulkan than just converting the (non-standard) EGL extensions to Vulkan extensions; this was more an example to show which functionality would be needed to allow compositors to use Vulkan.

cubanismo commented 8 years ago

Agreed. With EGL, the type of the buffer (DRM buffer, some other accelerated buffer type) was necessarily driver-specific as there was no standardized cross-process image type, so it made sense to make that driver functionality. With Vulkan, there should be a standard way to represent Vulkan images outside of the driver and across process boundaries, so it shouldn't be necessary for Wayland to embed itself into the Vulkan driver just to import/export buffers and info about them. Stay tuned for more info on this, as hinted by issue #205.

To handle the case of EGL clients talking to a Vulkan server, I hope a vendor-agnostic Vulkan layer, rather than an in-driver hook, can be used by the Wayland compositor/server. That will still require an extension for the layer to expose its API, but I think it will look a bit different than the example linked here.

Note I don't necessarily speak for the entire Khronos working group. These are my design opinions.

fooishbar commented 8 years ago

Instead, there should be a way to implement a Wayland Vulkan swapchain on top of purely Vulkan objects using a Vulkan system layer, and then there wouldn't need to be any special visibility into the Vulkan objects.

Kind of. If you're fluent in Android, a wl_buffer is the equivalent of a GraphicBuffer/ANativeBuffer: it is an opaque handle that the server receives, which it requires the native graphics stack to turn into something the compositor can texture from; in EGL, this is EGLImage. The underlying primitive is a single frame, and the general opinion of the upstream Wayland project when EGLStreams was raised (disclaimer: I was the one most strongly arguing this point) was that a stream/swapchain model, which removes the compositor's ability to address individual buffers, was inappropriate in a compositor context. The Wayland WSI swapchain model presently seems to work fine for clients, however.

Given that, and the lack of a standardised ICD or HAL model for non-Android Linux driver deployments, we chose to implement the driver support inside the EGL library, which necessarily requires knowledge of Wayland primitives (i.e. 'I would like you to provide accelerated-rendering functionality on this display'). I'm more than happy to discuss this at greater length.

With Vulkan, there should be a standard way to represent Vulkan images outside of the driver and across process boundaries, so it shouldn't be necessary for Wayland to embed itself into the Vulkan driver just to import/export buffers and info about them.

Yes, a hypothetical standardised cross-process image export and exchange API would resolve the majority of these issues. It does change the Wayland model and arguably make containerisation and isolation more difficult, but that's a separate issue to address I believe.

(Similarly, I don't necessarily speak for either the Wayland project, nor Khronos.)

tomek-brcm commented 8 years ago

You'd probably need at least two hypothetical standardised cross-process primitives for that: image + semaphore in Vulkish (or buffer + fence in Androidish).

The WSI model in Vulkan looks to me like half of a solution. A Vulkan application can sit on top of a windowing system of sorts, but it can't really be used to implement a windowing system(*). You can create a WSI surface as an "output" abstraction, but I couldn't find an "input" counterpart. A Vulkan client can write to a "window", but a Vulkan compositor can't read from such a "window".

And that's only the first part of the solution. The second part is writing and reading in an orderly fashion. Android uses cross-process fences sent over the binder API. Vulkan has fences, semaphores and events, but none of them seems suitable for crossing a process boundary. Finally, Wayland has nothing.

Currently the GLES/EGL implementation of Wayland compositors and clients is based on a silent assumption that the driver provides implicit cross-process synchronisation. This is not a GLES or EGL requirement, just an unwritten Wayland assumption. Vulkan is all about explicit synchronisation that is the responsibility of the application, but it doesn't provide any cross-process synchronisation primitives. This is, to my untrained eye, a blocker issue.

Finally, if you intend to mix and match client APIs, for example run a Vulkan client on a GLES compositor, or perhaps the other way around, then even a cross-process Vulkan image + semaphore won't cut it. For that you'd need some sort of system-level cross-API primitives, hopefully without the eglImage craziness.

(*) Well, technically one can memcpy between a host-mapped Vulkan image and shm and then memcpy again in the compositor, but I don't count that as a reasonable implementation.

nyorain commented 8 years ago

Wayland has no concepts like fences or semaphores in its core protocol to synchronize access to resources because it is not needed. It does not assume that drivers synchronize the access to the buffers internally; it has a well-defined mechanism of buffer ownership and simply offers the possibility for drivers to implement the wl_drm interface on the client as well as on the compositor side. EGL (with the mentioned Mesa extension) implements it on both sides, while Vulkan (WSI) does so only on the client side and trusts that compositors use EGL anyway to get the buffer's contents. Without EGL no compositor would be able to support Vulkan clients at the moment (besides dealing with the raw DRM buffers and implementing wl_drm).

But we should simply wait and see what the next Vulkan API updates bring. Some kind of shared images would solve this request in a really nice way, although we would still need some way to connect this feature with the Wayland protocol, or is there any other way? How would the compositor know of the Vulkan image a client attached to a surface?

tomek-brcm commented 8 years ago

Wayland has no concepts like fences or semaphores in its core protocol to synchronize access to resources because it is not needed.

Of course it IS needed if a GPU is involved (the GLES/EGL case, see http://ppaalanen.blogspot.co.uk/2012/03/what-does-egl-do-in-wayland-stack.html). Bear in mind that the CPU is always ahead of the GPU. The fact that the CPU has finished issuing draw commands and handed over a buffer to the other side doesn't mean that the GPU has finished drawing. Also, GLES/EGL only guarantees synchronisation within a single context. Any cross-context synchronisation must be done explicitly by the application. Cross-process synchronisation is completely out of scope as far as GLES, EGL or Vulkan are concerned.

It does not assume that drivers synchronize the access to the buffers internally

In fact it does.

it has a well-defined mechanism of buffer ownership

Perhaps it's well defined, but it certainly isn't documented.

and simply offers the possibility for drivers to implement the wl_drm interface on the client as well as on the compositor side.

Well, only if DRM is in use. Wayland doesn't mandate DRM. Wayland only mandates some platform-specific wl_buffer that is handed between client and compositor.

The platform implementation of eglSwapBuffers() calls "attach" to hand the buffer from the client to the compositor. At this stage the GPU is likely to be busy with the client's requests. The compositor starts texturing from that buffer without any means of knowing whether the GPU has actually finished drawing the client's content. This is the first implicit synchronisation point assumed by Wayland: GPU texture reads requested by the compositor process will wait for GPU writes requested by the client in a different context, from a different process, without any explicit action.

The same principle applies when the compositor releases the buffer back to the client. The client is likely to receive the release event while the GPU is still busy drawing the compositor's content. This is the second implicit synchronisation point: any client-side GPU writes to such a released buffer must be deferred until the compositor's draw commands have finished sampling the previous contents of that buffer. Again you have two completely independent GL contexts from different processes that somehow synchronise their access to a shared buffer.

When you look at Vulkan, the whole API is about explicit synchronisation at every step. I doubt that a Wayland client or compositor written in Vulkan can assume that it's safe to render to and texture from a shared image and that the Vulkan driver will automagically synchronise both processes on the CPU and GPU side without any Wayland involvement.

I understand that Wayland doesn't actually mandate any hardware acceleration, but in the scenarios where a GPU gets involved, there are only two options that don't depend on implicit synchronisation: stalling the CPU on both "attach" and "release", or sending the buffer with a GPU synchronisation primitive of some sort. Stalling the CPU isn't really an option, and there are no cross-process GPU synchronisation primitives as far as I can tell.

nyorain commented 8 years ago

The Wayland core protocol clearly specifies that after a wl_surface.commit request the compositor may read the pixels at any time, and after a wl_buffer.release event the compositor must not access the pixels any further. The client is not allowed to use the buffer while the compositor uses it. I only mentioned wl_drm because it is currently the way all major drivers implement hardware-accelerated buffers in Wayland. You are right that drivers could use any (private) interface, but they would still have to implement the behaviour specified by the Wayland protocol for their buffers. But this is not the right place to discuss the Wayland protocol.
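
For illustration, this ownership handshake looks roughly like the following on the client side (plain libwayland-client calls; the busy flag and helper names are just for this sketch):

#include <wayland-client.h>

/* Called by the compositor (via wl_buffer.release) when it is done with
 * the buffer; only then may the client render into it again. */
static void handle_release(void *data, struct wl_buffer *wl_buffer)
{
    int *busy = data;
    *busy = 0;
}

static const struct wl_buffer_listener buffer_listener = {
    .release = handle_release,
};

/* Register the release listener once, when the wl_buffer is created. */
static void watch_buffer(struct wl_buffer *buffer, int *busy)
{
    wl_buffer_add_listener(buffer, &buffer_listener, busy);
}

/* Hand a finished frame to the compositor: after wl_surface.commit the
 * compositor may read the pixels at any time, so the client treats the
 * buffer as busy until the release event arrives. */
static void submit_frame(struct wl_surface *surface, struct wl_buffer *buffer,
                         int32_t width, int32_t height, int *busy)
{
    wl_surface_attach(surface, buffer, 0, 0);
    wl_surface_damage(surface, 0, 0, width, height);
    wl_surface_commit(surface);
    *busy = 1;
}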

In Vulkan, one must ensure that all rendering commands have finished before a swapchain image can be presented via vkQueuePresentKHR. This is not necessarily about stalling the CPU, but more about simply not committing a buffer that is still in use by the client. Therefore no extra synchronization is needed between the client and compositor sides (always feel free to correct me if I understood it wrong).

Although I don't see any problems with this (since one side has to wait for the rendering to finish, moving this waiting to the compositor side only introduces unnecessary complexity), I agree with you that some method for presenting/sharing Vulkan images using explicit synchronization would improve the WSI even more.

tomek-brcm commented 8 years ago

If that were the case, Weston, the reference compositor implementation, would be plainly wrong.

The problem is that the CPU deals with Wayland calls while the GPU deals with pixels. In order to fulfil such strict pixel-access guarantees, Weston would have to call glFinish(), glClientWaitSync() or vkWaitForFences(). But Weston doesn't do such a thing, for a good reason: in this scenario the CPU and GPU must wait for each other and work in turns, while in fact they're entirely capable of working in parallel.

philiptaylor commented 8 years ago

In Vulkan, one must ensure that all rendering commands have finished before a swapchain image can be presented via vkQueuePresentKHR.

I don't think that's an accurate description; you only have to ensure all rendering commands will have finished executing before the device signals the pWaitSemaphores that were passed into vkQueuePresentKHR. vkQueuePresentKHR can be called (and can return) long before the rendering has completed, and the WSI implementation is responsible for synchronising the presentation with those semaphores.
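
A minimal sketch of that pattern using standard Vulkan 1.0 WSI calls; the queue, recorded command buffer, binary semaphore, swapchain and acquired image index are assumed to exist, and error handling is omitted:

#include <vulkan/vulkan.h>

static void submit_and_present(VkQueue queue, VkCommandBuffer cmdBuf,
                               VkSemaphore renderDone,
                               VkSwapchainKHR swapchain, uint32_t imageIndex)
{
    /* The rendering submission signals renderDone when the GPU finishes. */
    VkSubmitInfo submit = {
        .sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount   = 1,
        .pCommandBuffers      = &cmdBuf,
        .signalSemaphoreCount = 1,
        .pSignalSemaphores    = &renderDone,
    };
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

    /* vkQueuePresentKHR may be called (and return) immediately; the WSI
     * implementation waits for renderDone before the image is actually
     * handed to the presentation engine. */
    VkPresentInfoKHR present = {
        .sType              = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores    = &renderDone,
        .swapchainCount     = 1,
        .pSwapchains        = &swapchain,
        .pImageIndices      = &imageIndex,
    };
    vkQueuePresentKHR(queue, &present);
}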

nyorain commented 8 years ago

Yes, you are right, that was an oversimplification. The point I wanted to make was that the presentation of the image (in the case of Wayland, the wl_surface.commit call) should not happen until the rendering into the image has finished, if I understood the specification correctly.

tomek-brcm commented 8 years ago

This is not how Weston does it, so it might be an oversimplification in the Wayland spec. I'd be interested to hear from the Wayland folks what the intended use of the VkSemaphore passed to the "present" command would be if the presentation engine happens to be a Wayland compositor.

Delaying the surface commit until the GPU has signalled the semaphore is one way of doing it, but it's not an efficient way. Actually, in Vulkan it's also not the most straightforward way, because the Vulkan API doesn't have a CPU-side wait on a VkSemaphore.

cubanismo commented 8 years ago

There are a lot of great observations and ideas here. Those of us working on next-generation features for Vulkan within Khronos are aware of the discussion here and will take the points raised into account.

To see one possible incarnation of some of the building blocks that I and others suggested here, take a look at the external memory Vulkan extensions from NVIDIA:

https://www.khronos.org/registry/vulkan/specs/1.0-extensions/xhtml/vkspec.html#_vk_nv_external_memory

While the current extensions work only on Windows, the ideas generalize to all platforms. Any feedback is appreciated!

philiptaylor commented 8 years ago

@cubanismo

Are memory objects the right level of abstraction for this kind of sharing?

Say I want an Android application to receive frames from the camera HAL and use them as textures, and output to buffers that are sent to a video encoder (through OMXCodec or similar), with no memory copies. (That seems a useful and realistic use case to me, and one that works with OpenGL/OpenCL today.)

My understanding is: Currently the lowest-level vendor-independent interface is ANativeWindowBuffer, which represents a single 2D image with a particular width/height/format/etc (stored as a vendor-dependent opaque array of ints and fds, including some reference to the underlying memory (often an ION fd) and the offset into it, and often including several vendor-specific parameters/flags that influence the layout of the image in memory).

The camera provides ANativeWindowBuffers and the video encoder consumes ANativeWindowBuffers. In OpenGL/OpenCL, I'd pass the ANativeWindowBuffer into eglCreateImageKHR and pass that into glEGLImageTargetTexture2DOES/clCreateFromEGLImageKHR. Then I can query that GL/CL image for its width/height/etc. That makes it fairly straightforward to set up a zero-copy pipeline.
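
For reference, a condensed version of that GL path (assuming the EGL_ANDROID_image_native_buffer, EGL_KHR_image_base and OES_EGL_image/OES_EGL_image_external extensions are available; header locations and the helper name are illustrative):

#define EGL_EGLEXT_PROTOTYPES
#define GL_GLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

struct ANativeWindowBuffer;  /* opaque here; defined by the Android headers */

/* Wrap an ANativeWindowBuffer in an EGLImage and bind it to an external
 * texture. In real code the KHR/OES entry points are usually loaded via
 * eglGetProcAddress. Error handling omitted. */
static GLuint texture_from_anb(EGLDisplay dpy, struct ANativeWindowBuffer *anb)
{
    const EGLint attribs[] = { EGL_IMAGE_PRESERVED_KHR, EGL_TRUE, EGL_NONE };
    EGLImageKHR image = eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                                          EGL_NATIVE_BUFFER_ANDROID,
                                          (EGLClientBuffer)anb, attribs);
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
    glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, (GLeglImageOES)image);
    return tex;
}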

How would this work in the VK_NV_external_memory model?

To call vkCreateImage, there'd need to be some new API to get the width/height/format/etc from the ANativeWindowBuffer (or from something derived from it, like EGLImage) and put it into VkImageCreateInfo.

But the image might be stored with some weird layout that doesn't match normal GPU images (e.g. an image from the camera probably has padding determined by the ISP hardware), which is indicated by hidden flags in the ANativeWindowBuffer, and there's no way to pass those flags into vkCreateImage.

Probably it would make more sense to construct the VkImage directly from the ANativeWindowBuffer, so the driver can handle all the non-standard layouts etc. It'd still need some new API to get the width/height/etc of the image, since some applications need to know that. Maybe it should also have a way to find the VkDeviceMemory and offset and size, for use in vkMapMemory etc, or maybe they should be like swapchain images where there's no way to access the underlying memory.

Since most APIs aren't designed around low-level concepts like raw device memory, images seem a more universal concept that would be better for interoperability.

cubanismo commented 8 years ago

@philiptaylor

That's certainly an interesting point. Note that the NV extensions expose sharing of Direct3D images, which are also not inherently raw memory, but still fit well into the model, so such use cases do indeed map onto the memory-based model without much trouble.

It's interesting that you raise the case of camera-GPU sharing. The fact that interop APIs used for such sharing have historically been based on images has in fact been quite frustrating when trying to share things that aren't images between these devices, such as exposure and other capture parameters that can impact image processing. Were the sharing memory-based, such data could be shared directly with the GPU as a raw chunk of bytes, as output by the camera driver; instead it often falls back to using the CPU as an intermediary, or the data is awkwardly packed into texels for lack of a proper interop primitive.

Yes, many existing APIs are not memory-based, but all of the modern APIs we've seen (Vulkan, DirectX 12, Mantle) are. Some native APIs, such as DMA-BUF, are closer to a memory-based sharing model than an image-based sharing model. Other APIs, such as OpenGL, EGL, and most other Khronos APIs are extensible and can easily incorporate new concepts. Designing a new API for Vulkan, a future-focused low-level API, based on the limitations of older APIs would not be the correct direction in my view. Of course, backwards compatibility must be achieved, but it should not be the primary design factor.

GunpowderGuy commented 7 years ago

I think this proposal should get more attention, as the problem of NVIDIA wanting to use EGLStreams to interact with the display server instead of GBM still hasn't been solved, and making use of WSI instead could make that a moot point.

cubanismo commented 7 years ago

There is a great deal of work going on within Khronos that relates to this issue. Unfortunately, it will continue to be difficult to discuss externally until the results of that work can be released. I know that can be frustrating, but stay tuned.

GunpowderGuy commented 7 years ago

But are you in a position to start answering some very important questions? For example, will using WSI for the display server require toolkits to be modified? (KDE devs say that making use of EGLStreams would, further adding to the difficulty of supporting it.) And how would programs still using OpenGL and EGL be handled?

GunpowderGuy commented 7 years ago

@cubanismo, after reading some comments on a news article posted today stating that other vendors, the GNOME team and Wayland devs had already given in to NVIDIA's wishes (it turned out to be true: http://www.phoronix.com/scan.php?page=news_item&px=GNOME-Mutter-Mainline-EGLStream), I searched for proof of it and found this: https://github.com/cubanismo/allocator (I didn't think of checking your repos, since you said the ongoing work wasn't publicly released), a publicly known attempt to replace EGLStreams and GBM, which you are part of. If that is so, what can't be discussed? In which ways will it interact with WSI? Will one exist and the other be canned, or will both coexist? What I found most interesting about this feature request is that it could eliminate the APIs' need to be aware of the display server (so no Wayland, Mir or SurfaceFlinger specific extensions) and get rid of the difference between the Android driver model and desktop Linux (how EGL works and some extensions to it); is any of that going to happen now, or was it even something that could have happened?

snj33v commented 7 years ago

Is there any progress on enabling Vulkan WSI to work on the Wayland server side?

fooishbar commented 7 years ago

@snj33v Yes, very much so. The VK_KHX_external_* extensions (currently released as experimental extensions) are a large part of the path towards this.

GunpowderGuy commented 7 years ago

@fooishbar Which of the issues I talked about could that fix?

cubanismo commented 6 years ago

@diegor8 The Vulkan external objects extensions (external memory, external semaphores, and external fences) are now part of the Vulkan 1.1 specification. They enable the implementation of all the Vulkan WSI mechanisms/extensions on top of core Vulkan plus the OS-specific parts of external objects (e.g., VK_KHR_external_memory_fd). In other words, if someone were so inclined, they could write a complete vendor-agnostic Wayland compositor and Wayland Vulkan WSI client on top of nothing but Vulkan 1.1 APIs. VK_KHR_display support would be required to actually get such a Wayland compositor to display on a monitor/output. The WSI integration could be implemented as a Vulkan implicit layer, such that its presence would be invisible to applications and would look just like any other Vulkan Wayland WSI implementation. Therefore, the Vulkan functionality requested when this issue was filed is complete, so I'll mark this issue closed.
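
A minimal sketch of the exporting side of that machinery, assuming VK_KHR_external_memory_fd is enabled (the importing process would do the mirror-image import; error handling omitted):

#include <vulkan/vulkan.h>

/* Allocate exportable device memory and obtain a POSIX fd that can be sent
 * to another process (e.g. over the Wayland socket). */
static int export_memory_fd(VkDevice device, VkDeviceSize size,
                            uint32_t memoryTypeIndex, VkDeviceMemory *outMem)
{
    VkExportMemoryAllocateInfo exportInfo = {
        .sType       = VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO,
        .handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
    };
    VkMemoryAllocateInfo allocInfo = {
        .sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext           = &exportInfo,
        .allocationSize  = size,
        .memoryTypeIndex = memoryTypeIndex,
    };
    vkAllocateMemory(device, &allocInfo, NULL, outMem);

    VkMemoryGetFdInfoKHR getFdInfo = {
        .sType      = VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR,
        .memory     = *outMem,
        .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
    };
    int fd = -1;
    /* vkGetMemoryFdKHR is provided by VK_KHR_external_memory_fd; in real
     * code it is loaded via vkGetDeviceProcAddr. */
    vkGetMemoryFdKHR(device, &getFdInfo, &fd);
    return fd;
}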

My work to design and prototype new allocation mechanisms for Linux and other POSIX or POSIX-like systems (code hosted at https://github.com/cubanismo/allocator, as pointed out above) is outside the scope of Vulkan and this issue, though it could prove useful to a Wayland compositor making use of Vulkan for graphics and/or display. Further work by others to better integrate the Vulkan external memory mechanisms with native Linux constructs like dma-bufs and DRM format modifiers is ongoing, and much of the progress on that can be tracked publicly on the dri-devel or mesa-dev mailing lists on freedesktop.org.

fooishbar commented 6 years ago

As @cubanismo says, there are a few parts to put together. On the backend, it should be possible to use GBM as a buffer allocator as an alternative to VK_KHR_display, as an easier transition path for compositors using EGL/GBM/KMS. (Or, if running nested, you can of course continue to use the Wayland/X11 WSI swapchain extensions.)

On the compositor side, you'll want to be implementing the zwp_linux_dmabuf_v1 Wayland interface, using the new Vulkan external-memory support to import the resulting buffers so you can texture from them. For Mesa-based drivers, this will allow clients using the Wayland WSI swapchain extensions to work; for the NVIDIA proprietary drivers, you would need to support the EGLStreams paths.
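
A rough sketch of such an import on the compositor side, assuming VK_EXT_external_memory_dma_buf is enabled and the fd, size and layout came from the zwp_linux_dmabuf_v1 parameters (modifier handling via VK_EXT_image_drm_format_modifier omitted; error handling omitted):

#include <vulkan/vulkan.h>

/* Bind a dma-buf fd received from a Wayland client to freshly allocated
 * VkDeviceMemory, backing a VkImage the compositor can sample from. The
 * image must have been created with VkExternalMemoryImageCreateInfo
 * specifying the dma-buf handle type. */
static void import_dmabuf(VkDevice device, int fd, VkDeviceSize size,
                          uint32_t memoryTypeIndex, VkImage image,
                          VkDeviceMemory *outMem)
{
    VkImportMemoryFdInfoKHR importInfo = {
        .sType      = VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR,
        .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
        .fd         = fd,   /* ownership of the fd transfers to the driver */
    };
    VkMemoryAllocateInfo allocInfo = {
        .sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext           = &importInfo,
        .allocationSize  = size,
        .memoryTypeIndex = memoryTypeIndex,
    };
    vkAllocateMemory(device, &allocInfo, NULL, outMem);
    vkBindImageMemory(device, image, *outMem, 0);
}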

The latest work on modifiers is @chadversary's VK_EXT_image_drm_format_modifier, which allows advertisement and import of buffers with modifiers, similar to EGL_EXT_image_dma_buf_import_modifiers.

There is a proposed zwp_linux_explicit_synchronization_v1 Wayland extension which allows exchanging dma-fence FDs (which you could export from the client's rendering signal semaphore, and import to use as a wait semaphore for compositor texturing), but the implementation for this has not yet been merged.
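
A hedged sketch of the client-side export, assuming VK_KHR_external_semaphore_fd with sync-FD handle support (the helper name is illustrative):

#include <vulkan/vulkan.h>

/* After the rendering submission that signals renderDone, export it as a
 * sync-file fd that could be attached to the surface commit via
 * zwp_linux_explicit_synchronization_v1. The semaphore must have been
 * created with a matching VkExportSemaphoreCreateInfo handle type. */
static int export_render_fence(VkDevice device, VkSemaphore renderDone)
{
    VkSemaphoreGetFdInfoKHR getFdInfo = {
        .sType      = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
        .semaphore  = renderDone,
        .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT,
    };
    int fd = -1;
    /* vkGetSemaphoreFdKHR comes from VK_KHR_external_semaphore_fd; load it
     * via vkGetDeviceProcAddr in real code. */
    vkGetSemaphoreFdKHR(device, &getFdInfo, &fd);
    return fd;
}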

shmerl commented 5 years ago

For Mesa-based drivers, this will allow clients using the Wayland WSI swapchain extensions to work; for the NVIDIA proprietary drivers, you would need to support the EGLStreams paths.

Does this mean current features are not enough for compositors to avoid using EGLStreams if they want to support Nvidia? That was the major point about avoiding duplication in something like KWin.

stephen-np commented 5 years ago

@cubanismo Could you kindly give more details on how to implement a Wayland compositor and WSI extension based on Vulkan 1.1? I have only vague ideas on how the external objects extensions can help share the buffers and do the syncs between client and compositor. Is a Vulkan implicit layer something that intercepts the client's Vulkan API calls?

cheako commented 4 years ago

I know nothing about most of this, but I'm not convinced there is a reason to implement anything more for a Wayland compositor. File issues against https://bitbucket.org/cheako/vkwayland/src/master/ and explain this to me. Thanks!

cheako commented 4 years ago

@stephen-np I'm not really knowledgeable in this area; I was just able to write a prototype Wayland compositor by sheer willpower.

dmabufs are driver-accelerated shared memory; the data can come from GPU image buffers as well as video capture sources. Synchronization happens at the stream socket level: the client says something like "Here is a buffer to copy", then the server eventually replies with "I'm done copying." This is the same as the wl_shm interface, which is mandatory for clients because dmabuf had limited support as of April 2019 (when I wrote vkwayland).