Maybe look into VK_EXT_external_memory_dma_buf
Never mind, Nvidia doesn't implement this extension.
Thanks for the idea! Yes, this issue is exactly for collecting/finding ideas like that. I thought of VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_MAPPED_FOREIGN_MEMORY_BIT_EXT, but I wasn't able to get a prototype running with that. It seems to be implemented but still has problems.
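For reference, this is roughly the shape of what I tried (a sketch only; `igpu_dev`, `mapped` and `size` are placeholder names, and the pointer and size have to respect `minImportedHostPointerAlignment` of the importing device):

```c
#include <vulkan/vulkan.h>

/* Sketch: import a host pointer (e.g. the dedicated GPU's vkMapMemory'd
 * region) into the integrated GPU via VK_EXT_external_memory_host.
 * All names are illustrative placeholders. */
static VkDeviceMemory import_foreign_host_ptr(VkDevice igpu_dev, void *mapped,
                                              VkDeviceSize size)
{
    PFN_vkGetMemoryHostPointerPropertiesEXT get_props =
        (PFN_vkGetMemoryHostPointerPropertiesEXT)vkGetDeviceProcAddr(
            igpu_dev, "vkGetMemoryHostPointerPropertiesEXT");
    if (!get_props)
        return VK_NULL_HANDLE;

    /* Ask which of the iGPU's memory types can back this foreign pointer. */
    VkMemoryHostPointerPropertiesEXT props = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_HOST_POINTER_PROPERTIES_EXT,
    };
    if (get_props(igpu_dev,
            VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_MAPPED_FOREIGN_MEMORY_BIT_EXT,
            mapped, &props) != VK_SUCCESS || props.memoryTypeBits == 0)
        return VK_NULL_HANDLE;

    /* Pick the first compatible memory type. */
    uint32_t mem_type = 0;
    while (!(props.memoryTypeBits & (1u << mem_type)))
        ++mem_type;

    VkImportMemoryHostPointerInfoEXT import_info = {
        .sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_HOST_POINTER_INFO_EXT,
        .handleType =
            VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_MAPPED_FOREIGN_MEMORY_BIT_EXT,
        .pHostPointer = mapped,
    };
    VkMemoryAllocateInfo alloc_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = &import_info,
        .allocationSize = size, /* must be a multiple of the min alignment */
        .memoryTypeIndex = mem_type,
    };
    VkDeviceMemory mem = VK_NULL_HANDLE;
    if (vkAllocateMemory(igpu_dev, &alloc_info, NULL, &mem) != VK_SUCCESS)
        return VK_NULL_HANDLE;
    return mem;
}
```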
Maybe VK_KHR_external_memory_fd would work. Intel and Nvidia both seem to support it.
I think VK_KHR_external_memory_fd is not of any use, as such a memory object can only be imported into exactly the same PhysicalDevice (see the table in https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#external-memory-handle-types-compatibility).
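The compatibility rule from that table boils down to matching deviceUUID/driverUUID, so a quick check like the following sketch (Vulkan 1.1; no extension-specific calls) shows why an opaque fd cannot cross from the Nvidia to the Intel driver:

```c
#include <string.h>
#include <vulkan/vulkan.h>

/* Sketch: opaque-fd external memory is only compatible between devices with
 * identical deviceUUID and driverUUID, which rules out an Intel <-> Nvidia
 * transfer via VK_KHR_external_memory_fd. */
static int can_share_opaque_fd(VkPhysicalDevice a, VkPhysicalDevice b)
{
    VkPhysicalDeviceIDProperties id_a = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES,
    };
    VkPhysicalDeviceIDProperties id_b = id_a;

    VkPhysicalDeviceProperties2 props_a = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &id_a,
    };
    VkPhysicalDeviceProperties2 props_b = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &id_b,
    };
    vkGetPhysicalDeviceProperties2(a, &props_a);
    vkGetPhysicalDeviceProperties2(b, &props_b);

    return memcmp(id_a.deviceUUID, id_b.deviceUUID, VK_UUID_SIZE) == 0 &&
           memcmp(id_a.driverUUID, id_b.driverUUID, VK_UUID_SIZE) == 0;
}
```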
I have the following external_memory extensions supported:

Dedicated:
VK_KHR_external_memory : extension revision 1
VK_KHR_external_memory_fd : extension revision 1

Integrated:
VK_KHR_external_memory : extension revision 1
VK_KHR_external_memory_fd : extension revision 1
VK_EXT_external_memory_dma_buf : extension revision 1

So there seems to be no way at all to circumvent the memcpy.
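(For anyone who wants to reproduce these lists on their own machine, a small sketch that prints the external_memory device extensions:)

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <vulkan/vulkan.h>

/* Sketch: print every device extension whose name contains
 * "external_memory", in the same format as the lists above. */
static void print_external_memory_exts(VkPhysicalDevice dev)
{
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(dev, NULL, &count, NULL);

    VkExtensionProperties *exts = malloc(count * sizeof(*exts));
    if (!exts)
        return;
    vkEnumerateDeviceExtensionProperties(dev, NULL, &count, exts);

    for (uint32_t i = 0; i < count; ++i)
        if (strstr(exts[i].extensionName, "external_memory"))
            printf("%s : extension revision %u\n",
                   exts[i].extensionName, exts[i].specVersion);
    free(exts);
}
```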
I'm probably reading the Vulkan spec and/or the problem wrong, so feel free to call me out. While it does not seem possible for the dedicated GPU to write the frame into the integrated GPU, it does seem possible to go the other way: have the integrated GPU read from the dedicated GPU's memory. The frame results would be written twice, but you would be using DMA instead of memcpy, which should be a lot faster and less CPU intensive.
Yes, transferring the data via DMA is certainly better if that is possible.
However VK_EXT_external_memory_dma_buf
requires:
a file descriptor for a Linux dma_buf
So I would need to acquire such a dma_buf file descriptor for the nvidia buffer object (or for the mem-mapped region of the image). Do you have any idea how that would be possible, if I cannot acquire it from the nvidia driver?
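For completeness, the import side on the integrated GPU would look roughly like the sketch below; `igpu_dev`, `dmabuf_fd`, `size` and `mem_type` are placeholders, and the whole point of the question is that I have no way to obtain `dmabuf_fd` in the first place:

```c
#include <vulkan/vulkan.h>

/* Sketch: import an existing dma_buf fd on the integrated GPU using
 * VK_EXT_external_memory_dma_buf (via VkImportMemoryFdInfoKHR). */
static VkDeviceMemory import_dmabuf(VkDevice igpu_dev, int dmabuf_fd,
                                    VkDeviceSize size, uint32_t mem_type)
{
    VkImportMemoryFdInfoKHR import_info = {
        .sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR,
        .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
        .fd = dmabuf_fd, /* ownership passes to the driver on success */
    };
    VkMemoryAllocateInfo alloc_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = &import_info,
        .allocationSize = size,
        .memoryTypeIndex = mem_type,
    };
    VkDeviceMemory mem = VK_NULL_HANDLE;
    if (vkAllocateMemory(igpu_dev, &alloc_info, NULL, &mem) != VK_SUCCESS)
        return VK_NULL_HANDLE;
    return mem;
}
```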
Sadly no. I did not know about the dma_buf requirement.
Thanks for the resources, it's good to collect everything possibly relevant here.
As I understand int dma_buf_fd(struct dma_buf *dmabuf, int flags), this takes a struct dma_buf and turns it into a file descriptor. So with this function, together with the VK_EXT_external_memory_dma_buf extension of the integrated GPU, I could (probably) import a struct dma_buf * into the integrated GPU. However, I still don't see any way to get either a "dma fd" or a struct dma_buf * (that dma_buf_fd could turn into a "dma fd") at all.
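On the Vulkan side, the userspace counterpart to dma_buf_fd() would presumably be vkGetMemoryFdKHR with the DMA_BUF handle type on the exporting device. A sketch of that, assuming hypothetical `ngpu_dev` and `mem` handles, and assuming the memory was allocated with VkExportMemoryAllocateInfo for that handle type; the catch is that this needs the exporting driver to support VK_EXT_external_memory_dma_buf in the first place:

```c
#include <vulkan/vulkan.h>

/* Sketch: try to export a dma_buf fd from a VkDeviceMemory allocation.
 * Returns -1 if the export path is unavailable. */
static int export_dmabuf_fd(VkDevice ngpu_dev, VkDeviceMemory mem)
{
    PFN_vkGetMemoryFdKHR get_fd = (PFN_vkGetMemoryFdKHR)
        vkGetDeviceProcAddr(ngpu_dev, "vkGetMemoryFdKHR");

    VkMemoryGetFdInfoKHR get_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR,
        .memory = mem,
        .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
    };
    int fd = -1;
    if (get_fd == NULL || get_fd(ngpu_dev, &get_info, &fd) != VK_SUCCESS)
        return -1; /* no dma_buf export available from this driver */
    return fd;
}
```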
E.g. for nvidia-GPUs the supported extensions are mentioned here: https://developer.nvidia.com/vulkan-driver
Searching for external_memory yields:
- VK_KHR_external_memory_fd (not cross-PhysicalDevice, only for sharing resources between processes)
- VK_KHR_external_memory_win32 (not usable on Linux)
- VK_EXT_external_memory_host (supported by Nvidia explicitly only on Windows)

Maybe ask NVidia about VK_EXT_external_memory_dma_buf?
It seems they are referenced as contributors here: https://github.com/KhronosGroup/Vulkan-Docs/blob/master/appendices/VK_EXT_external_memory_dma_buf.txt
I also found this: https://devtalk.nvidia.com/default/topic/1030669/jetson-tx1/trying-to-process-with-opengl-an-eglimage-created-from-a-dmabuf_fd-/ https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-271/group__ee__nvbuffering__group.html#gab159c94c574f75a3d7913bef8352722a
Just for the record, dma-buf is definitely the way to go, as I think this is what PRIME does for OpenGL. And this is likely the lowest level you can do it at, and thus with the least overhead. I don't understand the specifics, but maybe asking @aaronp24 about that Vulkan extension you would require to do so could be a good idea.
I don't have much to add here, but @aaronp24 asked me to chime in, so I'll just say you've correctly surmised we don't currently support any of the extensions necessary to make a zero-copy/dma-copy-only transfer between an NV GPU and a non-NV GPU in our Linux Vulkan driver. I don't have any roadmap to share for support of any of these extensions at the moment.
To skip the memcpy, we could import mapped memory from the rendering GPU into the display GPU or the other way round, or use host memory and import it on both. We need to check which of those alternatives is fastest and implementable, and extend the code to use this optimization if available.
However (at least on my machine), it seems that the general transfer of the image between the GPUs is the bottleneck. So maybe there is a better way than memmapping and copying?
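One way to check "implementable" per machine would be to query the external-memory features for each candidate handle type on both physical devices, roughly like the sketch below (Vulkan 1.1 core; the usage flags are just an example, not what the final code would necessarily use):

```c
#include <vulkan/vulkan.h>

/* Sketch: report whether a transfer buffer using the given external handle
 * type could be imported/exported on this physical device. The caller checks
 * for VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT and
 * VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT in the result. */
static VkExternalMemoryFeatureFlags
external_buffer_features(VkPhysicalDevice dev,
                         VkExternalMemoryHandleTypeFlagBits handle_type)
{
    VkPhysicalDeviceExternalBufferInfo info = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_EXTERNAL_BUFFER_INFO,
        .usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT |
                 VK_BUFFER_USAGE_TRANSFER_DST_BIT,
        .handleType = handle_type,
    };
    VkExternalBufferProperties props = {
        .sType = VK_STRUCTURE_TYPE_EXTERNAL_BUFFER_PROPERTIES,
    };
    vkGetPhysicalDeviceExternalBufferProperties(dev, &info, &props);
    return props.externalMemoryProperties.externalMemoryFeatures;
}
```

Running this for e.g. VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT and VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT on both GPUs would tell us up front which of the alternatives above is even available before picking one.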