Maybe look into VK_EXT_external_memory_dma_buf
Never mind, Nvidia doesn't implement this extension.
Thanks for the idea! Yes, this issue is exactly for collecting/finding ideas like that. I thought of VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_MAPPED_FOREIGN_MEMORY_BIT_EXT, but I wasn't able to get a prototype running with that. It seems to be implemented but still has problems.
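For reference, this is roughly the shape of what I tried (a sketch only; `igpu_dev`, `mapped` and `size` are placeholder names, and the pointer and size have to respect `minImportedHostPointerAlignment` of the importing device):

```c
#include <vulkan/vulkan.h>

/* Sketch: import a host pointer (e.g. the dedicated GPU's vkMapMemory'd
 * region) into the integrated GPU via VK_EXT_external_memory_host.
 * All names are illustrative placeholders. */
static VkDeviceMemory import_foreign_host_ptr(VkDevice igpu_dev, void *mapped,
                                              VkDeviceSize size)
{
    PFN_vkGetMemoryHostPointerPropertiesEXT get_props =
        (PFN_vkGetMemoryHostPointerPropertiesEXT)vkGetDeviceProcAddr(
            igpu_dev, "vkGetMemoryHostPointerPropertiesEXT");
    if (!get_props)
        return VK_NULL_HANDLE;

    /* Ask which of the iGPU's memory types can back this foreign pointer. */
    VkMemoryHostPointerPropertiesEXT props = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_HOST_POINTER_PROPERTIES_EXT,
    };
    if (get_props(igpu_dev,
            VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_MAPPED_FOREIGN_MEMORY_BIT_EXT,
            mapped, &props) != VK_SUCCESS || props.memoryTypeBits == 0)
        return VK_NULL_HANDLE;

    /* Pick the first compatible memory type. */
    uint32_t mem_type = 0;
    while (!(props.memoryTypeBits & (1u << mem_type)))
        ++mem_type;

    VkImportMemoryHostPointerInfoEXT import_info = {
        .sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_HOST_POINTER_INFO_EXT,
        .handleType =
            VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_MAPPED_FOREIGN_MEMORY_BIT_EXT,
        .pHostPointer = mapped,
    };
    VkMemoryAllocateInfo alloc_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = &import_info,
        .allocationSize = size, /* must be a multiple of the min alignment */
        .memoryTypeIndex = mem_type,
    };
    VkDeviceMemory mem = VK_NULL_HANDLE;
    if (vkAllocateMemory(igpu_dev, &alloc_info, NULL, &mem) != VK_SUCCESS)
        return VK_NULL_HANDLE;
    return mem;
}
```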
Maybe VK_KHR_external_memory_fd would work. Intel and Nvidia both seem to support it.
I think VK_KHR_external_memory_fd is not of any use, as such a memory object can only be imported into exactly the same PhysicalDevice (see the table in https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#external-memory-handle-types-compatibility).
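The compatibility rule from that table boils down to matching deviceUUID/driverUUID, so a quick check like the following sketch (Vulkan 1.1; no extension-specific calls) shows why an opaque fd cannot cross from the Nvidia to the Intel driver:

```c
#include <string.h>
#include <vulkan/vulkan.h>

/* Sketch: opaque-fd external memory is only compatible between devices with
 * identical deviceUUID and driverUUID, which rules out an Intel <-> Nvidia
 * transfer via VK_KHR_external_memory_fd. */
static int can_share_opaque_fd(VkPhysicalDevice a, VkPhysicalDevice b)
{
    VkPhysicalDeviceIDProperties id_a = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES,
    };
    VkPhysicalDeviceIDProperties id_b = id_a;

    VkPhysicalDeviceProperties2 props_a = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &id_a,
    };
    VkPhysicalDeviceProperties2 props_b = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &id_b,
    };
    vkGetPhysicalDeviceProperties2(a, &props_a);
    vkGetPhysicalDeviceProperties2(b, &props_b);

    return memcmp(id_a.deviceUUID, id_b.deviceUUID, VK_UUID_SIZE) == 0 &&
           memcmp(id_a.driverUUID, id_b.driverUUID, VK_UUID_SIZE) == 0;
}
```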
I have the following external_memory extensions supported:

Dedicated:
VK_KHR_external_memory : extension revision 1
VK_KHR_external_memory_fd : extension revision 1

Integrated:
VK_KHR_external_memory : extension revision 1
VK_KHR_external_memory_fd : extension revision 1
VK_EXT_external_memory_dma_buf : extension revision 1

So there seems to be no way at all to circumvent the memcpy.
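(For anyone who wants to reproduce these lists on their own machine, a small sketch that prints the external_memory device extensions:)

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <vulkan/vulkan.h>

/* Sketch: print every device extension whose name contains
 * "external_memory", in the same format as the lists above. */
static void print_external_memory_exts(VkPhysicalDevice dev)
{
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(dev, NULL, &count, NULL);

    VkExtensionProperties *exts = malloc(count * sizeof(*exts));
    if (!exts)
        return;
    vkEnumerateDeviceExtensionProperties(dev, NULL, &count, exts);

    for (uint32_t i = 0; i < count; ++i)
        if (strstr(exts[i].extensionName, "external_memory"))
            printf("%s : extension revision %u\n",
                   exts[i].extensionName, exts[i].specVersion);
    free(exts);
}
```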
I'm probably reading the Vulkan spec and/or the problem wrong, so feel free to call me out. While it does not seem possible for the dedicated GPU to write the frame into the integrated GPU, it does seem possible to go the other way: have the integrated GPU read from the dedicated GPU's memory. The frame results would be written twice, but you would be using DMA instead of memcpy, which should be a lot faster and less CPU intensive.
Yes, transferring the data via DMA is certainly better if that is possible.
However VK_EXT_external_memory_dma_buf
requires:
a file descriptor for a Linux dma_buf
So I would need to acquire such a dma_buf file descriptor for the nvidia buffer object (or for the mem-mapped region of the image). Do you have any idea how that would be possible, if I cannot acquire it from the nvidia driver?
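For completeness, the import side on the integrated GPU would look roughly like the sketch below; `igpu_dev`, `dmabuf_fd`, `size` and `mem_type` are placeholders, and the whole point of the question is that I have no way to obtain `dmabuf_fd` in the first place:

```c
#include <vulkan/vulkan.h>

/* Sketch: import an existing dma_buf fd on the integrated GPU using
 * VK_EXT_external_memory_dma_buf (via VkImportMemoryFdInfoKHR). */
static VkDeviceMemory import_dmabuf(VkDevice igpu_dev, int dmabuf_fd,
                                    VkDeviceSize size, uint32_t mem_type)
{
    VkImportMemoryFdInfoKHR import_info = {
        .sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR,
        .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
        .fd = dmabuf_fd, /* ownership passes to the driver on success */
    };
    VkMemoryAllocateInfo alloc_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = &import_info,
        .allocationSize = size,
        .memoryTypeIndex = mem_type,
    };
    VkDeviceMemory mem = VK_NULL_HANDLE;
    if (vkAllocateMemory(igpu_dev, &alloc_info, NULL, &mem) != VK_SUCCESS)
        return VK_NULL_HANDLE;
    return mem;
}
```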
Sadly no. I did not know about the dma_buf requirement.
Thanks for the resources, it's good to collect everything possibly relevant here.
As I understand int dma_buf_fd(struct dma_buf *dmabuf, int flags), this takes a struct dma_buf and turns it into a file descriptor. So with this function, together with the VK_EXT_external_memory_dma_buf extension of the integrated GPU, I could (probably) import a struct dma_buf * into the integrated GPU. However, I still don't see any way to get either a "dma fd" or a struct dma_buf * (that dma_buf_fd could turn into a "dma fd") at all.
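On the Vulkan side, the userspace counterpart to dma_buf_fd() would presumably be vkGetMemoryFdKHR with the DMA_BUF handle type on the exporting device. A sketch of that, assuming hypothetical `ngpu_dev` and `mem` handles, and assuming the memory was allocated with VkExportMemoryAllocateInfo for that handle type; the catch is that this needs the exporting driver to support VK_EXT_external_memory_dma_buf in the first place:

```c
#include <vulkan/vulkan.h>

/* Sketch: try to export a dma_buf fd from a VkDeviceMemory allocation.
 * Returns -1 if the export path is unavailable. */
static int export_dmabuf_fd(VkDevice ngpu_dev, VkDeviceMemory mem)
{
    PFN_vkGetMemoryFdKHR get_fd = (PFN_vkGetMemoryFdKHR)
        vkGetDeviceProcAddr(ngpu_dev, "vkGetMemoryFdKHR");

    VkMemoryGetFdInfoKHR get_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR,
        .memory = mem,
        .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
    };
    int fd = -1;
    if (get_fd == NULL || get_fd(ngpu_dev, &get_info, &fd) != VK_SUCCESS)
        return -1; /* no dma_buf export available from this driver */
    return fd;
}
```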
E.g. for nvidia-GPUs the supported extensions are mentioned here: https://developer.nvidia.com/vulkan-driver
Searching for external_memory yields:
- VK_KHR_external_memory_fd (not cross-PhysicalDevice, only for sharing resources between processes)
- VK_KHR_external_memory_win32 (not usable on Linux)
- VK_EXT_external_memory_host (supported by Nvidia explicitly only on Windows)

Maybe ask NVidia about VK_EXT_external_memory_dma_buf?
It seems they are referenced as contributors here: https://github.com/KhronosGroup/Vulkan-Docs/blob/master/appendices/VK_EXT_external_memory_dma_buf.txt
I also found this: https://devtalk.nvidia.com/default/topic/1030669/jetson-tx1/trying-to-process-with-opengl-an-eglimage-created-from-a-dmabuf_fd-/ https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-271/group__ee__nvbuffering__group.html#gab159c94c574f75a3d7913bef8352722a
Just for the record, dma-buf is definitely the way to go, as I think this is what PRIME does for OpenGL. And this is likely the lowest level you can do it at, and thus with the least overhead. I don't understand the specifics, but maybe asking @aaronp24 about that Vulkan extension you would require to do so could be a good idea.
I don't have much to add here, but @aaronp24 asked me to chime in, so I'll just say you've correctly surmised we don't currently support any of the extensions necessary to make a zero-copy/dma-copy-only transfer between an NV GPU and a non-NV GPU in our Linux Vulkan driver. I don't have any roadmap to share for support of any of these extensions at the moment.
To skip the memcpy, we could import mapped memory from the rendering GPU into the display GPU or the other way round, or use host memory and import it on both. We need to check which of those alternatives is fastest and implementable, and extend the code to use this optimization if available.
However (at least on my machine), it seems that the general transfer of the image between the GPUs is the bottleneck. So maybe there is a better way than memmapping and copying?
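One way to check "implementable" per machine would be to query the external-memory features for each candidate handle type on both physical devices, roughly like the sketch below (Vulkan 1.1 core; the usage flags are just an example, not what the final code would necessarily use):

```c
#include <vulkan/vulkan.h>

/* Sketch: report whether a transfer buffer using the given external handle
 * type could be imported/exported on this physical device. The caller checks
 * for VK_EXTERNAL_MEMORY_FEATURE_IMPORTABLE_BIT and
 * VK_EXTERNAL_MEMORY_FEATURE_EXPORTABLE_BIT in the result. */
static VkExternalMemoryFeatureFlags
external_buffer_features(VkPhysicalDevice dev,
                         VkExternalMemoryHandleTypeFlagBits handle_type)
{
    VkPhysicalDeviceExternalBufferInfo info = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_EXTERNAL_BUFFER_INFO,
        .usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT |
                 VK_BUFFER_USAGE_TRANSFER_DST_BIT,
        .handleType = handle_type,
    };
    VkExternalBufferProperties props = {
        .sType = VK_STRUCTURE_TYPE_EXTERNAL_BUFFER_PROPERTIES,
    };
    vkGetPhysicalDeviceExternalBufferProperties(dev, &info, &props);
    return props.externalMemoryProperties.externalMemoryFeatures;
}
```

Running this for e.g. VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT and VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT on both GPUs would tell us up front which of the alternatives above is even available before picking one.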