Iliolou opened 2 years ago
Can you paste the output of `sudo cat /sys/module/nvidia_drm/parameters/modeset`?
~ # cat /sys/module/nvidia_drm/parameters/modeset
Y
I don't know if that matters, but I am on an Optimus laptop; however, mpv with nvdec-copy works fine.
It's almost certainly caused by being an Optimus setup. Unfortunately I don't have one of those to test with.
The error with mpv is odd: the driver seems prepared to export the image (the eglExportDMABUFImageQueryMESA call returns the expected values), but then fails on the actual export. Thinking about it, though, the export doesn't know where the image is going to be imported, so it might not be an Optimus issue after all?
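For reference, the two-step export path being described is roughly the following (a minimal C sketch, not the driver's actual code: `dpy` and `img` are assumed to already exist, and extension checks and cleanup are omitted):

```c
/* Minimal sketch of the EGL_MESA_image_dma_buf_export path described above.
 * Assumes a valid EGLDisplay "dpy" and EGLImage "img"; error handling and
 * multi-plane bookkeeping are trimmed. */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <stdio.h>

static int export_image(EGLDisplay dpy, EGLImage img)
{
    PFNEGLEXPORTDMABUFIMAGEQUERYMESAPROC queryExport =
        (PFNEGLEXPORTDMABUFIMAGEQUERYMESAPROC)
            eglGetProcAddress("eglExportDMABUFImageQueryMESA");
    PFNEGLEXPORTDMABUFIMAGEMESAPROC doExport =
        (PFNEGLEXPORTDMABUFIMAGEMESAPROC)
            eglGetProcAddress("eglExportDMABUFImageMESA");

    int fourcc = 0, num_planes = 0;
    EGLuint64KHR modifiers[4] = {0};

    /* Step 1: query fourcc/plane count/modifiers -- the call that succeeds. */
    if (!queryExport(dpy, img, &fourcc, &num_planes, modifiers))
        return -1;

    int fds[4] = {-1, -1, -1, -1};
    EGLint strides[4] = {0}, offsets[4] = {0};

    /* Step 2: the actual export to dma-buf fds -- the call that fails here. */
    if (!doExport(dpy, img, fds, strides, offsets)) {
        fprintf(stderr, "export failed (fourcc=0x%x, planes=%d)\n",
                fourcc, num_planes);
        return -1;
    }
    return fds[0]; /* the caller now owns the dma-buf fd(s) */
}
```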
I've not tested the library with mplayer, so it's unlikely to work there at all. It seems to be calling vaPutImage and vaPutSurface, which aren't implemented in the library at the moment.
Based on the comment in #14, mpv works great on my Optimus laptop using --hwdec=vaapi-copy, whereas --hwdec=vaapi fails. I had to do the PRIME offload first. I can see a GPU Utilization of 4% and Video Engine Utilization of 4% in NVIDIA Settings while playing a low-res 1280x692 movie. Thank you!
export __NV_PRIME_RENDER_OFFLOAD=1; export __GLX_VENDOR_LIBRARY_NAME=nvidia
export LIBVA_DRIVER_NAME=nvidia
mpv --hwdec=vaapi-copy test.mp4
[vo/gpu/opengl] Initializing GPU context 'x11egl'
[vo/gpu/opengl] EGL_VERSION=1.5
[vo/gpu/opengl] EGL_VENDOR=NVIDIA
[vo/gpu/opengl] EGL_CLIENT_APIS=OpenGL_ES OpenGL
[vo/gpu/opengl] Trying to create Desktop OpenGL context.
[vo/gpu/opengl] Choosing visual EGL config 0x28, visual ID 0x20
[vo/gpu/opengl] GL_VERSION='4.4.0 NVIDIA 470.86'
[vo/gpu/opengl] Detected desktop OpenGL 4.4.
[vo/gpu/opengl] GL_VENDOR='NVIDIA Corporation'
[vo/gpu/opengl] GL_RENDERER='NVIDIA GeForce GTX 950M/PCIe/SSE2'
[vo/gpu/opengl] GL_SHADING_LANGUAGE_VERSION='4.40 NVIDIA via Cg compiler'
[vo/gpu] Using FBO format rgba16f
[vd] Codec list:
[vd] h264 - H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
[vd] h264_v4l2m2m (h264) - V4L2 mem2mem H.264 decoder wrapper
[vd] libopenh264 (h264) - OpenH264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
[vd] h264_cuvid (h264) - Nvidia CUVID H264 decoder
[vd] Opening decoder h264
[vd] Looking at hwdec h264-vaapi-copy...
[vaapi] Initialized VAAPI: version 1.12
[vd] Trying hardware decoding via h264-vaapi-copy.
[vd] Selected codec: h264 (H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10)
[vd] Pixel formats supported by decoder: vdpau cuda vaapi_vld yuv420p
[vd] Codec profile: High (0x64)
[vd] Requesting pixfmt 'vaapi_vld' from decoder.
**[vd] Using hardware decoding (vaapi-copy).**
[vd] Decoder format: 1280x692 nv12 auto/auto/auto/auto/auto CL=mpeg2/4/h264
[vf] [in] 1280x692 [1803737147:1876720717] nv12 bt.709/bt.709/bt.1886/limited/display SP=1.000000 CL=mpeg2/4/h264
[vf] [out]
[cplayer] VO: [gpu] 1280x692 => 1280x719 nv12
[cplayer] VO: Description: Shader-based GPU Renderer
[vo/gpu] reconfig to 1280x692 [1803737147:1876720717] nv12 bt.709/bt.709/bt.1886/limited/display SP=1.000000 CL=mpeg2/4/h264
[statusline] AV: 00:00:03 / 01:36:44 (0%) A-V: 0.000
FWIW, I don't have any great theories here after reading through #33. If this really is some Optimus/PRIME/render-offload interaction issue, I don't know why it would fail here, as @elFarto is correct that the exporter has no idea up-front who will be importing the buffer and hence should happily export it. If there were to be cross-GPU sharing issues, I'd expect them to occur during import on the non-NV GPU instead.
I've only tested the Optimus/PRIME setup with my GeForce 760 on the 470 drivers, and that fails when the Intel driver tries to import it. I assume that's some limitation with that series of drivers, as it works fine if it's the NVIDIA driver that imports it.
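For clarity, the import-side call that fails in that setup is essentially the EGL_EXT_image_dma_buf_import path on the iGPU's display. A rough sketch (single plane, with `fd`/`stride`/`offset` assumed to come from the exporter, and the attribute list trimmed):

```c
/* Sketch of importing the exported dma-buf on the iGPU's EGLDisplay via
 * EGL_EXT_image_dma_buf_import. A single R8 plane (e.g. the Y plane of NV12)
 * is used for brevity; this is the call that returns EGL_NO_IMAGE in the
 * failing cross-GPU case described above. */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>

static EGLImage import_dmabuf(EGLDisplay igpu_dpy, int fd, EGLint width,
                              EGLint height, EGLint stride, EGLint offset)
{
    const EGLAttrib attribs[] = {
        EGL_WIDTH,                     width,
        EGL_HEIGHT,                    height,
        EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_R8,
        EGL_DMA_BUF_PLANE0_FD_EXT,     fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, offset,
        EGL_DMA_BUF_PLANE0_PITCH_EXT,  stride,
        EGL_NONE
    };
    /* EGL 1.5 core entry point; buffer must be NULL and the context
     * EGL_NO_CONTEXT for the EGL_LINUX_DMA_BUF_EXT target. */
    return eglCreateImage(igpu_dpy, EGL_NO_CONTEXT,
                          EGL_LINUX_DMA_BUF_EXT, NULL, attribs);
}
```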
I do need to pull my 1060 out and put it in that machine to test it with the newer drivers, but that's a pain :smile:
The import-side failure won't be fixed in newer drivers either. The memory constraints work I've been involved with for years now would be needed for that.
The problem is that the EGLImage memory is most likely going to be in GPU-local memory (vidmem), which the Intel driver won't be able to map into its GPU. There needs to be some way to access that GPU-local memory directly from third-party devices (this is technically possible, but not with current driver code, and it often isn't as optimal as it might seem anyway), or to negotiate up front for a buffer in some more mutually agreeable location (system memory, generally). The latter is where memory constraint APIs would come in.

Upstream drivers built on top of TTM solve this automatically when possible by dynamically migrating the memory to that shared location on import (roughly speaking), but our driver architecture doesn't allow this, and as mentioned, that's often not actually the most optimal configuration. When dma-buf is used internally by higher-level APIs (like PRIME render offload through Vulkan/GLX/EGL+X11), we can detect this case and internally insert a vidmem->sysmem blit or vice versa, placing the dma-buf memory in sysmem, but with direct buffer access there's no point in the API to accomplish this cleanly.

This was another one of those things where the EGLStreams-based sharing model made things easier on drivers by providing a slightly higher-level abstraction, but eventually we'll have the right tools to expose the same level of functionality while still providing the lower-level access that dma-buf-based sharing provides.
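Worth noting for readers: the only up-front negotiation applications can do today is format/modifier negotiation on the importing display (EGL_EXT_image_dma_buf_import_modifiers). A quick sketch below, with the caveat that this expresses format/layout only, not memory placement, which is exactly the gap the memory-constraints work described above is meant to fill:

```c
/* Sketch: asking the importing (e.g. Intel) EGLDisplay which DRM modifiers
 * it accepts for NV12. This negotiates format/layout only -- there is no
 * attribute for vidmem vs. sysmem placement, which is the missing piece
 * discussed above. */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>
#include <stdio.h>

static void query_importer_modifiers(EGLDisplay importer_dpy)
{
    PFNEGLQUERYDMABUFMODIFIERSEXTPROC queryMods =
        (PFNEGLQUERYDMABUFMODIFIERSEXTPROC)
            eglGetProcAddress("eglQueryDmaBufModifiersEXT");

    EGLuint64KHR mods[64];
    EGLBoolean external_only[64];
    EGLint count = 0;

    if (queryMods(importer_dpy, DRM_FORMAT_NV12, 64,
                  mods, external_only, &count)) {
        for (EGLint i = 0; i < count; i++)
            printf("importer accepts NV12 modifier 0x%llx\n",
                   (unsigned long long)mods[i]);
    }
}
```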
That's good to know. Sounds like something resizable PCIe BAR would help with, since that would technically make all of GPU memory available, or is it a limitation on the Intel side?
Is there anything I can do in CUDA to get the buffer/CUarray placed in the correct location before exporting it?
> Is there anything I can do in CUDA to get the buffer/CUarray placed in the correct location before exporting it?
I'm largely unfamiliar with the CUDA API, but I would imagine it would be hard to do with a surface that came from NVDEC. I'd be digging through the same CUDA manuals you would be trying to answer that.
That's where the EGLStreams model helps: there's a point where you can connect the consumer and specify which memory the consumer wants the EGL image to be in. If you were going to consume on the Intel iGPU side, then you'd want to use sysmem here, but what's the actual use case? Even on an Optimus laptop, if you are using this driver in the first place, you'd want to be using GL/Vulkan on the dGPU as well, or why bother? Use vaapi on the iGPU to go with Vulkan/GL there too.
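To make that ordering concrete, this is roughly what an EGLStream setup looks like (a hedged sketch, not this driver's code: the consumer connects first, then the producer, so the driver already knows the destination before any frame is produced; CUDA is used as the producer purely for illustration and context/texture setup is assumed):

```c
/* Sketch of the EGLStream connection order referred to above. The consumer
 * attaches before the producer, so placement decisions can be made with the
 * consumer already known. Assumes an EGLDisplay "dpy", a current GL context
 * with a GL_TEXTURE_EXTERNAL_OES texture bound, and a current CUDA context;
 * error handling omitted. */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <cuda.h>
#include <cudaEGL.h>

static void setup_stream(EGLDisplay dpy, EGLint width, EGLint height)
{
    PFNEGLCREATESTREAMKHRPROC createStream =
        (PFNEGLCREATESTREAMKHRPROC)eglGetProcAddress("eglCreateStreamKHR");
    PFNEGLSTREAMCONSUMERGLTEXTUREEXTERNALKHRPROC connectConsumer =
        (PFNEGLSTREAMCONSUMERGLTEXTUREEXTERNALKHRPROC)
            eglGetProcAddress("eglStreamConsumerGLTextureExternalKHR");

    static const EGLint attribs[] = { EGL_NONE };
    EGLStreamKHR stream = createStream(dpy, attribs);

    /* 1. Consumer connects first (the bound GL external texture). */
    connectConsumer(dpy, stream);

    /* 2. Producer attaches afterwards -- here CUDA, which would present
     *    decoded frames into the stream. */
    CUeglStreamConnection producer;
    cuEGLStreamProducerConnect(&producer, stream, width, height);
}
```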
It's theoretically interesting, but I don't think it's solving a real problem. In fact, in the originally reported problem, is it using mismatched iGPU vs dGPU? It should be all one way or the other.
Yeah, I had assumed someone was using an NV dGPU to decode, then sending the output to an iGPU for presentation via GL or some other mechanism. If you're using GL as a simple blitter, or sending the dma-buf straight to some display hardware for presentation (directly, or after forwarding to Wayland/X11 via some socket protocol, or some other framework/API), that probably makes sense. However, if you're doing some non-trivial processing in GL or Vulkan, it would generally make sense to be doing that on the dGPU as well.
Separately, if the user really is attempting to do all this on the dGPU, I'm not sure why importing would fail just because there's also an iGPU in the system.
@Iliolou Can you retest using the latest version, v0.0.3?
The latest version, 0.0.4, works fine on Optimus with no need for __NV_PRIME_RENDER_OFFLOAD anymore. To clarify: X11 runs on Intel, not on NVIDIA. mpv (even an old 0.32 version) works as well as with nvdec.
export LIBVA_DRIVER_NAME=nvidia
mpv --hwdec=vaapi-copy test.mp4 - Success
mpv --hwdec=vaapi test.mp4 - Failed to find VA X11 surface.
libva info: VA-API version 1.12.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib64/va/drivers/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.12 (libva 2.12.0)
vainfo: Driver version: VA-API NVDEC driver
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
<unknown profile> : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
It is a rather old GeForce GTX 950M. No HEVC or AV1.
Interop will only work if the application is using the NVIDIA GPU for X11. That generally means using:
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia
as per nvidia docs.
But if you're letting Intel drive X11, you'd be better off letting Intel handle the vaapi video decode as well.
`--hwdec=vaapi-copy` works on my own Optimus laptop too, also without PRIME, but `--hwdec=vaapi` with PRIME gives me a green screen and without PRIME results in a kernel oops that makes X/display unusable until reboot :sweat_smile:
That's on driver version 510.47.03 and GTX 1660 Ti Mobile, while X11 is being driven by integrated AMD Radeon Vega 10.
I spent several hours looking into this the other day, and I've come to the conclusion that Firefox can't support running on the non-default GPU (in X11 at least). The gfxtest process that runs has slightly different logic to create EGL contexts, which prevents it from creating a context on the NVIDIA GPU. The process we've been trying to use just ends up resulting in there being no EGL drivers to choose from.
I did try some slight modifications to Firefox to make it work, but there look to be multiple places where changes would need to be made to get it working. Ideally we'd skip that and just make video decoding work on NVIDIA, with the rest of the rendering done by the Intel chip, but that's a far harder task.
Could you please elaborate on what "non-default" means and how I can change it? I'm fine with running NVIDIA all the time instead of the Intel one.
Thank you for your effort. It compiles fine on Gentoo, but it won't run. I am on Skylake with a GeForce GTX 950M. mpv with nvdec works fine. Here are the logs: