OpenGL isn't accelerated by default with Nvidia drivers

CendioOssman commented 5 months ago

Describe the bug

If I run an OpenGL application on a system with an Nvidia card, the application is still not hardware-accelerated. glxinfo shows the same thing: it reports llvmpipe as the renderer.

To Reproduce

Steps to reproduce the behavior:

1. Run glxinfo
2. See OpenGL renderer string: llvmpipe (LLVM 18.1.6, 256 bits)

Expected behavior

See OpenGL renderer string: NVIDIA GeForce RTX 4060/PCIe/SSE2
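To check the renderer from a script rather than by eye, the glxinfo output can be parsed. This is a minimal sketch that assumes glxinfo prints a line starting with `OpenGL renderer string:`, as in the output quoted above. The function names `glx_renderer` and `is_software_renderer` are hypothetical, and the list of Mesa software rasterizer names (llvmpipe, softpipe, swrast) is an assumption, not an exhaustive list.

```shell
# Hypothetical helper: print the renderer named in glxinfo output read from stdin.
glx_renderer() {
    # Keep everything after "OpenGL renderer string: " on the matching line.
    sed -n 's/^OpenGL renderer string: //p'
}

# Hypothetical helper: succeed (exit 0) if the renderer string names a
# Mesa software rasterizer. Assumption: these three names cover the usual cases.
is_software_renderer() {
    case "$1" in
        *llvmpipe*|*softpipe*|*swrast*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (needs a running X display, e.g. the Xvnc session on :2):
#   r=$(DISPLAY=:2 glxinfo | glx_renderer)
#   if is_software_renderer "$r"; then echo "unaccelerated: $r"; fi
```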

Client

No client needed.

Server

- OS: Fedora 40
- VNC server: TigerVNC
- VNC server version: 1.14.0 beta
- Server downloaded from: built using the contrib spec file
- Server was started using: Xvnc :2

Additional context

The issue seems to be that the application still tries the Mesa drivers. I get these errors when the application starts:

glx: failed to create dri3 screen
failed to load driver: nouveau
DRM kernel driver 'nvidia-drm' in use. NVK requires nouveau.
glx: failed to create dri3 screen
failed to load driver: nouveau

If I force GLVND to pick the Nvidia driver, everything works:

$ __GLX_VENDOR_LIBRARY_NAME=nvidia DISPLAY=:2 glxinfo | grep renderer
OpenGL renderer string: NVIDIA GeForce RTX 4060/PCIe/SSE2

Vulkan does not have this issue. I guess it has a better mechanism for picking the driver.

CendioOssman commented 5 months ago

I'm guessing the problem is that Xvnc still uses Mesa for indirect rendering, and hence it reports Mesa when a client queries the GLX extension for which driver to use. Perhaps we can override this from the DRI3 code when we detect that the Nvidia driver is in use?
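The vendor selection described in the guess above can be modelled roughly as follows. This is a simplified sketch, not libglvnd's actual code. It assumes that an explicit `__GLX_VENDOR_LIBRARY_NAME` always wins; that otherwise the client uses the vendor name(s) the X server advertises for the screen (obtained, as far as I know, through the GLX_EXT_libglvnd query of `GLX_VENDOR_NAMES_EXT`); and that a vendor name `X` maps to the library `libGLX_X.so.0`. The function name `glx_vendor_library` and the `mesa` fallback for an empty list are hypothetical.

```shell
# Hypothetical model of how a GLVND client picks its GLX vendor library.
#   $1: vendor name(s) the X server advertises for the screen (space-separated).
glx_vendor_library() {
    if [ -n "${__GLX_VENDOR_LIBRARY_NAME:-}" ]; then
        # An explicit override in the environment takes precedence.
        vendor=$__GLX_VENDOR_LIBRARY_NAME
    else
        # Deliberately unquoted: split the advertised list into words.
        # Simplification: take the first name (the "mesa" default is assumed).
        set -- $1
        vendor=${1:-mesa}
    fi
    echo "libGLX_${vendor}.so.0"
}

# Example: a server advertising "mesa" yields libGLX_mesa.so.0 unless the
# override is set, e.g.:
#   __GLX_VENDOR_LIBRARY_NAME=nvidia glx_vendor_library mesa
```

Under this model, a server that advertises only `mesa` sends every client to Mesa's GLX library (and hence llvmpipe) until the variable is set, which matches the behavior reported above.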

dcommander commented 4 months ago

Does DRI3 even work with the nVidia proprietary drivers? My understanding was that they use DRI2, which is tied to the physical X server (via the NV-GLX extension) and can't work in a virtual X server such as Xvnc, Xvfb, etc.

CendioOssman commented 4 months ago

It works nicely in our testing here. I'm afraid I'm not at the machine at the moment, so I can't check which Nvidia driver version it has. I think it's the latest, though.

dcommander commented 4 months ago

I'm using the latest version (550.xx), and it definitely doesn't work for me. When using nVidia's DRM render node with DRI3, the __GLX_VENDOR_LIBRARY_NAME=nvidia trick you posted above allows glxinfo to work, but an actual application (GLXspheres, GLXgears, etc.) segfaults in the body of glXSwapBuffers(). My understanding from nVidia's developers is that their drivers don't support DRI3 and that they only support GBM with Wayland, so I'm trying to figure out whether that has somehow changed since I last spoke to them. Whether or not DRI3/Xvnc can be made to work with nVidia's professional GPUs has obvious ramifications for the long-term viability of VirtualGL, which has obvious ramifications for my ability to continue as an independent open source developer (which would have ramifications for the health of libjpeg-turbo as well).

dcommander commented 4 months ago

As for Vulkan, nVidia's Vulkan implementation does indeed use a different mechanism. (It doesn't use DRI3, to the best of my understanding.) If it detects that it is running in an X proxy environment, it does something very VirtualGL-like, automatically redirecting rendering into a GPU-based buffer and using a swap chain that reads back the GPU-based buffer and recomposites the pixels into the X window. The drawback is that nVidia's implementation doesn't currently allow you to select a specific nVidia GPU if you have more than one nVidia GPU in the system.

dcommander commented 4 months ago

@CendioOssman Is a minimum kernel version required in order to use GBM with nVidia's drivers? That could explain why I am unable to reproduce your results.

dcommander commented 4 months ago

OK, I guess at this point I will not receive a timely answer. I attempted, to the best of my knowledge and experience, to peer-review your claim that the nVidia proprietary driver works with DRI3. I could not confirm your claim and do not, in fact, see how that driver could ever work with DRI3, given that it does not contain a Mesa-compatible DRI driver module. What I observe is that:

If __GLX_VENDOR_LIBRARY_NAME is unset before starting the session:
- I get the aforementioned nouveau error when the DRI3 extension initializes.
- The window manager appears in the session, as expected.
- nvidia-smi does not show that Xvnc or the window manager are using the GPU at all.
- If I run __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo from a terminal in the session, then glxinfo behaves as expected. If I pause the output of glxinfo, then nvidia-smi shows that glxinfo is using the GPU.
- If I run __GLX_VENDOR_LIBRARY_NAME=nvidia glxspheres64 or __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears, then the application segfaults in the body of glXSwapBuffers().

If __GLX_VENDOR_LIBRARY_NAME=nvidia is set before starting the VNC session:
- I get the aforementioned nouveau error when the DRI3 extension initializes.
- gbm_bo_map() returns EACCES every time it is called.
- I get a blank screen in the session.
- If I log in via SSH and set DISPLAY to point to the session, then glxinfo, glxspheres64, and glxgears behave as described above.

At this point, I am forced to conclude that either your claims are erroneous or that they are only true for very limited configurations. It would be nice if you could provide clarifying information, including insight into how Cendio tests this feature, so the open source community can better understand those limitations. At the moment, you claim that the feature always works with "newer nVidia drivers", which is clearly not true.

clbr commented 4 months ago

Our (Kasm) experience is the same as dcommander's. However, we found a possible approach: running Zink on top of the Nvidia proprietary Vulkan drivers. That would combine the working Nvidia Vulkan implementation with Zink's GL-on-Vulkan emulation layer, in theory giving a good solution. I'm not sure whether anyone has tried it yet, though.
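For anyone who wants to try the Zink idea, this is a minimal, untested sketch of the environment it would need. `__GLX_VENDOR_LIBRARY_NAME=mesa` keeps GLVND on Mesa's GLX implementation, and `MESA_LOADER_DRIVER_OVERRIDE=zink` (the variable Mesa documents for selecting Zink) asks Mesa to render through Vulkan, which would then be Nvidia's Vulkan driver. The wrapper name `run_with_zink` is hypothetical, and whether Mesa actually loads Zink inside an Xvnc session, rather than falling back to llvmpipe, is exactly what would need to be checked.

```shell
# Hypothetical wrapper: run a GL application through Mesa's Zink
# (OpenGL on Vulkan), intended to let the Vulkan driver do the rendering.
run_with_zink() {
    # __GLX_VENDOR_LIBRARY_NAME=mesa: keep GLVND on Mesa's GLX library.
    # MESA_LOADER_DRIVER_OVERRIDE=zink: ask the Mesa loader for Zink.
    __GLX_VENDOR_LIBRARY_NAME=mesa MESA_LOADER_DRIVER_OVERRIDE=zink "$@"
}

# Example usage inside the VNC session; if Zink is in use, the renderer
# string should mention zink rather than llvmpipe:
#   run_with_zink glxinfo -B | grep 'OpenGL renderer string'
```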

dcommander commented 4 months ago

That really is reinventing VirtualGL's wheel, though, since nVidia's Vulkan implementation basically does for Vulkan what VGL does for OpenGL. I've often wondered whether VirtualGL could be reimplemented using some modern interfaces like GLVND and/or DRI that weren't available when it was first released. Of course, that would be a huge effort for which there is no funding.

CendioOssman commented 4 months ago

I have not had an opportunity to do any deep digging. All I can say is that it works fine here on Fedora 40, with an RTX 4060 and Nvidia's 555 driver. No idea right now why you are seeing such problems.

The issues you mention seem to be entirely different things than what this issue report is about. Those sound like driver bugs, whilst this here is more likely a deficiency in Xorg or TigerVNC. As such, this is not the appropriate place for that discussion. I'll go ahead and hide these comments as off-topic.

CendioOssman commented 3 months ago

I've been able to come back to testing here, and superficially everything looks like it is working on Fedora 40 with RTX 4060 and Nvidia driver 555.58.02 installed from RPMFusion.

But looking closer at it, rendering performance is not very good, and CPU usage is high. I think the Nvidia driver is falling back to software rendering, even though it presents everything the same way as when it is GPU-accelerated.

So there do indeed seem to be things that need to be fixed on Nvidia's side, and solving the GLVND issue here will not be sufficient.

dcommander commented 3 months ago

That is consistent with my understanding. DRI3 allocates GPU buffers in the OpenGL front end (i.e. in the application), whereas DRI2 (what nVidia's drivers currently use) allocates GPU buffers in the OpenGL back end (i.e. in the X server). Thus, they would need to change their driver architecture and release a Mesa-compatible DRI driver in order to support DRI3. By manipulating __GLX_VENDOR_LIBRARY_NAME, you were able to use nVidia's OpenGL front end, but the OpenGL back end was probably still llvmpipe. (nvidia-smi is a good tool for verifying whether an application is actually using an nVidia GPU.)
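A rough way to script the nvidia-smi check: the default nvidia-smi output ends with a process table under a "Processes:" heading that lists, among other columns, the PID of each process using the GPU. The sketch below reads that output on stdin and looks for the PID as a whitespace-separated field in that table. The function name `pid_uses_gpu` is hypothetical, the table layout is assumed rather than a stable interface, and a very small PID could still collide with another numeric column such as the GPU index.

```shell
# Hypothetical helper: succeed if the PID given as $1 appears in the
# process table of `nvidia-smi` output read from stdin.
pid_uses_gpu() {
    awk -v pid="$1" '
        # Only look at the table that follows the "Processes:" heading.
        /Processes:/ { in_procs = 1; next }
        in_procs {
            for (i = 1; i <= NF; i++)
                if ($i == pid) found = 1
        }
        # Exit status 0 if the PID was found, 1 otherwise.
        END { exit !found }
    '
}

# Example usage (12345 stands in for the PID of the application to check):
#   if nvidia-smi | pid_uses_gpu 12345; then echo "on the GPU"; fi
```

Reading from stdin rather than calling nvidia-smi inside the function keeps the parsing separate from the driver tool, so the same helper works on saved output.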

TigerVNC / tigervnc #1773