Open CendioOssman opened 5 months ago
I'm guessing the problem is that Xvnc is still using Mesa for indirect rendering, and hence it will say mesa when you query the GLX extension for which driver to use. Perhaps we can override this from the DRI3 code somehow when we see that we load the Nvidia driver?
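For reference, one way to see which client-side GLX vendor GLVND ends up dispatching to inside the session (assuming glxinfo from mesa-utils is available):

```
# Inside the Xvnc session: report the GLX client vendor and the active renderer
glxinfo | grep -E "client glx vendor|OpenGL vendor|OpenGL renderer"
```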
Does DRI3 even work with the nVidia proprietary drivers? My understanding was that they use DRI2, which is tied to the physical X server (via the NV-GLX extension) and can't work in a virtual X server such as Xvnc, Xvfb, etc.
Works nicely during our testing here. I'm afraid I'm not at the machine presently, so I can't check which Nvidia driver version. I think it's the latest, though.
I'm using the latest version (550.xx), and it definitely doesn't work for me. When using nVidia's DRM render node with DRI3, the __GLX_VENDOR_LIBRARY_NAME=nvidia trick you posted above allows glxinfo to work, but an actual application (GLXspheres, GLXgears, etc.) segfaults in the body of glXSwapBuffers(). My understanding from nVidia's developers is that their drivers don't support DRI3 and that they only support GBM with Wayland, so I'm trying to figure out whether that has somehow changed since I last spoke to them. Whether or not DRI3/Xvnc can be made to work with nVidia's professional GPUs has obvious ramifications for the long-term viability of VirtualGL, which has obvious ramifications for my ability to continue as an independent open source developer (which would have ramifications for the health of libjpeg-turbo as well).
As far as Vulkan, nVidia's Vulkan implementation does indeed use a different mechanism. (It doesn't use DRI3, to the best of my understanding.) If it detects that it is running in an X proxy environment, it does something very VirtualGL-like, automatically redirecting rendering into a GPU-based buffer and using a swap chain that reads back the GPU-based buffer and recomposites the pixels into the X window. The drawback is that nVidia's implementation doesn't currently allow you to select a specific nVidia GPU if you have more than one nVidia GPU in the system.
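For what it's worth, a quick sanity check of the Vulkan path inside the session (assuming vulkan-tools is installed; the grep fields are from vulkaninfo's summary output) is something like:

```
# List the Vulkan devices visible to the application and which driver backs them
vulkaninfo --summary | grep -E "deviceName|driverName"
# Or run a simple test application and watch nvidia-smi in another terminal
vkcube
```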
@CendioOssman Is a minimum kernel version required in order to use GBM with nVidia's drivers? That could explain why I am unable to reproduce your results.
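In case it helps narrow down the difference between our setups, these are the checks I would compare (the modeset parameter path is the standard one for the nvidia-drm module; whether it actually gates GBM here is an assumption on my part):

```
# Kernel and proprietary driver versions
uname -r
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# nvidia-drm modesetting, which the GBM/Wayland stack normally depends on
cat /sys/module/nvidia_drm/parameters/modeset

# Render nodes exposed to DRI3/GBM clients
ls -l /dev/dri/renderD*
```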
OK, I guess at this point I will not receive a timely answer. I attempted, to the best of my knowledge and experience, to peer-review your claim that the nVidia proprietary driver works with DRI3. I could not confirm your claim and do not, in fact, see how that driver could ever work with DRI3, given that it does not contain a Mesa-compatible DRI driver module. What I observe is that (a rough command sequence is sketched below):

- If __GLX_VENDOR_LIBRARY_NAME is unset before starting the session, then nvidia-smi does not show that Xvnc or the window manager are using the GPU at all.
- If I run __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo from a terminal in the session, then glxinfo behaves as expected. If I pause the output of glxinfo, then nvidia-smi shows that glxinfo is using the GPU.
- If I run __GLX_VENDOR_LIBRARY_NAME=nvidia glxspheres64 or __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears, then the application segfaults in the body of glXSwapBuffers().
- If __GLX_VENDOR_LIBRARY_NAME=nvidia is set before starting the VNC session, then gbm_bo_map() returns EACCES every time it is called.
- If I set DISPLAY to point to the session, then the glxinfo, glxspheres64, and glxgears behavior is as described above.

At this point, I am forced to conclude that either your claims are erroneous or that they are only true for very limited configurations. It would be nice if you could provide clarifying information, including insight into how Cendio tests this feature, so the open source community can better understand those limitations. At the moment, you claim that the feature always works with "newer nVidia drivers", which is clearly not true.
Our (Kasm) experience is the same as dcommander's; however, we found a possible approach in running Zink on top of the Nvidia proprietary Vulkan driver. That would combine the working Nvidia Vulkan support with the Zink GL-on-Vulkan layer, in theory giving a good solution. Not sure if anyone has tried it yet, though.
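For anyone who wants to experiment with that combination, forcing Zink under GLVND is typically done like this (assuming the installed Mesa was built with the Zink Gallium driver):

```
# Route GLX through Mesa, then have Mesa load Zink (OpenGL on Vulkan) instead of llvmpipe
__GLX_VENDOR_LIBRARY_NAME=mesa MESA_LOADER_DRIVER_OVERRIDE=zink glxinfo | grep "OpenGL renderer"
```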
That really is reinventing VirtualGL's wheel, though, since nVidia's Vulkan implementation basically does for Vulkan what VGL does for OpenGL. I've often wondered whether VirtualGL could be reimplemented using some modern interfaces like GLVND and/or DRI that weren't available when it was first released. Of course, that would be a huge effort for which there is no funding.
I have not had an opportunity to do any deep digging. All I can say is that it works fine here on Fedora 40, with an RTX 4060 and Nvidia's 555 driver. No idea right now why you are seeing such problems.
The issues you mention seem to be entirely different things than what this issue report is about. Those sound like driver bugs, whilst this here is more likely a deficiency in Xorg or TigerVNC. As such, this is not the appropriate place for that discussion. I'll go ahead and hide these comments as off-topic.
I've been able to come back to testing here, and superficially everything looks like it is working on Fedora 40 with RTX 4060 and Nvidia driver 555.58.02 installed from RPMFusion.
But looking closer at it, rendering performance is not very good, and CPU usage is high. I think the Nvidia driver is falling back to software rendering, even though it presents everything the same way as when things are GPU accelerated.
So there does indeed seem to be something that needs to be fixed on Nvidia's side, and solving the GLVND issue here will not be sufficient.
That is consistent with my understanding. DRI3 allocates GPU buffers in the OpenGL front end (i.e. in the application), whereas DRI2 (what nVidia's drivers currently use) allocates GPU buffers in the OpenGL back end (i.e. in the X server.) Thus, they would need to change their driver architecture and release a Mesa-compatible DRI driver in order to support DRI3. By manipulating __GLX_VENDOR_LIBRARY_NAME, you were able to use nVidia's OpenGL front end, but the OpenGL back end was probably still llvmpipe. (nvidia-smi is a good tool to verify whether an application is actually using an nVidia GPU.)
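Concretely, something like the following while the application is running:

```
# The process should appear in the nvidia-smi process list if it is really using the GPU
nvidia-smi
# Or watch utilization continuously in a second terminal
watch -n 1 nvidia-smi
```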
Describe the bug
If I run an OpenGL application with an Nvidia card for acceleration, then the application will still be unaccelerated. This can also be seen using glxinfo, which shows llvmpipe as the renderer.
To Reproduce
Steps to reproduce the behavior: run glxinfo in the session and see
OpenGL renderer string: llvmpipe (LLVM 18.1.6, 256 bits)
Expected behavior
See OpenGL renderer string: NVIDIA GeForce RTX 4060/PCIe/SSE2.

Client (please complete the following information): No client needed.
Server (please complete the following information):
Xvnc :2
Additional context
The issue seems to be that it still tries the Mesa drivers. I get this error during startup of the application:
If I force glvnd to pick the Nvidia driver, then everything works:
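A sketch of that override, using the GLVND vendor variable referenced elsewhere in this thread:

```
# Force GLVND to dispatch GLX to the Nvidia vendor library for this application
__GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep "OpenGL renderer"
```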
No such issue appears for Vulkan. I guess there is a better mechanism there.