VirtualGL / virtualgl

Main VirtualGL repository
https://VirtualGL.org

Errors with Virtualgl with Amdgpu proprietary drivers #221

Closed evan-barentin closed 1 year ago

evan-barentin commented 1 year ago

Hi,

After many years of using NVIDIA graphics cards, we at Anatoscope are considering using the AMD Radeon Pro V520 card on AWS, but we have run into trouble using VirtualGL v3.0.2 with AMD's proprietary drivers and OpenGL implementation: a simple vglrun glxinfo command hangs and displays nothing. Here are the few lines of trace output displayed when adding +v +tr to vglrun:

vglrun +v +tr glxinfo
[VGL] Shared memory segment ID for vglconfig: 4358274
[VGL] VirtualGL v3.0.2 64-bit (Build 20221020)
[VGL 0xd6add3c0] XOpenDisplay (name=NULL [VGL] dlopen (filename=libX11-xcb.so.1 flag=258 retval=0x01e89270)
[VGL] dlopen (filename=libxcb.so.1 flag=258 retval=0x7f54d6ae1990)
[VGL] dlopen (filename=libxshmfence.so.1 flag=258 retval=0x01e89940)
[VGL] dlopen (filename=libxcb-dri3.so.0 flag=258 retval=0x01e89ef0)
[VGL] dlopen (filename=libxcb-dri2.so.0 flag=258 retval=0x01e8a4b0)
[VGL] dlopen (filename=libxcb-randr.so.0 flag=258 retval=0x01e8aa80)
[VGL] dlopen (filename=libxcb-sync.so.1 flag=258 retval=0x01e8b050)
[VGL] dlopen (filename=libX11.so.6 flag=258[VGL] NOTICE: Replacing dlopen("libX11.so.6") with dlopen("libvglfaker.so")
 retval=0x7f54d6ae24e0)
[VGL] dlopen (filename=libxcb-present.so.0 flag=258 retval=0x01e8b730)
[VGL] dlopen (filename=libxcb-glx.so.0 flag=258 retval=0x01e8bd00)
[VGL] dlopen (filename=libXfixes.so.3 flag=258 retval=0x01e8c2c0)
[VGL] dlopen (filename=libXdamage.so.1 flag=258 retval=0x01e8c8b0)
[VGL] dlopen (filename=libXext.so.6 flag=258 retval=0x7f54d6ae09d0)
[VGL] dlopen (filename=libXxf86vm.so.1 flag=258 retval=0x01e8d0e0)
[VGL] dlopen (filename=libXau.so.6 flag=258 retval=0x7f54d6ade650)
[VGL] dlopen (filename=libXdmcp.so.6 flag=258 retval=0x01e8d710)

Without vglrun, glxinfo | grep OpenGL works fine:

OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon Pro V520  MxGPU
OpenGL core profile version string: 4.4.14736 Core Profile Context 0.3.1171946.el7
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6.14736 Compatibility Profile Context 0.3.1171946.el7
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: 4.6.14736 Compatibility Profile Context 0.3.1171946.el7
OpenGL ES profile shading language version string: 4.60

This test was done on a g4ad.4xlarge instance running the Amazon Linux AMI with pre-installed AMD drivers. On this instance, we started a simple Xorg server running xterm via xinit on :0 and streamed it with x11vnc.

We encounter the same behaviour on an Ubuntu 22.04 AMI with the latest proprietary drivers installed, on the same AWS instance and with the same procedure.

Have a nice day,

dcommander commented 1 year ago

Unfortunately, there are some known conformance issues with AMD's drivers. VirtualGL even includes an undocumented environment variable (VGL_AMDGPUHACK) that, when set to 1, works around the conformance issues in the non-proprietary AMDGPU driver enough that fakerut will complete successfully with those drivers when using the GLX back end. (I haven't observed any issues with the EGL back end.) I unfortunately cannot test a recent version of the PRO drivers because they don't support my GPU (a FirePro that is only a few years old.) I am happy to log in remotely to your AWS instance and attempt to diagnose the problem, but that would be a paid service.

dcommander commented 1 year ago

(and not a service that I could perform until after the new year.)

dcommander commented 1 year ago

I was actually able to upgrade to the latest AMDGPU release on my Radeon Pro WX2100, even though AMD's web site told me otherwise. I can't reproduce any failures. Contact me through e-mail if you would like to pursue this through paid support, since the issue appears to be system-specific.

dcommander commented 1 year ago

Another thing you might also try is setting VGL_PROBEGLX=0 in the environment. Sometimes that works around issues with certain OpenGL implementations that don't behave nicely with other OpenGL implementations.
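
To compare the workarounds mentioned in this thread, something like the following could be used. It is only a rough sketch: it runs the same vglrun command from the original report under a time limit, once with no extra settings and once with each environment variable, so that a hang is reported instead of blocking the terminal. The helper name try_env, the CMD and LIMIT variables, and the 20-second limit are made up for illustration, and it assumes the timeout utility from GNU coreutils is installed.

```shell
#!/bin/sh
# Command under test (taken from the report above). Override CMD to
# experiment on a machine without VirtualGL.
CMD=${CMD:-"vglrun +v +tr glxinfo"}
# Seconds to wait before treating a run as hung (arbitrary choice).
LIMIT=${LIMIT:-20}

# try_env LABEL [VAR=value ...]
# Hypothetical helper: runs $CMD with the given extra environment
# variables under a time limit and reports ok / hung / failed.
try_env() {
  label=$1
  shift
  status=0
  # env applies the VAR=value pairs to this one run only.
  # timeout exits with status 124 when the limit is reached.
  # $CMD is left unquoted on purpose so that it splits into words.
  env "$@" timeout "$LIMIT" $CMD >/dev/null 2>&1 || status=$?
  case $status in
    0)   echo "$label: ok" ;;
    124) echo "$label: hung (no exit after ${LIMIT}s)" ;;
    *)   echo "$label: failed (exit $status)" ;;
  esac
}

try_env "default"
# Suggested above for OpenGL implementations that do not coexist well.
try_env "VGL_PROBEGLX=0" VGL_PROBEGLX=0
# Undocumented; described above as a workaround for the non-proprietary
# AMDGPU driver with the GLX back end, so it may not help here.
try_env "VGL_AMDGPUHACK=1" VGL_AMDGPUHACK=1
```

If one of the runs reports ok while the default run reports hung, that variable is worth exporting in the environment before launching the real application with vglrun.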

marc-legendre commented 1 year ago

Thank you for your replies, and sorry we didn't get back to you! We have set aside our AMD experiments for now, but we've duly noted that you can provide paid support if need be.