VirtualGL / virtualgl

Main VirtualGL repository
https://VirtualGL.org

VirtualGL with xpra rendering very slow and lagging #11

Closed (zenny closed this issue 8 years ago)

zenny commented 8 years ago

Hi,

I am posting here after a gentlemanly xpra developer (Mr. Antoine Martin) pointed me to this repository. I have opened an issue with them at https://www.xpra.org/trac/ticket/1042

I have explained everything in the above ticket, FYI.

Thanks for the nifty VirtualGL library.

Cheers, /z

dcommander commented 8 years ago

Within an xpra environment (or any other X11 proxy, such as the one we provide-- TurboVNC-- or NX or whatnot), VirtualGL isn't really doing much, other than redirecting the OpenGL and GLX calls. Apart from that, it's just reading back the framebuffer and drawing the frames into the X proxy using XShmPutImage(). There are only really a few reasons why it would be slow:

  1. Slow glReadPixels() (out of our control-- depends on the GPU and/or driver)
  2. Slow 3D rendering (out of our control-- the application controls that)
  3. Slow compression (out of our control-- xpra controls that)
  4. Slow network (also out of our control)
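The pipeline described above, with each stage mapped to the corresponding potential bottleneck, can be sketched roughly as follows (hypothetical function names, not VirtualGL's actual internals):

```python
def virtualgl_frame_loop(render_frame, read_framebuffer, put_image_to_proxy):
    """Grossly simplified sketch of the split-rendering path described above.

    The stage callbacks are placeholders standing in for:
      - render_frame: the app's 3D rendering on the GPU (bottleneck 2),
      - read_framebuffer: VirtualGL's glReadPixels() readback (bottleneck 1),
      - put_image_to_proxy: XShmPutImage() into the X proxy, after which the
        proxy handles compression and network delivery (bottlenecks 3 and 4).
    """
    frame = render_frame()
    pixels = read_framebuffer(frame)
    put_image_to_proxy(pixels)
    return pixels
```

Because the stages run back-to-back per frame, the slowest one caps the overall frame rate, which is why the profiling output below reports them separately.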

I would suggest running with vglrun +pr to enable profiling output. That may tell you whether your GPU and/or drivers have a slow glReadPixels() implementation. We mainly support nVidia and AMD GPUs. Intel GPUs are known to also work decently (but still a lot slower than even a cheap GeForce.) A lot of lower-end GPUs have slow readback and aren't suitable for VirtualGL. And, for instance, with an nVidia GPU you need to be using the proprietary drivers and not nouveau.
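The readback figure that +pr reports can be turned into a rough frame-rate ceiling with some back-of-envelope arithmetic; a minimal sketch (a hypothetical helper, not part of VirtualGL):

```python
def max_fps_from_readback(width, height, readback_mpixels_per_sec):
    # Back-of-envelope estimate: if every rendered frame must be read back
    # from the GPU, readback throughput caps the achievable frame rate.
    return readback_mpixels_per_sec * 1e6 / (width * height)

# e.g. ~126 Mpixels/sec of readback on a 1024x768 window would allow
# roughly 160 fps, so readback would not be the bottleneck at 60 fps.
cap = max_fps_from_readback(1024, 768, 126.0)
```

If the cap comes out well above the frame rate you actually see, the slowdown is in one of the other three stages.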

Also, you should never use -c jpeg with xpra. xpra is doing the compression for you, so specifying -c jpeg would be redundant and would add a lot of unnecessary CPU overhead (which would definitely slow things down.) Always use -c proxy or -c 0 with xpra (but VirtualGL should detect that it's an X proxy and select proxy mode by default, so you shouldn't have to specify that.)

zenny commented 8 years ago

@dcommander, thank you for your very useful explanation. Items 2-4 may not apply in my case. It could be item 1, but I'm not sure: running vglrun +pr APP didn't give any output related to glReadPixels(). And I have ATI cards.

dcommander commented 8 years ago

vglrun +pr should show lines that say "readback". That is the glReadPixels() performance. Refer to https://cdn.rawgit.com/VirtualGL/virtualgl/2.5beta1/doc/index.html#hd0017

zenny commented 8 years ago

The glReadPixels() profiling output shows up now, so item 1 above seems to be a non-issue.

$ vglrun +pr glxgears
Readback    -  126.31 Mpixels/sec-  126.55 fps
Blit        -  151.23 Mpixels/sec-  151.53 fps
Total       -   59.51 Mpixels/sec-   59.63 fps
Readback    -  128.42 Mpixels/sec-  128.67 fps
Blit        -  152.69 Mpixels/sec-  152.99 fps
Total       -   58.99 Mpixels/sec-   59.11 fps

Thanks.


dcommander commented 8 years ago

Do not use GLXgears. It is not a realistic benchmark. Try the following:

vglrun -sp +pr /opt/VirtualGL/bin/glxspheres64

dcommander commented 8 years ago

Also, another way to verify whether it's VirtualGL's fault is to simply install TurboVNC on the same server. If it's fast in TurboVNC but slow in xpra, then that's a clear indication that the lag is xpra's fault.

zenny commented 8 years ago

Also, another way to verify whether it's VirtualGL's fault is to simply install TurboVNC on the same server. If it's fast in TurboVNC but slow in xpra, then that's a clear indication that the lag is xpra's fault.

Thanks again.

I tried to set up TurboVNC (I downloaded the .deb binary from sf.net) as described in https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-on-ubuntu-14-04.

However, /opt/TurboVNC/bin/vncviewer fails to connect to the running X server instance on :2. :-(

Is TurboVNC too complicated? Just wondering! (Sorry, this is not directly related to VirtualGL, but TurboVNC, as far as I've read, is a VirtualGL project. ;-)



zenny commented 8 years ago

@dcommander Here is the output of:

Do not use GLXgears. It is not a realistic benchmark. Try the following: vglrun -sp +pr /opt/VirtualGL/bin/glxspheres64

$ vglrun -sp +pr /opt/VirtualGL/bin/glxspheres64 
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: Gallium 0.4 on AMD CAICOS
Readback    -  113.21 Mpixels/sec-  170.50 fps
Blit        -  177.41 Mpixels/sec-  267.18 fps
Total       -   39.34 Mpixels/sec-   59.26 fps
59.149395 frames/sec - 39.274252 Mpixels/sec
Blit        -  180.80 Mpixels/sec-  272.30 fps
Total       -   39.50 Mpixels/sec-   59.48 fps
Readback    -  111.86 Mpixels/sec-  168.46 fps
59.532475 frames/sec - 39.528611 Mpixels/sec
Blit        -  168.03 Mpixels/sec-  253.06 fps
Total       -   39.49 Mpixels/sec-   59.48 fps
Readback    -  115.28 Mpixels/sec-  173.63 fps
59.579317 frames/sec - 39.559713 Mpixels/sec
Blit        -  174.66 Mpixels/sec-  263.05 fps
Total       -   39.51 Mpixels/sec-   59.51 fps
Readback    -  114.03 Mpixels/sec-  171.74 fps
59.443877 frames/sec - 39.469783 Mpixels/sec
Blit        -  169.85 Mpixels/sec-  255.81 fps
Total       -   39.21 Mpixels/sec-   59.05 fps
Readback    -  114.62 Mpixels/sec-  172.62 fps
59.079249 frames/sec - 39.227676 Mpixels/sec
Blit        -  176.29 Mpixels/sec-  265.51 fps
Total       -   39.85 Mpixels/sec-   60.02 fps
Readback    -  115.91 Mpixels/sec-  174.56 fps
59.895420 frames/sec - 39.769601 Mpixels/sec
Readback    -  114.04 Mpixels/sec-  171.75 fps
59.444351 frames/sec - 39.470098 Mpixels/sec
Blit        -  174.67 Mpixels/sec-  263.06 fps
Total       -   39.38 Mpixels/sec-   59.31 fps
Readback    -  115.10 Mpixels/sec-  173.35 fps
60.034090 frames/sec - 39.861675 Mpixels/sec
Blit        -  174.28 Mpixels/sec-  262.48 fps
Total       -   39.83 Mpixels/sec-   59.99 fps
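Picking averages out of interleaved +pr output by eye is error-prone; a small sketch (assuming only the "Label - X Mpixels/sec" line format shown above) tallies each counter:

```python
import re

# Parse lines like "Readback    -  113.21 Mpixels/sec-  170.50 fps"
# (the format of the vglrun +pr output above) and average each counter.
LINE = re.compile(r"^\s*(Readback|Blit|Total)\s*-\s*([\d.]+)\s*Mpixels/sec")

def average_throughput(profile_text):
    sums, counts = {}, {}
    for line in profile_text.splitlines():
        m = LINE.match(line)
        if m:
            label, value = m.group(1), float(m.group(2))
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

# A few lines excerpted from the output above:
sample = """\
Readback    -  113.21 Mpixels/sec-  170.50 fps
Blit        -  177.41 Mpixels/sec-  267.18 fps
Total       -   39.34 Mpixels/sec-   59.26 fps
Readback    -  111.86 Mpixels/sec-  168.46 fps
"""
avg = average_throughput(sample)
```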

dcommander commented 8 years ago

Is TurboVNC too complicated? No, you're just not reading the right set of instructions. All you have to do is install the DEB, run /opt/TurboVNC/bin/vncserver, and connect using /opt/TurboVNC/bin/vncviewer {server hostname}:{display number} (or /opt/TurboVNC/bin/vncviewer -tunnel {server hostname}:{display number} if you want to use SSH tunneling.) This is all documented in our User's Guide.

As far as the performance, the profiling output you list above indicates that the bottleneck is not in VirtualGL but in xpra. However, 40 Mpixels/sec is still a reasonable level of performance-- I would not call that "very slow." Regardless, I believe that whatever issue you're having is not VirtualGL's fault.
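As a sanity check on figures like these, the per-frame pixel count can be back-derived from any line of the profiling output, since each line reports the same stream both in Mpixels/sec and in frames/sec; a quick sketch:

```python
def pixels_per_frame(mpixels_per_sec, fps):
    # Each profiling line reports the same stream two ways, so dividing
    # Mpixels/sec by frames/sec recovers the pixels read back per frame.
    return mpixels_per_sec * 1e6 / fps

# Using the "Total" line from the glxspheres output above:
# ~39.34 Mpixels/sec at ~59.26 fps, i.e. roughly 660,000 pixels per frame.
frame_pixels = pixels_per_frame(39.34, 59.26)
```

If that pixel count matches the window size, the profiling figures are internally consistent and the ~60 fps "Total" is simply the display refresh rate, not a throughput limit.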

zenny commented 8 years ago

As far as the performance, the profiling output you list above indicates that the bottleneck is not in VirtualGL but in xpra. However, 40 Mpixels/sec is still a reasonable level of performance-- I would not call that "very slow." Regardless, I believe that whatever issue you're having is not VirtualGL's fault.

Thank you for confirmation and extremely enlightening pointers.

However, with TurboVNC+VirtualGL, the rendering is very fast, except that the icons of the LibreOffice app are greyed out (http://picpaste.com/SKX3BlSH.png)!?

dcommander commented 8 years ago

I'll look into that issue.

nathankidd commented 8 years ago

However, 40 Mpixels/sec is still a reasonable level of performance-- I would not call that "very slow."

Don't forget that most X servers will not block VGL's PutImage on the WAN side; i.e. they'll happily spoil in the local fb layer, so 40 Mpixels/sec has no relationship whatsoever to "seen by user on desktop" pixels.
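Frame spoiling, as referenced above, means the newest frame simply replaces any undelivered one instead of queuing behind it; a minimal sketch of the idea:

```python
class SpoilingFrameQueue:
    """Minimal sketch of frame spoiling (illustrative, not VirtualGL code).

    The producer never blocks: an undelivered frame is overwritten by the
    newer one, so the measured put rate can exceed the rate at which frames
    are actually seen by the user.
    """
    def __init__(self):
        self._latest = None
        self.spoiled = 0

    def put(self, frame):
        if self._latest is not None:
            self.spoiled += 1  # previous frame was never delivered
        self._latest = frame

    def take(self):
        frame, self._latest = self._latest, None
        return frame

q = SpoilingFrameQueue()
q.put("frame1")
q.put("frame2")  # frame1 is spoiled and never reaches the viewer
```

This is why benchmarking with spoiling disabled (as with vglrun -sp) gives a more honest end-to-end number.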

dcommander commented 8 years ago

My experience with xpra suggests that it does block on PutImage, because it's essentially acting as an X protocol compressor rather than a true X proxy. That's why I suggested benchmarking with vglrun -sp.

dcommander commented 8 years ago

I can't reproduce the greyed icons issue. Which GPU are you running on the server? Does the issue occur if you don't use vglrun? Which version of Java are you running on the client?

nathankidd commented 8 years ago

(My very limited xpra knowledge was mostly informed by https://www.xpra.org/trac/attachment/wiki/DataFlow/Xpra-Data-Flow.png which shows a big Xvfb bolted on, but that doesn't prove anything. /me stops hijacking the thread.)

zenny commented 8 years ago

Which GPU are you running on the server?

$ sudo lshw -c video
  *-display        
       description: VGA compatible controller
       product: Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:01:00.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=radeon latency=0
       resources: irq:42 memory:d0000000-dfffffff memory:fddc0000-fdddffff ioport:ee00(size=256) memory:fdd00000-fdd1ffff
$ python /usr/lib/python2.7/dist-packages/xpra/client/gl/gl_check.py
PyOpenGL warning: missing accelerate module
PyOpenGL warning: missing array format handlers: numeric, vbo, vbooffset
OpenGL Version: 3.0 Mesa 10.1.3

OpenGL properties:
* GLU extensions           : GLU_EXT_nurbs_tessellator GLU_EXT_object_space_tess 
* GLU version              : 1.3
* display_mode             : ALPHA, SINGLE
* gdkgl.version            : 1.4
* gdkglext.version         : 1.2.0
* gtkglext.version         : 1.2.0
* has_alpha                : True
* max-viewport-dims        : (16384, 16384)
* opengl                   : 3.0
* pygdkglext.version       : 1.1.0
* pyopengl                 : 3.0.2
* renderer                 : Gallium 0.4 on AMD CAICOS
* rgba                     : True
* safe                     : True
* shading language version : 1.30
* texture-size-limit       : 16384
* transparency             : True
* vendor                   : X.Org
* zerocopy                 : False

Does the issue occur if you don't use vglrun?

Without vglrun, the icons are visible (as seen at http://picpaste.com/OYvxiPPC.png), but an x11grab with ffmpeg of the running Xvnc instance (:2) covers only 1/4 of the screen, as seen at http://picpaste.com/BfHovd2Q.jpg.

Which version of Java are you running on the client?

$ java -version
java version "1.7.0_91"
OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)

zenny commented 8 years ago

Without vglrun, the icons are visible (as seen at http://picpaste.com/OYvxiPPC.png), but an x11grab with ffmpeg of the running Xvnc instance (:2) covers only 1/4 of the screen, as seen at http://picpaste.com/BfHovd2Q.jpg.

This gets solved with a WM as you have advised at https://github.com/TurboVNC/turbovnc/issues/21#issuecomment-161469287

With VNC, however, the window management and all X rendering is taking place on the server, so it really needs a WM.

The rendering on the client seems to be much swifter in TurboVNC than in xpra, even without vglrun. I have yet to test with an OpenGL application under vglrun.

dcommander commented 8 years ago

OK, so to summarize:

Please open a new issue for any further items that you encounter, and also be sure to file issues specific to TurboVNC in that repository, not here. This thread has gotten too far into the weeds.

totaam commented 8 years ago

My experience with xpra suggests that it does block on PutImage, because it's essentially acting as an X protocol compressor rather than a true X proxy. That's why I suggested benchmarking with vglrun -sp.

That's not the case: xpra is just a regular X11 window manager, so it cannot block such calls; it only gets notified by the X11 server after the event (same as a VNC server).

I am in no way saying that the "lag" is not xpra's fault, just that the cause is very unlikely to be the OpenGL rendering with xpra, as this works very well. It could be the pixel capture, encoding, transport, etc.