TurboVNC / turbovnc

Main TurboVNC repository
https://TurboVNC.org
GNU General Public License v2.0

Frame spoiling problem with TurboVNC #301

Closed dimitris64bit closed 2 years ago

dimitris64bit commented 2 years ago

Hi all,

I would appreciate any help on the following issue:

I have launched the TurboVNC server (2.2.4) on a Linux machine, and I access it with the TurboVNC viewer (2.2.4) from another Linux machine. I then run an OpenGL application with VirtualGL v2.6.5 64-bit (Build 20201117): vglrun +tr +v ./my_app.sh

The problem I am experiencing is that, after I press a key (e.g. F1) that changes the OpenGL camera view, the key event seems to be processed by Qt, but the view is not updated.

In the code, I call processEvents and, immediately after that, flushX, but the view is still not updated.

The +tr log shows that the last calls are XCopyArea (multiple times) and finally glXSwapBuffers.

Could you please shed some light on what might be going wrong? Is it a TurboVNC bug? Is it a problem with the VNC buffer technology (copying only areas with massive pixel differences)?

Regards, d

dcommander commented 2 years ago

Xvnc servers like TurboVNC use a "deferred update timer" to coalesce changes made to the remote framebuffer by X11 applications. If a series of rapid-fire X11 drawing commands update the remote framebuffer, the Xvnc server tracks the regions affected by all of those drawing commands, then when the deferred update timer expires, the Xvnc server sends a framebuffer update containing all of the affected regions. (Note that a separate deferred update timer is maintained for each connected VNC viewer.)
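To illustrate the coalescing behavior described above, here is a minimal Python sketch of a per-viewer deferred update timer. It is a toy model, not TurboVNC code: the class name, the method names, and the 10 ms delay are all assumptions made for illustration.

```python
import time


class DeferredUpdateQueue:
    """Toy model of one viewer's deferred update timer (not TurboVNC code)."""

    def __init__(self, defer_seconds=0.010, clock=time.monotonic):
        self.defer_seconds = defer_seconds  # arbitrary illustrative delay
        self.clock = clock                  # injectable so the model can be tested
        self.dirty = []                     # (x, y, w, h) regions touched so far
        self.deadline = None                # time at which the update is due

    def mark_dirty(self, rect):
        """Record a region changed by a drawing command; arm the timer if idle."""
        self.dirty.append(rect)
        if self.deadline is None:
            self.deadline = self.clock() + self.defer_seconds

    def poll(self):
        """Return all accumulated regions once the timer has expired, else None."""
        if self.deadline is None or self.clock() < self.deadline:
            return None
        regions, self.dirty, self.deadline = self.dirty, [], None
        return regions
```

In this model, several rapid calls to mark_dirty() produce a single update containing every affected region, which is the behavior the comment attributes to the deferred update timer.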

Since VirtualGL uses the "spoil first" frame spoiling algorithm for rendered frames triggered by glXSwapBuffers(), it is guaranteed that the rendered frame triggered by the most recent glXSwapBuffers() command will be transported to the 2D X server (the TurboVNC session, in this case), and unless some other X11 drawing command overwrites the same region in rapid succession, the entire contents of the rendered frame in question should be delivered as an RFB framebuffer update.
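The guarantee described above (the most recent frame is always the one delivered) can be sketched as a single-slot mailbox in which a newer frame replaces an older, undelivered one. This is a simplified Python model under that assumption, not VirtualGL's implementation; SpoilFirstSlot, submit() and take() are hypothetical names.

```python
import threading


class SpoilFirstSlot:
    """Single-slot frame mailbox: a newer frame replaces an undelivered one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self.spoiled = 0  # frames discarded without ever being delivered

    def submit(self, frame):
        """Called by the renderer after each buffer swap."""
        with self._lock:
            if self._frame is not None:
                self.spoiled += 1  # the older pending frame is dropped
            self._frame = frame

    def take(self):
        """Called by the transport when it is ready; newest frame or None."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame
```

If the renderer submits five frames before the transport is ready, take() returns only the fifth, so a view change triggered by the last key press should still reach the viewer under this model.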

I suspect that there is a bug, and I suspect a VirtualGL bug more so than a TurboVNC bug (but I could be wrong.)

Can you point me to a sample application that reproduces the issue?

dimitris64bit commented 2 years ago

While we prepare a demo app, I would like to provide some more information:

With some other apps, e.g. https://learnopengl.com/Lighting/Multiple-lights or Blender, it was not possible to reproduce the problem.

The difference is that our app is using Qt 4.8.6.

I tried setting the following variables (one at a time):

VGL_SPOILLAST=0
VGL_READBACK=pbo
VGL_READBACK=sync
VGL_SPOIL=0
VGL_SUBSAMP=gray
VGL_SYNC=1

but it didn't make any difference.
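For anyone repeating these one-at-a-time trials, they could be scripted roughly as follows. This is only a sketch: the helper names env_for() and run_trial() are hypothetical, and the command is the vglrun invocation from the first comment.

```python
import os
import subprocess

# Settings taken from the list above; each is applied on its own.
TRIALS = [
    ("VGL_SPOILLAST", "0"),
    ("VGL_READBACK", "pbo"),
    ("VGL_READBACK", "sync"),
    ("VGL_SPOIL", "0"),
    ("VGL_SUBSAMP", "gray"),
    ("VGL_SYNC", "1"),
]


def env_for(trial, base=None):
    """Return a copy of the environment with exactly one override applied."""
    env = dict(os.environ if base is None else base)
    name, value = trial
    env[name] = value
    return env


def run_trial(trial, command=("vglrun", "+tr", "+v", "./my_app.sh")):
    """Launch the application once with a single override; blocks until exit."""
    return subprocess.run(list(command), env=env_for(trial), check=False)
```

Calling run_trial(TRIALS[0]) would start the application with only VGL_SPOILLAST=0 set, leaving the rest of the environment unchanged.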

Then I tried to comment out all calls to glFinish or glFlush. Still no difference.

Finally, removing QApplication::app()->flushX() made things even worse, with more updates missing this time.

Is there anything else I can try that might yield some meaningful information?

Thanks in advance

dcommander commented 2 years ago

Of those variables, VGL_SPOILLAST, VGL_SYNC and VGL_SPOIL are the only ones that might have made a difference. The only other variable that might be relevant is VGL_GLFLUSHTRIGGER.

dimitris64bit commented 2 years ago

VGL_GLFLUSHTRIGGER also didn't make any difference. We are preparing a Qt app, but the difficulty is getting it to replicate the problem. Is there anything we can learn from the +pr metrics about the behavior of the timers? It seems that the draw is sent, but it takes a bit longer, so the frame is not updated. If you click inside the window or hover over a button, the view is updated with the missed frame.

dcommander commented 2 years ago

Unfortunately, until I can reproduce the problem, there is nothing I can do to help you diagnose it. I need to develop an understanding of exactly what is occurring.

dimitris64bit commented 2 years ago

We were able to reproduce the problem with a minimal app: vglExample.tar.gz. Please check the README. The build folder contains a prebuilt version.

It seems that changing QApplication::setColorSpec(QApplication::NormalColor) to QApplication::setColorSpec(QApplication::ManyColor) causes this problem to occur.

Looking forward to your reply.

dcommander commented 2 years ago

I can only reproduce the issue very sporadically, like once in every 100-200 tries, which makes it difficult to debug. Any advice on increasing the frequency?

dimitris64bit commented 2 years ago

We could add a timer to draw, e.g., every few seconds, but I am not sure that this will make it more frequent. If you check the Qt 4.8 code: https://dreamswork.github.io/qt4/qcolormap__x11_8cpp_source.html (image), maybe some of these parameters conflict with the formats that VirtualGL can handle?

Also, you could add some debug messages indicating whether or not each frame is submitted. We can reproduce the problem quite easily just by pressing the function keys, e.g. press F1 repeatedly 20-30 times and then press another function key.
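The key-press procedure above could be automated roughly as follows. This sketch assumes that the xdotool utility is installed and that the application window has keyboard focus; neither is mentioned in the thread. The count of 25 presses and the choice of F2 as the final key are illustrative values.

```python
import subprocess


def key_sequence(repeat_key="F1", repeats=25, final_key="F2"):
    """Build the key list: the same key many times, then a different one."""
    return [repeat_key] * repeats + [final_key]


def send_keys(keys, delay_ms=50, runner=subprocess.run):
    """Send the keys to the focused window with one xdotool invocation."""
    command = ["xdotool", "key", "--delay", str(delay_ms)] + list(keys)
    runner(command, check=True)
```

Calling send_keys(key_sequence()) presses F1 25 times and then F2, with 50 ms between key strokes. If the view still shows the F1 camera afterwards, the final update was missed.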

dcommander commented 2 years ago

> If you check the Qt 4.8 code: https://dreamswork.github.io/qt4/qcolormap__x11_8cpp_source.html maybe any of these parameters are conflicting with the formats that VirtualGL can handle?

No. Unless you passed a non-default -depth argument to the TurboVNC Server, it will only support 24-bit and 32-bit visuals. Thus, the visual that the application ultimately chooses for its OpenGL window is 0x21, a perfectly normal 24-bit TrueColor visual. However, that raises the question of why you are using QApplication::ManyColor rather than QApplication::NormalColor. To the best of my understanding, QApplication::ManyColor is meant to improve color allocation with 8-bit visuals, which no modern X servers support. (In fact, setting the color spec is obsoleted in Qt5.)

dcommander commented 2 years ago

The only advice I have at the moment is to try the 3.0 rc1 version of VirtualGL and the 3.0 evolving pre-release build of TurboVNC and see if the issue magically resolves itself. The issue is so hard for me to reproduce that I won't be able to determine, with any degree of confidence, whether it has been resolved. At this point, I have no idea whether it's a VirtualGL issue, a TurboVNC issue, or a Qt issue. The fact that you can work around it by ceasing to use deprecated Qt functionality strongly suggests that it may be a Qt issue.

dcommander commented 2 years ago

Closing for now. It is unclear whether this issue is in VirtualGL or TurboVNC, and given my inability to reproduce it reliably, I do not have the resources to pursue the matter further using the VirtualGL General Fund. If you want me to engage further on this, your company will need to pay for my labor and provide access to a machine on which the issue can readily be reproduced.