TurboVNC / turbovnc

Main TurboVNC repository
https://TurboVNC.org
GNU General Public License v2.0

Viewer: Re-investigate multi-threaded decoding #60

Open dcommander opened 8 years ago

dcommander commented 8 years ago

I spent numerous hours researching this topic in 2010, around the same time that I was developing multi-threaded encoding in the TurboVNC Server (under contract with RSV). You can even see my commits related to this:

c557fd774c2b3f851856a3966592db1e8bf87be7
5df168073fe02158afe47cc28a20760b7535b276
778be336c43f1cacf41fee82af1877004a79396c
47580689febc95369a2550a914fa95fb7e5ed9ee

At the time, I found that it wasn't possible to increase the viewer's performance significantly by employing a tile-based round-robin approach, such as the one used by the latest TigerVNC Viewer, but it's worth re-opening that topic. It's unclear whether their reported performance improvement was measured at the low level (decoding only) or across the whole viewer. I personally found that it was more efficient to do what we're currently doing, which is to do all of the decoding in one thread and all of the blitting in another. My benchmark extensions to the TigerVNC Viewer should provide a more thorough picture, though. If there is some advantage to their approach, then it should be straightforward to adopt it in our Java viewer, at least.
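A minimal sketch of the two threading models contrasted above, for illustration only. It is not TurboVNC or TigerVNC code: `decode`, `run_pipeline`, and `run_round_robin` are hypothetical names, the "rectangles" are plain integers, and blitting is modeled as appending to a list.

```python
import queue
import threading


def decode(rect):
    # Hypothetical stand-in for real decoding work (e.g. decompressing a rectangle).
    return ("decoded", rect)


def run_pipeline(rects):
    """Decode everything in one thread and blit everything in another."""
    handoff = queue.Queue(maxsize=8)  # bounded queue between the two stages
    done = object()                   # sentinel marking the end of the stream
    screen = []                       # stand-in for the display

    def decoder():
        for rect in rects:
            handoff.put(decode(rect))
        handoff.put(done)

    def blitter():
        while True:
            item = handoff.get()
            if item is done:
                break
            screen.append(item)       # stand-in for drawing the rectangle

    threads = [threading.Thread(target=decoder), threading.Thread(target=blitter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return screen


def run_round_robin(rects, n_threads):
    """Hand rectangle i to worker i % n_threads; each worker decodes its share."""
    rects = list(rects)
    results = [None] * len(rects)

    def worker(index):
        for i in range(index, len(rects), n_threads):
            results[i] = decode(rects[i])  # each slot is written by exactly one worker

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The pipeline needs only one hand-off point between its two threads, whereas the round-robin model adds a thread start, a join, and shared state per worker. That is the kind of overhead a whole-viewer benchmark would capture and a decoder-only measurement might not.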

dcommander commented 7 years ago

I ported the TurboVNC benchmark feature into both the TigerVNC 1.6.0 viewer code and the evolving TigerVNC 1.8 pre-beta viewer code (refer to https://github.com/dcommander/tigervnc/tree/benchmark and https://github.com/dcommander/tigervnc/tree/1.6-benchmark). Testing with that code produced mixed results.

On Linux (2011 Dell Precision T3500, quad-core 2.8 GHz Xeon W3530, nVidia Quadro K5000, CentOS 6.8), the breakdown in decoding time is as follows. Total time (decoding + blitting) is in parentheses.

| | TigerVNC 1.6.0 | TigerVNC 1.8 pre-beta, 1 thread | TigerVNC 1.8 pre-beta, 4 threads |
|---|---|---|---|
| Total | 10.9 (22.4) | 11.1 (23.2) | 5.77 (17.4) |
| 2D datasets | 1.05 (5.77) | 1.10 (6.36) | 1.30 (6.65) |
| 3D datasets | 9.88 (16.6) | 10.0 (16.8) | 4.47 (10.8) |

On Windows (2011 Dell Precision T3500, quad-core 2.8 GHz Xeon W3530, nVidia Quadro K5000), the breakdown in decoding time is as follows. Total time (decoding + blitting) is in parentheses.

| | TigerVNC 1.6.0 | TigerVNC 1.8 pre-beta, 1 thread | TigerVNC 1.8 pre-beta, 4 threads |
|---|---|---|---|
| Total | 11.1 (14.0) | 11.6 (18.7) | 5.06 (12.0) |
| 2D datasets | 1.06 (2.63) | 1.05 (4.12) | 1.45 (4.62) |
| 3D datasets | 10.1 (11.4) | 10.6 (14.6) | 3.62 (7.37) |

On Mac (2015 Mini, dual-core 3 GHz Core i7, Intel Iris, OS X 10.10.5), the breakdown in decoding time is as follows. Total time (decoding + blitting) is in parentheses.

| | TigerVNC 1.6.0 | TigerVNC 1.8 pre-beta, 1 thread | TigerVNC 1.8 pre-beta, 2 threads |
|---|---|---|---|
| Total | 11.7 (356) | 8.58 (348) | 6.85 (348) |
| 2D datasets | 1.90 (308) | 1.04 (307) | 2.35 (307) |
| 3D datasets | 9.81 (48.2) | 7.54 (41.0) | 4.50 (40.8) |

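Significant, albeit sublinear, speedup was achieved on all of the 3D datasets. The 2D datasets were a mixed bag. In aggregate, their decoding time increased when multithreading was enabled on all three platforms (from 1.10 to 1.30 on Linux, 1.05 to 1.45 on Windows, and 1.04 to 2.35 on Mac).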
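To make "sublinear" concrete, the following sketch computes the speedup and parallel efficiency (speedup divided by thread count) implied by the 3D decode times reported for TigerVNC 1.8 pre-beta on each platform. The numbers are copied from the results above; the function and variable names are illustrative.

```python
# (1-thread decode time, N-thread decode time, N) for the 3D datasets,
# TigerVNC 1.8 pre-beta, taken from the reported results.
results_3d = {
    "Linux": (10.0, 4.47, 4),
    "Windows": (10.6, 3.62, 4),
    "Mac": (7.54, 4.50, 2),
}


def speedup_and_efficiency(t_single, t_multi, n_threads):
    """Return (speedup, efficiency), where efficiency = speedup / n_threads."""
    speedup = t_single / t_multi
    return speedup, speedup / n_threads


for platform, (t1, tn, n) in results_3d.items():
    s, e = speedup_and_efficiency(t1, tn, n)
    print(f"{platform}: {s:.2f}x speedup on {n} threads, {e:.0%} efficiency")
```

This works out to roughly 2.2x on Linux, 2.9x on Windows, and 1.7x on Mac, all below the thread count, which is what sublinear means here.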

I also added the aforementioned benchmark feature to the Java TigerVNC Viewer, since that viewer has a similar multithreaded decoding feature that we could potentially borrow (with some integration effort, since the Java TurboVNC Viewer forked from the Java TigerVNC Viewer five years ago). Unfortunately, in the case of the Java TigerVNC Viewer, the overall decoding performance has regressed by more than 3x relative to TigerVNC 1.6.0, even with a single thread, and enabling multithreading slows things down even further.

It appears that, at least algorithmically, there is some promise to TigerVNC's multithreaded decoding approach, but it currently seems to be too sensitive to the overhead of the underlying thread/locking implementation.
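One way to picture that sensitivity is a toy cost model. Everything in it is an assumption made for illustration: decoding work is taken to divide evenly across threads, each rectangle is taken to pay a fixed, serialized synchronization cost once more than one thread is used, and the workload numbers are invented rather than measured.

```python
def decode_time(n_rects, work_per_rect, n_threads, sync_overhead):
    """Toy model of total decode time (arbitrary units, not a measurement)."""
    total_work = n_rects * work_per_rect
    if n_threads == 1:
        return total_work  # no hand-off cost in the single-threaded case
    # Work is split evenly; every rectangle also pays a serialized sync cost.
    return total_work / n_threads + n_rects * sync_overhead


# Hypothetical workloads: few expensive rectangles vs. many cheap ones.
workloads = {
    "few large rectangles": (1_000, 10.0),
    "many small rectangles": (10_000, 1.0),
}

for name, (n_rects, work) in workloads.items():
    t1 = decode_time(n_rects, work, n_threads=1, sync_overhead=1.0)
    t4 = decode_time(n_rects, work, n_threads=4, sync_overhead=1.0)
    print(f"{name}: 1 thread = {t1:.0f}, 4 threads = {t4:.0f}")
```

Under these assumed numbers, the workload with few expensive rectangles gets faster with four threads and the workload with many cheap rectangles gets slower, because the per-rectangle overhead outweighs the work being parallelized.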

dcommander commented 6 years ago

Pushed back to TurboVNC 2.3.

dcommander commented 3 years ago

This won't make it into TurboVNC 3.0, unfortunately, due to lack of time and funding.

dcommander commented 3 years ago

New results from TigerVNC 1.11 are pretty similar to the results from TigerVNC 1.8 (multithreading is still a mixed bag): https://github.com/TurboVNC/turbovnc/issues/144#issuecomment-797832956