Open BlauerHunger opened 1 year ago
When using the VGL Transport (e.g. over SSH combined with X11 forwarding), I only have two options for encoding: uncompressed or JPEG. While these choices are OK for most applications, uncompressed transfer of a Full HD application with very quickly changing content saturates my 1 Gbit/s direct link between server and client whenever I try to transfer more than 15 fps, while the latency of the JPEG codec makes it unusable above 30 fps, and it loads the server-side CPU to an unacceptable level. Waypipe mitigates that issue by supporting hardware-accelerated H.264 and VP9.
Please consider adding support for real video codecs such as H.264 and VP9. On modern hardware, hardware-accelerated implementations are available (either through VA-API or NVENC/NVDEC) that have extremely low latency and don't load the CPU the way the current JPEG codec does.
Because VGL uses a TCP connection (either directly or over SSH), frames lost in the network aren't an issue. When it comes to queue saturation on the client, though, a mechanism should be implemented to notify the server so that it can skip some frames (the client can't skip them itself, because skipping I-frames makes the result very ugly). Maybe this could even be extended into a dynamic frame rate control mechanism.
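To make that last idea concrete, here is a rough sketch of the kind of server-side skip logic meant here. It is purely illustrative (none of it is existing VirtualGL code), and it assumes the client acknowledges each displayed frame over the same TCP connection:

```c
/* Illustrative sketch only -- not VirtualGL code.  Server-side frame
 * skipping driven by client acknowledgements over the existing TCP
 * connection.  Skipping on the server is safe for video codecs: the
 * encoder simply never sees the skipped frame, so no reference frames
 * are lost, unlike dropping already-encoded frames on the client. */
#include <stdbool.h>

#define MAX_IN_FLIGHT 2         /* frames sent but not yet acknowledged */

static unsigned long frames_sent, frames_acked;

/* Receive path: called whenever the client acks a displayed frame. */
void on_client_ack(void)
{
    frames_acked++;
}

/* Called once per rendered frame: encode and send it, or skip it? */
bool should_encode_frame(void)
{
    /* Too many unacked frames means the client's decode/display queue
     * is saturating, so drop this frame before it reaches the encoder. */
    return (frames_sent - frames_acked) < MAX_IN_FLIGHT;
}

/* Send path (usage sketch):
 *   if (should_encode_frame()) { encode_and_send(frame); frames_sent++; }
 * Tracking the sent/acked delta over time could also drive the dynamic
 * frame rate control mechanism suggested above. */
```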
The next step would be figuring out how to get funding for the hundreds of hours of labor required to implement this feature, particularly given that no commercial VirtualGL users of which I am aware use the VGL Transport. (They all use TurboVNC or another X proxy, because the VGL Transport performs poorly over any networks except high-speed LANs.) Developing VGL, TurboVNC, and libjpeg-turbo has been my primary source of income since 2009, and I don't make very much money doing it. (I make about 1/3 to 1/4 of what I could make if I worked for a corporate employer.) The only way I can survive by developing free software is to aggressively pursue funded development, i.e. to follow the money. In the grand scheme of things, I am able to be much more responsive to community needs than I was when I developed VGL and TurboVNC for a large corporation (Sun Microsystems) with its own agenda, but there is a limit to how responsive I can be. Developing major features in my open source projects simply isn't possible without funding, and funding is generally driven by aggregate demand for a feature among large-scale VGL users.
All of that aside, VirtualGL has a transport plugin interface, so it would certainly be possible for the community to develop such a transport as you describe and implement it as a VirtualGL plugin. I can tell you from my experience that the odds of this feature happening any other way are probably very close to zero. The demand for video codecs is largely centered around X proxies such as TurboVNC, and even there, there hasn't been enough demand for anyone to commit funding for the hundreds of hours of labor necessary to make it happen.
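For anyone who wants to attempt the plugin route: a transport plugin is a shared library that VirtualGL loads at run time (selected via the VGL_TRANSPORT environment variable, if I recall correctly) and that exports a small set of RRTrans* entry points declared in rrtransport.h. The skeleton below is only a sketch from memory: the exact prototypes and the real RRFrame layout must be taken from rrtransport.h in the VirtualGL source, and the my_h264_*() calls are hypothetical stand-ins for whatever encoder (NVENC, VA-API, ...) such a plugin would wrap.

```c
/* Sketch of a VirtualGL transport plugin.  Entry-point names follow the
 * VirtualGL transport plugin API, but the prototypes and the RRFrame
 * struct are simplified stand-ins -- take the real ones from
 * rrtransport.h.  Build as a shared library (e.g. libvgltrans_h264.so). */
#include <stdlib.h>

typedef struct            /* stand-in; the real RRFrame is in rrtransport.h */
{
    unsigned char *bits;  /* pixel buffer that VirtualGL reads back into */
    int w, h, pitch, format;
} RRFrame;

typedef struct            /* per-connection state */
{
    RRFrame frame;        /* plus, in a real plugin: socket, encoder, ... */
} Handle;

/* Called when the 3D application opens a window. */
void *RRTransInit(void *dpy, void *win, void *fconfig)
{
    return calloc(1, sizeof(Handle));
}

/* Connect to the client and negotiate the codec (H.264, VP9, ...). */
int RRTransConnect(void *handle, char *receiverName, int port)
{
    return 0;             /* 0 = success, -1 = failure */
}

/* Hand VirtualGL a buffer into which it reads back the rendered frame. */
RRFrame *RRTransGetFrame(void *handle, int width, int height, int format,
                         int stereo)
{
    Handle *h = handle;
    h->frame.w = width;  h->frame.h = height;  h->frame.format = format;
    h->frame.pitch = width * 4;   /* assuming 4 bytes per pixel */
    h->frame.bits = realloc(h->frame.bits, (size_t)h->frame.pitch * height);
    return &h->frame;
}

/* Encode and transmit the frame that VirtualGL just filled in. */
int RRTransSendFrame(void *handle, RRFrame *frame, int sync)
{
    /* my_h264_encode(frame->bits, frame->w, frame->h, frame->pitch);
       my_h264_transmit(...);  -- hypothetical encoder wrappers */
    return 0;
}

int RRTransReady(void *handle) { return 1; }        /* can accept a frame? */
int RRTransSynchronize(void *handle) { return 0; }  /* drain in-flight frames */

int RRTransDestroy(void *handle)
{
    Handle *h = handle;
    free(h->frame.bits);
    free(h);
    return 0;
}

const char *RRTransGetError(void) { return "no error"; }
```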
I am observing that, in the same local environment, VirtualGL with either the EGL or GLX back end shows good frame rates in real-life workloads running under 300 fps (Unigine, Basemark GPU, etc.), but above 1000 fps (typically trivial benchmarks such as glxgears or glmark2), the frame rate simply does not hold up against GLX via X.Org (it drops to 1/6 or 1/7). We are talking about the same machine. (VirtualGL over EGL is a great way to hardware-accelerate containers without dangerous host privileges.) Perhaps this is related to a VGL Transport bottleneck?
@ehfd I don't understand what you mean.
Assume that both pipelines work on the same machine; that is, no VGL Transport to a different machine.
Pipeline A performs 3D OpenGL acceleration through the X.Org server for the NVIDIA GPU driver.
Pipeline B performs 3D OpenGL acceleration with VirtualGL through EGL using the NVIDIA EGL interface.
When both pipelines run Blender, Unigine Heaven, or Basemark GPU, A and B get similar frame rates of around 200-300 fps. But when both pipelines run glxgears or glmark2, A obtains around 7000 fps, while B only gets around 1000 fps.
Vulkan performance is basically the same in both A and B.
Is the source of this bottleneck in VirtualGL for very high frame rates related to encoding? If not, what?
This question is for the manuscript of our survey article on graphical containers (one of the first of its kind), and it is important enough that merely identifying the cause would be grounds for adding you as a co-author.
@dcommander
VirtualGL has a certain amount of per-frame overhead, because it reads back every frame that the application renders. However, GLXgears -- or any benchmark that renders thousands of frames/second -- is utterly useless. Can you visually tell the difference between 1000 and 7000 fps? No, because that is well beyond the limits of human vision. Furthermore, with interactive applications, the mouse is typically not sampled more than 60 times/second, so you will never realistically achieve faster frame rates than that. It's much more interesting to simulate what users actually do with VirtualGL, such as rendering millions of polys to a 2K or even a 4K display. You can simulate that with GLXspheres, incidentally, if you use non-default options. If you are rendering at more typical interactive frame rates, then VGL's readback overhead is negligible.
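For reference, a more realistic benchmark invocation along those lines might look like the one below. The option names are from memory and may vary across VirtualGL versions, so check `vglrun -h` and `glxspheres64 -h`: the intent is that `-sp` turns frame spoiling off (setting `VGL_SPOIL=0` should be equivalent), `-p` raises the polygon count into the millions, and `-fs` renders full-screen.

```
vglrun -sp /opt/VirtualGL/bin/glxspheres64 -fs -p 2000000
```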
> However, GLXgears -- or any benchmark that renders thousands of frames/second -- is utterly useless. Can you visually tell the difference between 1000 and 7000 fps?
That was my thought as well. Any chance this might start to become an issue now that high-refresh-rate monitors (around 300-400 Hz) are appearing?
The monitor's refresh rate is not the same as the rate at which the GPU can render frames, and it never has been. (Bear in mind that I've been hip-deep in OpenGL and GPU architectures since the late '90s, when 300k polys/second was "fast" and the term "GPU" hadn't even been invented yet.) Again, the mouse isn't usually sampled faster than 60 Hz, so most interactive 3D applications won't render faster than that, regardless of the monitor's refresh rate.
Anyhow, modern GPUs can usually read back at > 1 gigapixel/second, which could theoretically drive a 2K monitor at 400 Hz. However, the main performance limiter in VGL is, and always has been, the image transport. If you are using VGL for remote display, as originally intended, then there is no way that any long-haul network could deliver rendered frames at > 1 gigapixel/second, nor is there any way that a client could decompress frames at that rate (although libjpeg-turbo decompression performance is within a factor of 2-3 of that on modern CPUs, so it's conceivable). Since readback occurs in parallel with the image transport, readback will never be the bottleneck. (Refer to the VirtualGL User's Guide for further discussion.)
Even if you are using VGL within the same physical machine (e.g. to transfer rendered frames from a VM guest to the host), you are limited by the machine's ability to transfer the frames through memory, which is rarely as fast as GPU readback. So you will also be limited by the image transport in that case.
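To put rough numbers on that readback figure (my arithmetic, not from the discussion above): at 1 gigapixel/second, a 1920x1080 framebuffer (~2.07 megapixels) could theoretically be read back at roughly 480 frames/second, and a 2560x1440 one at roughly 270, which is where a "2K monitor at 400 Hz" ballpark comes from:

```c
/* Back-of-the-envelope check of the readback-bandwidth claim above.
 * The 1 gigapixel/second figure is the one quoted in the discussion. */
#include <stdio.h>

int main(void)
{
    const double readback = 1e9;               /* pixels per second */
    const double px_1080p = 1920.0 * 1080.0;   /* ~2.07 megapixels */
    const double px_1440p = 2560.0 * 1440.0;   /* ~3.69 megapixels */

    printf("1080p: %.0f frames/sec max\n", readback / px_1080p); /* ~482 */
    printf("1440p: %.0f frames/sec max\n", readback / px_1440p); /* ~271 */
    return 0;
}
```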
In other words, the benchmark may read 1000 fps, but I seriously doubt that 1000 frames/second are actually being transported. That's why we always recommend disabling frame spoiling when benchmarking. That way, the benchmark reflects the number of frames/second that are actually transported, rather than counting frames that are read back and then discarded.
Note also: yes, it is wasteful to read back frames that will ultimately be spoiled. There is a proposal seeking funding ("deferred readback") that would address that, but none of my commercial VGL customers have considered it enough of an issue to pay for the implementation of that feature.
@ehfd Note that this is really off-topic, as this issue was originally about supporting additional codecs in the VGL Transport.
I will email you about anything beyond this. Thank you. @dcommander