ValveSoftware / openvr

OpenVR SDK
http://steamvr.com
BSD 3-Clause "New" or "Revised" License

Optimizing GL_NV_gpu_multicast VR SLI operation, issue with Motion Smoothing #1055

Open techtonic65 opened 5 years ago

techtonic65 commented 5 years ago

I'm testing this on an HTC Vive Pro.

For improving performance on NVIDIA SLI configurations I have added support to my app for the GL_NV_gpu_multicast OpenGL extension. So now I have each eye being rendered in parallel, one on each GPU. This gives a good boost in performance.
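For reference, the per-GPU routing looks roughly like this (a minimal sketch, not my actual code; it assumes a GL loader exposing the GL_NV_gpu_multicast entry points, and RenderEye() and the framebuffer names are placeholders):

```cpp
GLint numGpus = 0;
glGetIntegerv(GL_MULTICAST_GPUS_NV, &numGpus);   // should report 2 on a two-GPU SLI setup

// Restrict subsequent commands to GPU 0 and render the left eye there.
glRenderGpuMaskNV(0x1);
RenderEye(vr::Eye_Left, leftEyeFramebuffer);     // placeholder for the app's eye render pass

// Restrict subsequent commands to GPU 1 and render the right eye there.
glRenderGpuMaskNV(0x2);
RenderEye(vr::Eye_Right, rightEyeFramebuffer);

// Restore the default mask so later work is broadcast to both GPUs again.
glRenderGpuMaskNV(0x3);
```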

My app is very performance intensive so I'm trying to get every bit of performance out of it that I can.

But before Submit() can be called for the right eye, its texture has to be copied from GPU 1 to GPU 0 using glMulticastCopyImageSubDataNV(), so there is not much that can be done to prevent this copy from consuming precious frame time.
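The copy itself is essentially one call (sketch; rightEyeTex and its dimensions are placeholders):

```cpp
// Make sure GPU 1 has finished rendering into the texture before the peer-to-peer copy.
glMulticastBarrierNV();

glMulticastCopyImageSubDataNV(
    1,                                         // srcGpu: GPU 1 (right eye)
    0x1,                                       // dstGpuMask: GPU 0 only
    rightEyeTex, GL_TEXTURE_2D, 0, 0, 0, 0,    // src name, target, level, x, y, z
    rightEyeTex, GL_TEXTURE_2D, 0, 0, 0, 0,    // dst name, target, level, x, y, z
    texWidth, texHeight, 1);                   // width, height, depth (1 for a 2D texture)

// GPU 0 now holds a valid copy of the right eye and Submit() can be called.
```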

Also, after calling Submit() for each eye, the OpenVR driver is going to sequentially perform the lens distortion pass on each eye.

I figured I could improve the performance of both of these steps by performing the distortion pass for each eye in parallel in my code and submitting pre-distorted textures using the Submit_LensDistortionAlreadyApplied flag when calling Submit().
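The submission side is then just the normal Submit() call with the extra flag; something like this (sketch, with the distorted*Tex GL texture names as placeholders):

```cpp
vr::Texture_t leftTex  = { (void*)(uintptr_t)distortedLeftTex,
                           vr::TextureType_OpenGL, vr::ColorSpace_Gamma };
vr::Texture_t rightTex = { (void*)(uintptr_t)distortedRightTex,
                           vr::TextureType_OpenGL, vr::ColorSpace_Gamma };

vr::VRCompositor()->Submit(vr::Eye_Left,  &leftTex,  nullptr,
                           vr::Submit_LensDistortionAlreadyApplied);
vr::VRCompositor()->Submit(vr::Eye_Right, &rightTex, nullptr,
                           vr::Submit_LensDistortionAlreadyApplied);
```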

I get the distortion mesh using calls to VRSystem::ComputeDistortion(). I create a single distortion texture but set it to allow for different data on each GPU by calling glTexParameter with GL_PER_GPU_STORAGE_NV. I then copy the left eye mesh into the texture on GPU 0 and the right eye mesh into the same texture on GPU 1. Now I just render a single full-screen quad to perform the distortion using my shader, and it gets done in parallel on both GPUs.
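In outline the setup looks something like this. It's a sketch rather than my exact code: the mesh resolution, the RG32F green-channel-only layout, and the assumption that glTexSubImage2D respects the render GPU mask for per-GPU storage are all simplifications.

```cpp
const int MESH_W = 64, MESH_H = 64;               // mesh resolution (illustrative)
std::vector<float> leftUV(MESH_W * MESH_H * 2), rightUV(MESH_W * MESH_H * 2);

for (int y = 0; y < MESH_H; ++y) {
    for (int x = 0; x < MESH_W; ++x) {
        float u = x / float(MESH_W - 1), v = y / float(MESH_H - 1);
        size_t i = 2 * (size_t)(y * MESH_W + x);
        vr::DistortionCoordinates_t dc;

        vr::VRSystem()->ComputeDistortion(vr::Eye_Left, u, v, &dc);
        leftUV[i]  = dc.rfGreen[0];  leftUV[i + 1]  = dc.rfGreen[1];

        vr::VRSystem()->ComputeDistortion(vr::Eye_Right, u, v, &dc);
        rightUV[i] = dc.rfGreen[0];  rightUV[i + 1] = dc.rfGreen[1];
    }
}

GLuint distortionTex;
glGenTextures(1, &distortionTex);
glBindTexture(GL_TEXTURE_2D, distortionTex);
glTexParameteri(GL_TEXTURE_2D, GL_PER_GPU_STORAGE_NV, GL_TRUE);   // allow per-GPU contents
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RG32F, MESH_W, MESH_H);

// Assumption: with per-GPU storage, uploads issued while the render GPU mask is
// restricted only touch the copy on the masked GPU(s).
glRenderGpuMaskNV(0x1);   // GPU 0 gets the left eye mesh
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, MESH_W, MESH_H, GL_RG, GL_FLOAT, leftUV.data());
glRenderGpuMaskNV(0x2);   // GPU 1 gets the right eye mesh
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, MESH_W, MESH_H, GL_RG, GL_FLOAT, rightUV.data());
glRenderGpuMaskNV(0x3);   // back to broadcasting to both GPUs
```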

Due to the nature of the distortion, the resulting distorted texture can be significantly smaller than the original render buffer without affecting image quality. I'm currently using a factor of 1.4 in each dimension, meaning the resulting texture is roughly half the size in terms of memory (1.4 x 1.4 = 1.96). This means only about half as much data has to be copied from GPU 1 to GPU 0.

With the parallel render and the reduced texture size I've shaved around 2ms from the per-frame overhead when using a 2016x2240 render buffer (which is the recommended 100% size for Vive Pro). For reference, this is using two Quadro P4000 GPUs. The reduction may be less with faster GPUs.

This works well as long as I don't enable Motion Smoothing. I'd really like to be able to enable Motion Smoothing since even with all these optimizations I'm not always making 90fps. But here's the problem: if I enable Motion Smoothing, I get these nasty flickering slivers right on the inside edge of each eye. It looks like every other frame (probably the ones synthesized by Motion Smoothing) this sliver is not updated correctly and just appears black.

Question 1: Is it possible to fix the Motion Smoothing so that this black flashing sliver is not generated when using pre-distorted textures?

Question 2: (fixed, see next comment for solution) I'm currently using a hard-coded scale factor of 1.4 for the texture size reduction that can be applied without causing loss in quality. But this value is going to change from one HMD to the next as it depends on the degree of distortion. Is there a programmatic way to evaluate this ratio? If not, can one be added to the API?

techtonic65 commented 5 years ago

OK I think I figured out the answer to my second question.

If anyone is interested - to avoid a reduction in quality I need to ensure a 1:1 mapping of pixels in the center of the screen. So for the horizontal scale, I just take the two horizontally adjacent samples at the center of my distortion texture and compute the ratio of the input U delta to the output U delta. The vertical scale factor is computed similarly, using the two vertically adjacent samples and the ratio of the input V delta to the output V delta.
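In code, sampling ComputeDistortion() directly instead of the baked texture, the horizontal ratio is something like this (sketch; 'eye', sampleStep, and the green-channel-only shortcut are placeholders/simplifications, and it assumes openvr.h and <cmath> are in scope):

```cpp
const float sampleStep = 1.0f / 63.0f;   // grid spacing of the distortion mesh (illustrative)
vr::DistortionCoordinates_t a, b;
vr::VRSystem()->ComputeDistortion(eye, 0.5f,              0.5f, &a);
vr::VRSystem()->ComputeDistortion(eye, 0.5f + sampleStep, 0.5f, &b);

// Ratio of the input (pre-distortion) U delta to the output (post-distortion) U delta.
float scaleU = sampleStep / fabsf(b.rfGreen[0] - a.rfGreen[0]);
// Repeat with a vertical step and rfGreen[1] for the V scale factor.
```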

techtonic65 commented 5 years ago

Continuing on Question 2, I've now implemented the method I described in my last reply. Anticipating that some HMDs may not have the 1:1 mapping point exactly at the center of the screen, I ended up scanning the whole central quadrant of the screen looking for the smallest ratio in U and V.
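The scan is just the ratio calculation from my previous comment applied over a grid, keeping the minimum; roughly this (sketch, with the grid size and the exact extent of the "central quadrant" as illustrative choices):

```cpp
float minScaleU = 1e9f, minScaleV = 1e9f;
const int   N    = 64;                   // samples per axis (illustrative)
const float lo   = 0.25f, hi = 0.75f;    // extent of the central region scanned (illustrative)
const float step = (hi - lo) / N;

for (int j = 0; j <= N; ++j) {
    for (int i = 0; i <= N; ++i) {
        float u = lo + i * step, v = lo + j * step;
        vr::DistortionCoordinates_t c, cu, cv;
        vr::VRSystem()->ComputeDistortion(eye, u,        v,        &c);
        vr::VRSystem()->ComputeDistortion(eye, u + step, v,        &cu);
        vr::VRSystem()->ComputeDistortion(eye, u,        v + step, &cv);

        float scaleU = step / fabsf(cu.rfGreen[0] - c.rfGreen[0]);
        float scaleV = step / fabsf(cv.rfGreen[1] - c.rfGreen[1]);
        if (scaleU < minScaleU) minScaleU = scaleU;
        if (scaleV < minScaleV) minScaleV = scaleV;
    }
}
// On my Vive Pro both values come out just under 1.6.
```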

For the HTC Vive Pro the optimum scale factor ended up just shy of 1.6 in both the U and V directions. This results in roughly a 2.5x reduction (1.6 x 1.6 = 2.56) in the amount of data that has to be copied from GPU 1 to GPU 0.

The downside I have discovered is that, of course, asynchronous reprojection has to be applied before lens distortion. So supplying pre-distorted images to OpenVR produces incorrect asynchronous reprojection results when it kicks in, which generates some double imaging as you rotate your head.

Motion Smoothing seems to work better on the pre-distorted images. There is some slight swimming as you rotate your head, but no real double imaging. Unfortunately, there is the flashing sliver issue as reported in the original post.

As long as I maintain 90fps with Motion Smoothing disabled, the pre-distortion works very well and cuts around 2 ms from my frame time, which itself helps substantially in maintaining 90fps.