alvr-org / ALVR


FFR (mostly) working #12

Closed · zmerp closed 4 years ago

zmerp commented 5 years ago

Hello, I finally implemented my FFR algorithm in ALVR, mostly. Here's my explanation of the algorithm (with a link to an online demo), and here's the code for the server and the client. I'm not opening a pull request because I did not implement the UI and network interop for the FFR variables; instead they are hardcoded here and here.

The two new free variables are foveationStrengthMean and foveationShapeRatio. foveationStrengthMean is a factor that controls the linear "size" (or square root of the area) of the foveated region. In reality, because of how I built the algorithm, there is no boundary to the foveated area: the compression gets progressively stronger the further you look from the center of the screen, so foveationStrengthMean should be interpreted only as a multiplicative term. The higher the value, the stronger the foveation effect and the lower the encode/decode latency. A value of 5 corresponds roughly to Quest's foveation level HIGH.

foveationShapeRatio controls the shape of the foveation (the ratio between its horizontal and vertical dimensions). It should match the visual acuity graph of the human eye for tracked foveated rendering, and be a bit larger for fixed foveated rendering, because statistically the eye moves more horizontally than vertically. A value between 1.5 and 2 is ok. The center of the foveation is fixed to the center of each screen, which is calculated using the field of view reported by the Oculus VrApi.
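To make the idea concrete, here is a minimal GLSL sketch of this kind of continuous, multiplicative compression. It is my own illustration, not the actual ALVR shader: the function name, the arctangent falloff, and the exact roles of the uniforms are assumptions.

```glsl
// Hypothetical sketch of a continuous foveation warp (NOT the actual ALVR code).
// Maps an undistorted UV to the compressed UV used for encoding: the mapping is
// ~identity near the focus point and compresses progressively further out.
uniform vec2 focusCenter;             // foveation center in UV space (per eye)
uniform float foveationStrengthMean;  // purely multiplicative; ~5 ≈ Quest HIGH
uniform float foveationShapeRatio;    // horizontal/vertical stretch, ~1.5 to 2

vec2 distort(vec2 uv) {
    vec2 d = uv - focusCenter;
    d.x /= foveationShapeRatio;       // widen the sharp region horizontally
    float r = length(d);
    float a = r * foveationStrengthMean;
    // atan(a)/a -> 1 as a -> 0, so low strength or a small radius means no compression
    float scale = a < 1e-5 ? 1.0 : atan(a) / a;
    d *= scale;
    d.x *= foveationShapeRatio;
    return focusCenter + d;
}
```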

A big chunk of the code I wrote (the filtering part) is still not functioning. I tried to fix it but I'm out of time. I thought I could still share what I achieved so far, especially since the benefit of the filtering would be relatively small (I predict that foveationStrengthMean could go from 5 to 6 at the same visual quality).

Test:

Oculus Quest, codec h265, bitrate 30 Mbps, video resolution 100% (2880x1600), buffer size 200 kB.

- FFR off (foveationStrengthMean = 0): encode latency ~7 ms, decode latency ~11.9 ms
- FFR on (foveationStrengthMean = 5): encode latency ~4 ms, decode latency ~7.2 ms

Total saved: 7.7 ms

JackD83 commented 5 years ago

Great to see that you had the time to finish it! I just went ahead and merged your changes and implemented the missing parameter in the UI and the network transmission.

Unfortunately, I get very bad results with FFR :( The image on the HMD is very pixelated and I have a black/white rectangle at the center that is causing distortions.

I checked the encoded video and it looks fine:

[screenshot: FFR]

I tried to take a screenshot on the Quest to demonstrate the bad quality, but you can't see it on the screenshot?! It's very strange.

zmerp commented 5 years ago

Thank you! Maybe the black rectangle is caused by the broken filtering code; I'll remove it. Mind that in my last commit I left foveationStrengthMean = 10, which is way too high. If you want to record the Quest screen you can use scrcpy. These are the screenshots from the test I made:

foveationStrengthMean = 0: [screenshot: foveation = 0]
foveationStrengthMean = 5: [screenshot: foveation = 5]
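(For reference, recent scrcpy builds can record while mirroring with something like `scrcpy --record quest.mp4`; the file name is illustrative and flag availability depends on the scrcpy version.)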

zmerp commented 5 years ago

I managed to reproduce and fix the solid color rectangle bug. I got it too when I lowered foveationStrengthMean to 0.1, and it was caused by a floating-point precision error. If you still have some visual glitch after my last commit, try adding `highp` in other places in the shader.
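For reference, in GLSL ES the float precision can also be raised for the whole shader instead of annotating individual declarations; a minimal sketch:

```glsl
// GLSL ES: make highp the default precision for every float in this shader,
// instead of annotating individual variables.
precision highp float;
```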

EDIT: Never mind: putting on the headset, I still saw in the right eye the pixelation you were referring to.

JackD83 commented 5 years ago

Thank you for your help! I noticed the 10 right away. The GUI already works for changing it with a simple server restart. I put it in the FFR branch.

[screenshot: UI]

The rectangle is gone with your changes.

The pixelation is still a problem. It cannot be captured with scrcpy either. I think it's a problem with the display and the render target resolution.

The same problem occurs if you set the video resolution to 75% and use h265 as the video codec.

zmerp commented 5 years ago

It seems I fixed the pixelation problem by setting `highp float` as the default.

JackD83 commented 5 years ago

I can confirm that this fixed the issue 👍

Do you think that FFR, the way it works now, should be released? I'm under the impression that even with StrengthMean 0, the image quality is still better than before.

I don't know for sure, but is it possible that at 0, the areas that are not visible are still masked out and not encoded in the video stream? I think the original version encoded a rectangular video with all areas included.

zmerp commented 5 years ago

I added a switch on both server and client: when foveationShapeRatio = 0 it reverts to the old rendering logic and FFR is not even initialized, so I think there is no harm in adding my code to the release. The server starts with a rectangular image, applies a deformation similar to the one used for lens distortion correction, then the client undistorts it back to a rectangular image (and then the Oculus runtime distorts it again). However, I noticed another problem: when FFR is on, the image does not look as sharp as when it is off, at least at 100% resolution. I think that is because the FFR distortion forces a resampling which is inevitably not pixel-perfect. That can be mitigated by increasing the frame size, but that sort of defeats the purpose of FFR.
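Continuing the hypothetical sketch from my earlier comment (same assumed uniforms, same caveat that it is an illustration rather than the real shader), the client would apply the exact inverse of the warp, and this second resampling pass is where the softness creeps in:

```glsl
// Hypothetical inverse of the distort() sketch above: since the forward warp
// scales the radius by atan(a)/a, expanding it back uses tan(a)/a. This is
// safe because the forward warp keeps the compressed radius below pi/2/strength.
vec2 undistort(vec2 uv) {
    vec2 d = uv - focusCenter;
    d.x /= foveationShapeRatio;
    float r = length(d);
    float a = r * foveationStrengthMean;
    float scale = a < 1e-5 ? 1.0 : tan(a) / a;
    d *= scale;
    d.x *= foveationShapeRatio;
    return focusCenter + d;
}
```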

Sztrovacsek81 commented 5 years ago

Hello guys, you did excellent work with ALVR. Just a quick report: I tested it yesterday with different FFR settings, but unfortunately it is also blurry at the center of the screen. When I turn off FFR the image is crisp and clear again.

zmerp commented 5 years ago

As I explained, there is a theoretical limitation to my approach. I could do some fancier math to make the center of the screen render pixel-perfect, but I think this would not completely remove the blurriness. A safer bet would be to implement an FFR that cuts the frames into rectangles, renders them at different resolutions and then stitches them back together on the client side. Maybe I'll implement this in the near future.
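A minimal sketch of what the client-side stitching could look like, assuming a split into three horizontal bands with made-up boundaries and scales (everything here, names included, is illustrative):

```glsl
// Hypothetical slice layout: a full-resolution center band plus two outer
// bands rendered at half vertical resolution, packed into one texture.
uniform sampler2D packedFrame;
const float centerLo = 0.35, centerHi = 0.65; // center band in output space
const float edgeScale = 0.5;                  // outer bands at half height

vec4 stitch(vec2 uv) {
    float bottomPacked = centerLo * edgeScale;
    float centerPacked = centerHi - centerLo;
    float topPacked = (1.0 - centerHi) * edgeScale;
    float total = bottomPacked + centerPacked + topPacked;

    float y;
    if (uv.y < centerLo) {
        y = uv.y * edgeScale;                 // re-expand bottom band
    } else if (uv.y < centerHi) {
        y = bottomPacked + (uv.y - centerLo); // center band is 1:1
    } else {
        y = bottomPacked + centerPacked + (uv.y - centerHi) * edgeScale;
    }
    return texture(packedFrame, vec2(uv.x, y / total));
}
```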

wingzx3 commented 5 years ago

Would something like this work? Temporal Resolution Multiplexing. It could also be combined with FFR. Here's the research paper on it:
https://www.cl.cam.ac.uk/research/rainbow/projects/trm/

zmerp commented 5 years ago

Great find! I've read it and it looks promising; however, I see there are a few nuances to keep in mind and a few problems to solve if you were to implement this.

JackD83 commented 5 years ago

I brought the paper up in the original ALVR issue tracker. I had the pleasure of trying it at the IEEE VR conference this year. It looked very good on the Oculus CV1 and with their own renderer.

I concur with @zarik5 that the 72 Hz of the Quest is the main problem here. When I tried the implementation, some scenes already had a hint of flicker to them. Reducing the FPS will most likely make it more noticeable.

JackD83 commented 5 years ago

I was just testing some settings, running ALVR at 125% with FFR 4. It looks very good and is very playable. Looking at the encoded video, I was wondering if we could mask out the areas that are not visible. [screenshot: alvr]

I assume that the areas that are not a solid color still require some portion of the available bitrate to encode.
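Masking could be as simple as painting everything outside the lens-visible region a flat color before encoding, so those blocks compress to almost nothing. A hypothetical sketch (the function, center, and radius are all made up):

```glsl
// Hypothetical pre-encode mask: flat mid-gray outside the visible region
// costs the encoder almost no bits.
uniform vec2 eyeCenter;          // per-eye optical center in UV space
const float visibleRadius = 0.5; // illustrative cutoff

vec4 maskInvisible(vec4 color, vec2 uv) {
    return distance(uv, eyeCenter) > visibleRadius
        ? vec4(0.5, 0.5, 0.5, 1.0)
        : color;
}
```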

zmerp commented 5 years ago

I'm having a hard time making the new FFR work. @JackD83 can you tell me what program you use to view the stream produced by the ALVR server?

EDIT: I managed to view the stream with VLC by forcing the h264 demuxer.
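(Something like `vlc --demux=h264 dump.h264`, assuming the stream was dumped to a file; the file name here is made up.)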

JackD83 commented 5 years ago

I use ffplay to play the stream. [screenshot: ffplay]
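(For a raw dump, something along the lines of `ffplay -f h264 dump.h264` should work; the exact invocation and file name are illustrative.)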

JackD83 commented 5 years ago

@zarik5 Great to see that Oculus chose the same approach to FFR as you to make streaming work, calling it axis-aligned distorted transfer (AADT). Any chance that we can use the sliced image encoding and decoding? I imagine that the decoding part on the Quest could be the problem here, without direct access to the decoder.

zmerp commented 5 years ago

Could it be that they got the inspiration from ALVR? 😉

> Any chance that we can use the sliced image encoding and decoding?

I don't understand, isn't the new FFR algorithm that I implemented exactly what you want? Do you have any official link to an explanation of how AADT works?

JackD83 commented 5 years ago

I was referring to this video. The first part is the preservation of visual quality, where they use AADT, which works like your FFR warp implementation. The second part is the sliced image encoding and decoding, where they cut the video into (presumably) some kind of stripes, encode the stripes and send them to the headset while the rest of the frame is still encoding. That should save a lot of time and improve latency.

zmerp commented 5 years ago

The distortion part looks easy to implement. The visual quality/artifacts should be the same as my FFR-with-slices implementation, but it should cut away a bit of complexity in the shaders. The sliced encoding/decoding part can be implemented by creating as many instances of the encoder/decoder as the number of slices.

JackD83 commented 5 years ago

Do you think implementing all of this is something that should be done? I'm not sure if all this effort will be for nothing if Oculus announces a wireless solution next year.

I merged all your other changes with the sliced FFR and the offset, and it works very well. If you don't look for the cut, it's not noticeable at all and the image is very sharp!

I'm using a strength of 2.5 and an offset of 0.04

zmerp commented 5 years ago

To implement AADT you would need to rewrite the rendering, encoding, decoding, networking and timing logic, basically rewriting ALVR from scratch. I honestly don't feel like making this commitment as I rarely have spare time, especially now that summer is over.

I also tried in the past couple of days to make a demo to test the Lanczos filter, but my implementation in Shadertoy keeps crashing WebGL half of the time, and when it doesn't, there are glitches over which I have no control. So I think I'm done writing new stuff, at least for a while.
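For context, the Lanczos-2 kernel in question is L(x) = sinc(x)·sinc(x/2) for |x| < 2 and 0 elsewhere; a minimal GLSL sketch of the 1-D weight (my own rewrite, not the Shadertoy code that was crashing):

```glsl
// Lanczos-2 weight: sinc(x) * sinc(x/2) inside |x| < 2, where
// sinc(x) = sin(pi*x) / (pi*x). Guard the removable singularity at x = 0.
float lanczos2(float x) {
    x = abs(x);
    if (x < 1e-5) { return 1.0; }
    if (x >= 2.0) { return 0.0; }
    float pix = 3.14159265 * x;
    return 2.0 * sin(pix) * sin(0.5 * pix) / (pix * pix);
}
```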

Sztrovacsek81 commented 5 years ago

Yes! You did it! The sliced version is working flawlessly out of the box (strength = 2, vertical offset = 0)! No blurry image in the center of the screen, it's almost unnoticeable on the edges (you must force your eyes to see the corners), and I got -10 ms latency (H.264 + 100 Mbps + 100%)! Thank you for this awesome work!

Edit: I also tried strength = 5: a much worse picture on the edges but almost no further latency gain. So 2 is a great default value (I haven't tried the vertical offset yet).