games-on-whales / wolf

Stream virtual desktops and games running in Docker
https://games-on-whales.github.io/wolf/stable/
MIT License

Implement Reference Frame Invalidation (RFI) #5

Open ABeltramo opened 1 year ago

ABeltramo commented 1 year ago

Moonlight recently added support for it; we should add it as well.

CGutman on Discord

Video codecs make heavy use of previous frames as references to reduce the size of new frames. However, that means a frame can only be successfully decoded if the prior frames it references were also successfully decoded. So we have a problem if a frame gets dropped by the network, because a future frame could try to reference it.

There are two main ways of dealing with lost frames. The way we've dealt with this in the past is to just request a key frame (which contains no references to prior frames) to get the decoder back to a known state. That is how most applications handle this situation (or they depend on periodic key frames to recover from any lost frames). The big downside is that key frames are huge and usually end up at much lower quality because they hit bitrate limits. You can sometimes see blocky artifacts for a couple of frames after a key frame while the encoder sends additional detail in the following frames.
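
For context on that key-frame fallback path: in a GStreamer-based host like wolf, a recovery IDR can be requested with a "force key unit" event. This is only a minimal sketch under assumptions, not wolf's actual pipeline code; the element name "payloader" is hypothetical.

```cpp
// Minimal sketch (assumptions, not wolf's actual pipeline): ask the encoder
// for an immediate key frame by sending a force-key-unit event upstream from
// an element that sits downstream of the encoder.
#include <gst/gst.h>
#include <gst/video/video.h>

static void request_idr_frame(GstElement *pipeline) {
  // GST_CLOCK_TIME_NONE => "as soon as possible"; all_headers=TRUE asks the
  // encoder to re-emit SPS/PPS so the decoder can fully resynchronize.
  GstEvent *force_key_unit = gst_video_event_new_upstream_force_key_unit(
      GST_CLOCK_TIME_NONE, /* all_headers = */ TRUE, /* count = */ 0);

  // "payloader" is a hypothetical name for an element downstream of the
  // encoder (e.g. the RTP payloader); the event travels upstream to the encoder.
  GstElement *downstream = gst_bin_get_by_name(GST_BIN(pipeline), "payloader");
  if (downstream != nullptr) {
    gst_element_send_event(downstream, force_key_unit);
    gst_object_unref(downstream);
  } else {
    gst_event_unref(force_key_unit);
  }
}
```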

However, there is a more clever way of handling this that NVENC also supports. The client can tell the host exactly which frames it lost (reference frame invalidation, or RFI), and the host can avoid referencing those lost frames in newly encoded frames. That means recovering from a loss doesn't require re-encoding the entire screen, just the portion that was lost (and is still relevant to what's currently on screen). The recovery frames are much smaller, and smaller frames have better image quality (since they aren't hitting the bitrate ceiling) and are less prone to packet loss or network congestion.
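
On the host side, NVENC exposes this through its reference-frame invalidation entry point. A minimal sketch, assuming an already-open NVENC session in which each submitted frame's NV_ENC_PIC_PARAMS::inputTimeStamp was set to its frame number (that convention is an assumption for illustration, not something confirmed by this issue):

```cpp
// Minimal sketch of reference frame invalidation with the NVENC API
// (assumptions noted above; error handling trimmed for brevity).
#include <nvEncodeAPI.h>
#include <cstdint>
#include <vector>

// `fns` is the NV_ENCODE_API_FUNCTION_LIST filled by NvEncodeAPICreateInstance,
// `encoder` is the session handle returned by nvEncOpenEncodeSessionEx.
void invalidate_lost_frames(const NV_ENCODE_API_FUNCTION_LIST &fns,
                            void *encoder,
                            const std::vector<uint64_t> &lost_frame_numbers) {
  for (uint64_t frame : lost_frame_numbers) {
    // Tell NVENC not to use this frame as a reference for future frames.
    NVENCSTATUS status = fns.nvEncInvalidateRefFrames(encoder, frame);
    if (status != NV_ENC_SUCCESS) {
      // If the frame can no longer be invalidated (e.g. it fell out of the
      // tracked reference window), fall back to forcing a key frame instead.
      break;
    }
  }
}
```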

RFI also allows us to do some other clever tricks. In the recent updates for Android and iOS, I implemented a feature that I call "speculative RFI". In the past, we would only know for sure that we lost frame N when we received the first packet of frame N+1. That means we were already dropping a minimum of two frames (N, which was dropped by the network, and N+1, which has to be dropped because we haven't received an RFI recovery frame yet). There are also cases where recovery can take much longer than two frames, particularly if the network is unstable.
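
To illustrate why at least two frames are lost, here is a minimal client-side sketch (hypothetical packet fields and names, not Moonlight's actual code) of that older, reactive detection: frame N is only declared lost once a packet from frame N+1 arrives while N is still incomplete.

```cpp
// Minimal sketch of reactive loss detection (hypothetical types/names).
#include <cstdint>
#include <functional>
#include <utility>

struct VideoPacket {
  uint32_t frame_index;      // frame this packet belongs to
  uint16_t packets_in_frame; // total packets making up that frame
};

class ReactiveLossDetector {
public:
  explicit ReactiveLossDetector(std::function<void(uint32_t)> on_frame_lost)
      : on_frame_lost_(std::move(on_frame_lost)) {}

  void on_packet(const VideoPacket &pkt) {
    if (pkt.frame_index != current_frame_) {
      // A packet from a newer frame arrived before the current frame was
      // complete: only now do we learn that the current frame was dropped,
      // so at least two frames (N and N+1) are already unusable.
      if (expected_ != 0 && received_ < expected_) {
        on_frame_lost_(current_frame_);
      }
      current_frame_ = pkt.frame_index;
      received_ = 0;
    }
    expected_ = pkt.packets_in_frame;
    ++received_;
  }

private:
  std::function<void(uint32_t)> on_frame_lost_;
  uint32_t current_frame_ = 0;
  uint16_t expected_ = 0;
  uint16_t received_ = 0;
};
```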

However, if we aren't seeing out-of-order packets from the host, we can predict whether the current frame is likely to be received successfully based on the packets we've seen so far. While receiving frame N, we keep track of the latest packet sequence number we've received. If we see packets 1, 2, and 8, we can be pretty sure that packets 3-7 have been dropped (since we've never seen out-of-order packets from the host). That allows Moonlight to do clever things like predict whether it will be able to recover the frame based on the number of packets remaining to be received. If we predict that we won't be able to reconstruct the frame because too many packets have been dropped, we can immediately send an RFI request to the host. That can allow us to potentially recover within a single frame.
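
A minimal sketch of that speculative prediction (hypothetical names; the FEC budget is an assumed parameter, since how many missing packets can be rebuilt depends on the stream's FEC configuration): as soon as the gaps within the current frame exceed what can be repaired, the client requests RFI without waiting for frame N+1.

```cpp
// Minimal sketch of speculative RFI (hypothetical names and fields).
#include <cstdint>
#include <functional>
#include <utility>

class SpeculativeRfiPredictor {
public:
  // `fec_recoverable_packets`: how many missing packets FEC can rebuild per
  // frame (an assumed, simplified model of the stream's FEC configuration).
  SpeculativeRfiPredictor(uint32_t fec_recoverable_packets,
                          std::function<void(uint32_t)> request_rfi)
      : fec_budget_(fec_recoverable_packets),
        request_rfi_(std::move(request_rfi)) {}

  // `packet_index` is the packet's position within the frame; packets from the
  // host are assumed to arrive in order, so any gap means those packets are lost.
  void on_packet(uint32_t frame_index, uint16_t packet_index) {
    if (frame_index != current_frame_) {
      current_frame_ = frame_index;
      missing_ = 0;
      rfi_requested_ = false;
    } else if (packet_index > last_index_ + 1) {
      // e.g. after packets 1 and 2 we see 8: packets 3-7 are presumed dropped.
      missing_ += packet_index - last_index_ - 1;
    }
    last_index_ = packet_index;

    // Too many losses for this frame to be reconstructed: ask the host to stop
    // referencing it right away, potentially recovering within a single frame.
    if (!rfi_requested_ && missing_ > fec_budget_) {
      rfi_requested_ = true;
      request_rfi_(current_frame_);
    }
  }

private:
  uint32_t fec_budget_;
  std::function<void(uint32_t)> request_rfi_;
  uint32_t current_frame_ = 0;
  uint16_t last_index_ = 0;
  uint32_t missing_ = 0;
  bool rfi_requested_ = false;
};
```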