Adaptive Order Latency to transparently mitigate slowdown

adammitchelldev commented 4 years ago

Motivation

Currently, playing OpenRA in large groups can be plagued with slowdown issues, particularly when some players have bad network conditions (e.g. Wi-Fi). We don't all have the best internet connection but we do all like to play!

There is already a mechanism in place to mitigate slowdown from lockstep due to latency, which I will call order latency but is called FramesAhead in the code: the client sends their orders to happen X frames ahead of where their current world is, so that it will arrive in time for all other clients and they will all have deterministic input to their respective simulations.

However, this order latency is fixed by the server for all players in a config setting, despite the fact that the server simply acts as a dumb relay. If someone has a latency that is higher than the order latency mitigates, everyone in the game experiences slowdown.

I would like to extend this solution by introducing a mechanism and algorithm to adapt order latency to the current network conditions on-the-fly. This would prevent players from needing to adjust their order latency manually in the first place, prevent poor networking conditions for one player ruining a game for many people and dynamically adapt to changing conditions, ensuring the game always runs as smoothly as it can.

Proposed Solution (with alternatives)

In theory, each client can safely have their own order latency that mitigates their connection latency to the server. The ideal situation is as follows:

The server receives all orders for a frame simultaneously, immediately rerouting to all clients.
Each client receives every order after one network trip of latency, and simulates a world tick. The world ticks happen at a slightly delayed time for each client (T - 1x).
The clients then send orders back to the server for a frame in the future (current frame + order latency). The order latency (in time) should be just over their round-trip latency, so the order is for the frame (T + 1x).
The orders for a frame then arrive at the server all at T (for the given frame), repeating the cycle.

Dynamic Order Latency

The first problem is enabling the ability for a client to change their order latency during play. Any solution will involve implementing this, but it is fortunately pretty straightforward:

Increasing the order latency by X: Send X blank order frames.
Decreasing the order latency by X: Buffer orders for X frames and combine them into one order frame (combining reliably may be the hardest part here, not looked too hard into implementation for order collecting yet, naively we could just re-apply orders in order?)

Adaptive Order Latency

There are two possible approaches to order latency:

Ping Matching + Simulation Catchup Treat simulation sync as separate from order latency: change the order latency to best match the client-server latency (adapting to the round-trip), and then adjust your simulation speed to match that of the incoming orders of the other clients. This is a pure approach where the server has an ideal "all orders at the same time" situation.

Advantages:
- Quick to adapt to network condition changes, possible for the algorithm to tune for cases where jitter is high.
- Possible to introduce catch-up for a player when their network conditions improve, as this can be correctly differentiated from other players improving.
- Can introduce server-side enforcement by penalising and kicking clients who do not follow the latency rules (although this would require frame packet inspection, which we don't currently do at all if not necessary).
- Generally should be very stable and only slow down when absolutely necessary.
- Have a latency measure to report to players in-game!
Disadvantages:
- This is the more advanced method, and would require fairly significant additional implementation on top of what is there currently, The server would need to send additional information to the client to allow it to calculate or adjust its own round-trip (simple method is to just relay "acks" for frame data).
- The client logic could potentially become complex when trying to tune for unstable network conditions.
- This is a very client-server centric solution, as it relies on the client only having to manage its latency for its own round-trip to the server, leaving other clients to manage theirs.
Simple Feedback Algorithm Adjust order latency in a basic way based on eccessive or insufficient orders coming in from other players. One simple algorithm would be to increase latency when simulation steps are missed, but decrease when a majority of other players have sent you more frame data than required (and possibly speeding up when all of them have). The rules could be more complex (introducing randomness or round-robin to the speed-up to prevent clients from causing a "race to reduce latency" that might ultimately cause more harm than good).

Advantages:
- Very simple to implement, requiring no changes to the server or most of the client (only FrameData and OrderManager). No protocol changes
- Latency would be distributed in a P2P fashion, which could be seen as fairer by some. When you have a 2 player game, the latency balances until both players have equal order latency, rather than the player with the lower ping to the server always having the lowest order latency. Locally hosted games would also be fairer. (The same idea could be implemented manually with more effort in solution 1)
- Useable in a purely P2P environment where latency to a central point is not fixed or known.
Disadvantages:
- The way "catch-up" is achieved might be by slowing down other clients. If jitter is high for one client or the algorithm was generally too greedy, we could see situations where there is more total slowdown than necessary. This could be several factors higher than solution 1 in edge cases (which would also be harder to detect).
- Tuning would likely be harder.

Side Effects

Adaptive order latency should be side-effect free, if it works as designed. The current netcode implementation only relies on the clients sending all of the orders once for each frame; the implementation in client and server does not depend on knowing what each individual client's order latency is, a client could theoretically send no orders for the next hour's worth of frames and it would be happy. So long as frames are not missed or sent twice, the behaviour is reliable.

The biggest potential issues:

Dropping inputs when collecting multiple frames worth of orders.
Bad implementation may cause more slowdown than intended by being over-aggressive in minimising order latency.
More netcode to debug.
May introduce protocol changes.

Conclusion

I would propose that solution 1 is generally the better solution, since number 2 will likely be more difficult to debug, but comes with some implementation drawbacks. I'd like to discuss it a bit but I'm already fairly prepared to start coding it.

I may also be inclined to do a bit of refactoring, the netcode is not abstracted in a typical fashion.

adammitchelldev commented 4 years ago

It appears that received frame data is never deleted by the client (see FrameData.cs, which is a memory leak. The only way frame data needs to be accessed twice is when doing a DumpSyncReport(), but it is not cleared when the sync report ring buffer is filled.

A similar issue occurs with sync hashes.

Will refactor this out by handing off the frame data to SyncReport to manage when it actually generates each frame, so it will clean it up when its buffer fills.

adammitchelldev commented 4 years ago

Mirroring the comment from this draft PR: https://github.com/OpenRA/OpenRA/pull/17886#issuecomment-607539588

I've update the network protocol slightly. Server will now ack orders (I was echoing them before as that was easier, but that change breaks singleplayer and is also unnecessary). It's a bit hacky, but not really any more hacky than what is there already and I want to do network refactoring in a separate PR later.

I reckon it might be a slight point of contention that I actually block a client from using its own orders until they are acked, but the logic is that every client receives orders from the same perspective, so if your ack was late, everyone else also received your packet late and had to stall anyway.

Current algorithm:

Clients attempt to gradually decrease their "self-buffer" (the number of ack'ed frames they have in the main buffer FrameData) to 1.
They decrease their order latency by 1 frame at a time, and wait for orderLatency + 1 frames before attempting to decrease again (their self-buffer size is inherently delayed by a round-trip) as a safe way to avoid "bouncing" (decreasing too far and arriving at 0, causing stalls).
When a client is the cause of stall (it hasn't received an ack in time), it increases order latency by up to Y time (currently 5) until the next successful frame. This allows ping spikes to stall without spiralling out to ridiculous latencies (that take ages to come down from).

Current issues:

Clients with high jitter (higher than a single frame) cause regular brief stalls as they hit the peaks of their jitter. This is because the current algorithm always eventually reaches a self-buffer of 1 which causes inevitable stalls when their latency spikes.
Clients that have ended up with high latencies take a long time to come down.
Clients coming down from high latency do not attempt catchup and cause everyone else to stutter when they are doing so. It's noticeable but it's 100x better than constant stutter and it only happens when a client's latency shifts dramatically.
Stalls are still inevitable in lag-spikes :( and a sudden large increase in ping can stall the game a few times before the client responsible commits to a large enough order latency (it has no way of knowing what to do reliably because any feedback it can get has to come after a round-trip of latency!)
Behaviour when clients are lagging not due to latency is not tested. In theory, they should not attempt to increase order latency because they did not miss any acks.

Proposed solutions:

A: Refine adaptive latency algorithm

By making the algorithm jitter-aware, it can raise the client's "target" self-buffer threshold to cover standard jitter (we could measure or approximate a rolling average of the standard deviation of the jitter and set it proportionally, other approximations are possible.)

The algorithm can also be refined to be more aggressive when the self-buffer is dramatically above the desired latency (indicating a sudden fix in lag) to avoid the current slow recovery.

We could introduce catch-up when a client is recovering (when ack'ed frames from the self-buffer are above the jitter threshold and the client has orders from every other client) to avoid stalling other clients if possible when recovering from latency (it is better for one client to speed up than every other client to have to slow down).

(We could also use sync hashes as an additional measure of the latency of each other client, to build more of a picture of the world).

B: Introduce server-authoritative order control

This is the "no holds barred" approach to eliminate practically all slowdown forever.

Adaptive order latency only works because buffering orders from multiple frames in a single frame is reliable (orders are strongly ordered). As long as each client applies the same orders for every player in the same order, it doesn't matter which frame the order was intended to be applied on. The logic change here would be that clients would send "id" order packets immediately (numbered like frame packets currently), and then the server would dictate which frame for clients to apply them on (batching the orders as current, or acking them).

This would require a "game loop" timer on the server to prevent a client from being able to over-saturate it with blank frames, causing it to believe that everybody else should drop orders for those frames (instead it would batch the eager frames). If it receives no orders from clients (clients would still send empty order frames), or something indicating a client failing to catch-up, it would then deliberately slow the simulation. However, the server would not need to run a simulation, keeping it to being a simple relay (the packet switching would be very lightweight).

This approach completely eliminates any problems with spikes causing slowdown, as all clients that still have good connections would receive valid orders. It also eliminates the need for a client to predict when to adapt, as they will never be responsible for slow-down on other clients, this means clients can always run at their best possible latency even in jittery situations (although some smoothing of catch-up may be desired to avoid the jitter from causing nauseating effects.)

As @pchote has mentioned, client-side world could also eventually have a prediction state that would match well with this.

Conclusion

I'm actually all in favour of option B, despite the fact it would mean essentially killing my adaptive order latency darling. Option A doesn't offer much technical benefit over option B, it is just also a lot better than the current static latency. Option B would also not preclude also having a lockstep or adaptive mode, but it offers the best improvement for games that use it. Client-side prediction is not a necessity in an RTS.

adammitchelldev commented 4 years ago

Sideways to the topic, more on general network refactoring

My analysis so far would also be that the sync-hash tickrate can (and should) be decoupled from net-frame tickrate. Net-frames just limit the application of orders to once every 3 frames, the de-sync could have occurred in any of the previous 2 frames and there would be no way to tell, so the usefulness of sync-reports is not necessarily degraded by allowing orders to be applied across the 3 frames.

Whilst the increase in resolution may not be necessary, I also don't think the current decreased resolution is either, and it is arbitrarily placing a minimum on latency, especially for lower game speeds. Any adjustable rate de-coupling we can do also encourages general de-coupling in the game loop and systems code (in this case, more decoupling between sync and network frames/orders is "probably" going to result in better code).

Doing a sync-hash for every net frame is also unnecessary, especially when sync reports are off. Sync hash is important to most players only for detecting a de-sync, which should be very rare, so it does not need to happen with as much "ferocity" as is useful with sync reports, less than once per second would still be plenty. Ideally we'd be doing order net frames at the simulation tick-rate and sync hash at some regular interval, but all of these multipliers being adjustable by the server seems best.

OpenRA / OpenRA

Adaptive Order Latency to transparently mitigate slowdown #17882

Motivation

Proposed Solution (with alternatives)

Dynamic Order Latency

Adaptive Order Latency

Side Effects

Conclusion

Current algorithm:

Current issues:

Proposed solutions:

A: Refine adaptive latency algorithm

B: Introduce server-authoritative order control

Conclusion

Sideways to the topic, more on general network refactoring