adammitchelldev opened 4 years ago
It appears that received frame data is never deleted by the client (see FrameData.cs), which is a memory leak. The only case where frame data needs to be accessed twice is when doing a DumpSyncReport(), but it is not cleared even when the sync report ring buffer is filled.
A similar issue occurs with sync hashes.
Will refactor this by handing the frame data off to SyncReport to manage when it actually generates each frame, so it can clean the data up when its buffer fills.
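Roughly what I have in mind, as a minimal sketch (names and buffer size are illustrative, not the actual OpenRA API):

```csharp
using System.Collections.Generic;

class SyncReport
{
    const int BufferSize = 30; // ring-buffer length; illustrative
    readonly Queue<byte[]> frames = new Queue<byte[]>();

    // The client hands each received frame's data off instead of holding
    // it forever; once the buffer is full, the oldest entry is dropped.
    public void HandOffFrameData(byte[] frameData)
    {
        frames.Enqueue(frameData);
        if (frames.Count > BufferSize)
            frames.Dequeue();
    }
}
```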
Mirroring the comment from this draft PR: https://github.com/OpenRA/OpenRA/pull/17886#issuecomment-607539588
I've updated the network protocol slightly. The server will now ack orders (I was echoing them before as that was easier, but that change breaks singleplayer and is also unnecessary). It's a bit hacky, but not really any more hacky than what is there already, and I want to do the network refactoring in a separate PR later.
I reckon it might be a slight point of contention that I actually block a client from using its own orders until they are acked, but the logic is that every client then receives orders from the same perspective: if your ack was late, everyone else also received your packet late and had to stall anyway.
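To make the ordering concrete, a rough sketch (hypothetical names, not the actual PR code) of a client that sends its orders immediately but only applies them once the server's ack arrives:

```csharp
using System.Collections.Generic;

class Order { /* payload elided */ }

interface IServerConnection { void Send(Order order); }

class LocalOrderBuffer
{
    readonly Queue<Order> pending = new Queue<Order>();

    // Orders go to the server immediately but are not applied locally yet.
    public void Issue(Order order, IServerConnection connection)
    {
        pending.Enqueue(order);
        connection.Send(order);
    }

    // When the server acks our orders, they are part of the canonical
    // stream and safe to apply at the same point every client sees them.
    public IEnumerable<Order> OnAcked(int count)
    {
        for (var i = 0; i < count; i++)
            yield return pending.Dequeue();
    }
}
```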
The current algorithm keeps the client's self-buffer (in FrameData) to 1. By making the algorithm jitter-aware, it can raise the client's "target" self-buffer threshold to cover standard jitter (we could measure or approximate a rolling average of the standard deviation of the jitter and set the threshold proportionally; other approximations are possible).
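One way that could look, as a sketch (constants and names are illustrative): an exponentially weighted estimate of the jitter's mean and variance, with the target buffer sized to absorb a couple of standard deviations:

```csharp
using System;

class JitterEstimator
{
    const double Alpha = 0.1;      // EWMA smoothing factor; illustrative
    const double SigmaScale = 2.0; // std-devs of jitter the buffer should absorb

    double mean, variance;

    // Feed in each packet's inter-arrival time.
    public void Observe(double interArrivalMs)
    {
        var delta = interArrivalMs - mean;
        mean += Alpha * delta;
        variance = (1 - Alpha) * (variance + Alpha * delta * delta);
    }

    // Target self-buffer in frames: the minimum of 1, plus enough frames
    // to cover SigmaScale standard deviations of jitter.
    public int TargetSelfBuffer(double frameIntervalMs)
    {
        var sigma = Math.Sqrt(variance);
        return 1 + (int)Math.Ceiling(SigmaScale * sigma / frameIntervalMs);
    }
}
```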
The algorithm can also be refined to be more aggressive when the self-buffer is dramatically above the desired latency (indicating that a lag spike has suddenly cleared), to avoid the current slow recovery.
We could introduce catch-up while a client is recovering (when acked frames in the self-buffer are above the jitter threshold and the client has orders from every other client) to avoid stalling other clients where possible when recovering from latency: it is better for one client to speed up than for every other client to have to slow down.
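The trigger could be as simple as this sketch (hypothetical names):

```csharp
static class CatchUp
{
    // Run a second simulation tick this render frame while recovering,
    // rather than letting our backlog stall everyone else.
    public static int TicksThisFrame(int ackedSelfBuffer, int jitterThreshold, bool haveAllOtherOrders)
    {
        if (ackedSelfBuffer > jitterThreshold && haveAllOtherOrders)
            return 2; // one normal tick plus one catch-up tick

        return 1;
    }
}
```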
(We could also use sync hashes as an additional measure of the latency of each other client, to build more of a picture of the world).
This is the "no holds barred" approach to eliminate practically all slowdown forever.
Adaptive order latency only works because buffering orders from multiple frames in a single frame is reliable (orders are strongly ordered). As long as each client applies the same orders for every player in the same order, it doesn't matter which frame an order was intended to be applied on. The logic change here would be that clients send "id" order packets immediately (numbered in the way frame packets are currently), and the server dictates which frame clients apply them on (batching the orders as it does currently, or acking them).
This would require a "game loop" timer on the server to prevent a client from over-saturating it with blank frames and making it believe that everybody else should drop orders for those frames (instead it would batch the eager frames). If it receives no orders from clients (clients would still send empty order frames), or something indicating a client is failing to catch up, it would then deliberately slow the simulation. However, the server would not need to run a simulation, keeping it a simple relay (the packet switching would be very lightweight).
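A sketch of what that relay loop might look like (hypothetical; a real server would also handle disconnects, slow-down, etc.):

```csharp
using System.Collections.Generic;

interface IClient { void Send(int frame, byte[][] orders); }

class RelayServer
{
    int netFrame;
    readonly List<byte[]> pending = new List<byte[]>();

    // Order packets from any client are queued as they arrive; the client
    // no longer decides which frame they execute on.
    public void OnClientOrders(byte[] packet) => pending.Add(packet);

    // Driven by a server-side timer at the net-frame rate, so a client
    // flooding empty frames cannot make others drop orders.
    public void OnFrameTimer(IEnumerable<IClient> clients)
    {
        netFrame++;
        var batch = pending.ToArray();
        pending.Clear();

        foreach (var client in clients)
            client.Send(netFrame, batch); // "apply these orders on frame N"
    }
}
```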
This approach completely eliminates any problems with spikes causing slowdown, as all clients that still have good connections would receive valid orders. It also eliminates the need for a client to predict when to adapt, as it will never be responsible for slowdown on other clients; this means clients can always run at their best possible latency even in jittery situations (although some smoothing of catch-up may be desired to stop the jitter from causing nauseating effects).
As @pchote has mentioned, the client-side world could also eventually gain a prediction state that would match well with this.
I'm actually all in favour of option B, despite the fact it would mean essentially killing my adaptive order latency darling. Option A doesn't offer much technical benefit over option B; it is just also a lot better than the current static latency. Option B would not preclude also having a lockstep or adaptive mode, but it offers the best improvement for games that use it. Client-side prediction is not a necessity in an RTS.
My analysis so far would also be that the sync-hash tick rate can (and should) be decoupled from the net-frame tick rate. Net frames just limit the application of orders to once every 3 frames; a desync could have occurred in any of the previous 2 frames and there would be no way to tell, so the usefulness of sync reports is not necessarily degraded by allowing orders to be applied across the 3 frames.
Whilst the increase in resolution may not be necessary, I also don't think the current decreased resolution is either, and it arbitrarily places a minimum on latency, especially at lower game speeds. Any adjustable-rate decoupling we can do also encourages general decoupling in the game loop and systems code (in this case, more decoupling between sync and network frames/orders is "probably" going to result in better code).
Doing a sync hash for every net frame is also unnecessary, especially when sync reports are off. The sync hash matters to most players only for detecting a desync, which should be very rare, so it does not need to happen with as much "ferocity" as is useful for sync reports; less than once per second would still be plenty. Ideally we'd run order net frames at the simulation tick rate and the sync hash at some regular interval, but making all of these multipliers adjustable by the server seems best.
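As a sketch of the decoupling (intervals illustrative, adjustable by the server):

```csharp
class TickRates
{
    public int NetFrameInterval = 1;  // apply orders every sim tick
    public int SyncHashInterval = 30; // hash ~once per second at 30 ticks/s

    // All three rates are independent multiples of the simulation tick.
    public void OnSimTick(int tick)
    {
        if (tick % NetFrameInterval == 0)
            ApplyBufferedOrders(tick);

        if (tick % SyncHashInterval == 0)
            ComputeSyncHash(tick);
    }

    void ApplyBufferedOrders(int tick) { /* ... */ }
    void ComputeSyncHash(int tick) { /* ... */ }
}
```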
Motivation
Currently, playing OpenRA in large groups can be plagued with slowdown issues, particularly when some players have bad network conditions (e.g. Wi-Fi). We don't all have the best internet connection but we do all like to play!
There is already a mechanism in place to mitigate slowdown from lockstep latency, which I will call order latency (it is called FramesAhead in the code): the client sends its orders to happen X frames ahead of where its current world is, so that they arrive in time for all other clients and everyone has deterministic input to their respective simulations.

However, this order latency is fixed by the server for all players in a config setting, despite the fact that the server simply acts as a dumb relay. If someone has a latency higher than the order latency mitigates, everyone in the game experiences slowdown.
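To make that concrete, a minimal sketch (names simplified, not the actual code):

```csharp
// Today's fixed order latency: orders issued on net frame N are scheduled
// for execution on frame N + FramesAhead, buying time for them to reach
// every other client before that frame is simulated.
class OrderScheduler
{
    public int FramesAhead = 3; // one server-configured value for everyone

    public int TargetFrame(int currentNetFrame)
    {
        return currentNetFrame + FramesAhead;
    }
}
```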
I would like to extend this solution by introducing a mechanism and algorithm that adapts order latency to the current network conditions on the fly. This would remove the need for players to adjust their order latency manually in the first place, prevent one player's poor network conditions from ruining the game for many people, and dynamically adapt to changing conditions, ensuring the game always runs as smoothly as it can.
Proposed Solution (with alternatives)
In theory, each client can safely have its own order latency that mitigates just its own connection latency to the server; that is the ideal situation.
Dynamic Order Latency
The first problem is allowing a client to change its order latency during play. Any solution will involve implementing this, but it is fortunately pretty straightforward; a minimal sketch follows.
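The invariant that makes this safe (hypothetical names): each batch of orders targets an explicit frame, and the client tracks the last frame it has committed to, so changing latency never skips or double-sends a frame:

```csharp
using System.Collections.Generic;

class DynamicLatency
{
    int lastScheduledFrame;

    // Called once per net frame with whatever latency we currently want.
    // Raising latency emits extra (possibly empty) frames; lowering it
    // emits none until the world catches up. Every frame is covered
    // exactly once, which is all the rest of the netcode relies on.
    public IEnumerable<int> FramesToSend(int currentFrame, int desiredLatency)
    {
        var target = currentFrame + desiredLatency;
        while (lastScheduledFrame < target)
            yield return ++lastScheduledFrame;
    }
}
```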
Adaptive Order Latency
There are two possible approaches to adapting order latency:
Ping Matching + Simulation Catchup

Treat simulation sync as separate from order latency: change the order latency to best match the client-server latency (adapting to the round trip), and then adjust your simulation speed to match that of the incoming orders from the other clients. This is a pure approach where the server has an ideal "all orders at the same time" situation.
Advantages:
Disadvantages:
Simple Feedback Algorithm

Adjust order latency in a basic way based on excessive or insufficient orders coming in from other players. One simple algorithm would be to increase latency when simulation steps are missed, but decrease it when a majority of other players have sent you more frame data than required (and possibly speed up when all of them have). The rules could be more complex (introducing randomness or round-robin into the speed-up to prevent clients from causing a "race to reduce latency" that might ultimately cause more harm than good). A sketch of one such rule follows after the lists below.
Advantages:

- Small code changes (FrameData and OrderManager).
- No protocol changes.

Disadvantages:
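The feedback rule mentioned above, as a sketch (thresholds and bounds are illustrative):

```csharp
using System;

class FeedbackLatency
{
    public int Latency = 3;
    const int Min = 1, Max = 30; // bounds are illustrative

    // Called once per net frame with what we observed from our peers.
    public void Update(bool stalledThisFrame, int peersWithSurplus, int peerCount)
    {
        if (stalledThisFrame)
            Latency = Math.Min(Max, Latency + 1);  // we held the game up
        else if (peersWithSurplus * 2 > peerCount) // a majority sent surplus frames
            Latency = Math.Max(Min, Latency - 1);
    }
}
```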
Side Effects
Adaptive order latency should be side-effect free if it works as designed. The current netcode implementation relies only on clients sending all of the orders exactly once for each frame; neither the client nor the server depends on knowing what each individual client's order latency is, and a client could theoretically send no orders for the next hour's worth of frames and the implementation would be happy. So long as frames are not missed or sent twice, the behaviour is reliable.
The biggest potential issues:
Conclusion
I would propose that solution 1 is generally the better solution, since number 2 will likely be more difficult to debug, though solution 1 comes with some implementation drawbacks. I'd like to discuss it a bit, but I'm already fairly prepared to start coding it.
I may also be inclined to do a bit of refactoring; the netcode is not abstracted in a typical fashion.