Kromster80 / kam_remake

"KaM Remake" is an RTS game remake written in Delphi from scratch.
http://www.kamremake.com
GNU Affero General Public License v3.0

Network optimizations #194

Open reyandme opened 7 years ago

reyandme commented 7 years ago

Currently the max number of players+spectators is 10. When we tried to add more spectators/players it was unplayable because of lag, constant disconnections, etc.

Probable network problems: 1) Spectators also generate traffic for all players, so adding 1 spectator means he will send gic commands every tick to every player, even though he is just spectating and cannot do anything on the map. AFAIK these empty command packets are needed to determine whether a spectator has disconnected or not.

If that is true, then we need to rework this mechanism so the server itself can determine who is disconnected via some other special packets (transferred only to the server, not to all players) and then notify players if someone disconnected. Or maybe via some other mechanism (@lewinjh @Kromster80 your suggestions?). And then do not send mk_Command to all players.

So adding spectators increases the number of packets linearly: +1 spec = +N command packets to him, +2 specs = +2N packets to specs, etc.

2) The server just transfers commands, but it could apply some optimizations.

For example, the server could accumulate packets from all source players over every 100ms window and then transfer them to each destination player in one batch.
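The accumulation idea could look roughly like this. A minimal Python sketch (the project itself is Delphi; all names here are hypothetical, not the actual server API):

```python
import time

class PacketAccumulator:
    """Buffers packets per destination client and flushes each buffer as one
    combined message every `delay_ms` milliseconds (illustrative sketch)."""

    def __init__(self, delay_ms=100, send_fn=None):
        self.delay_ms = delay_ms
        self.send_fn = send_fn          # callable(client_id, payload)
        self.buffers = {}               # client_id -> list of raw packets
        self.last_flush = time.monotonic()

    def enqueue(self, client_id, packet: bytes):
        # Instead of forwarding immediately, store per destination client.
        self.buffers.setdefault(client_id, []).append(packet)

    def maybe_flush(self):
        now = time.monotonic()
        if (now - self.last_flush) * 1000 < self.delay_ms:
            return
        for client_id, packets in self.buffers.items():
            if packets:
                # One combined message instead of len(packets) separate sends.
                self.send_fn(client_id, b"".join(packets))
                packets.clear()
        self.last_flush = now
```

The trade-off, discussed further down the thread, is that every buffered packet gains up to `delay_ms` of extra latency.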

Kromster80 commented 7 years ago

I also encourage playtesting the actual number of packets that get sent (in case we overlooked something).

LauraRozier commented 7 years ago

Got a small Wireshark PCap here with this filter: tcp.port==56789 and ip.dst==192.168.20.101 DropBox link

Not that many tbh

EDIT : The PCap is from me connecting to a server, starting a game with one AI, placing a school and a tavern, then waiting a minute or so

Kromster80 commented 7 years ago

This capture is meaningless - IIRC the AI does not send commands through GIP, and 2 players is nothing for the net code. Please try something along the lines of 8 players + 2 spectators from different PCs

Also please see if you can "condense" the captured results into some easy-to-view form, like "NNNN packets per minute", instead of a binary file that can only be opened with a specific tool.

LauraRozier commented 7 years ago

Sounds like fun. Will see what I can give. With or without the lobby communication?

EDIT : Would you prefer the current release or the most recent commit?

reyandme commented 7 years ago

Lobby communication is not so important here.

Btw, what do these gray and red lines mean? image

LauraRozier commented 7 years ago

The colors have to do with the flags set in the packets. Red = RST, Grey = SYN or FIN. (There are more per color, but these fit this example.)

Small copy-paste to explain:

Flags (9 bits) (aka Control bits)
Contains 9 1-bit flags

  NS (1 bit) – ECN-nonce concealment protection (experimental: see RFC 3540).

  CWR (1 bit) – Congestion Window Reduced (CWR) flag is set by the sending host to indicate that it
                received a TCP segment with the ECE flag set and had responded in congestion control
                mechanism (added to header by RFC 3168).

  ECE (1 bit) – ECN-Echo has a dual role, depending on the value of the SYN flag. It indicates:
                If the SYN flag is set (1), that the TCP peer is ECN capable.
                If the SYN flag is clear (0), that a packet with Congestion Experienced flag set (ECN=11) in IP
                header received during normal transmission (added to header by RFC 3168). This serves as
                an indication of network congestion (or impending congestion) to the TCP sender.

  URG (1 bit) – indicates that the Urgent pointer field is significant

  ACK (1 bit) – indicates that the Acknowledgment field is significant. All packets after the initial SYN
                packet sent by the client should have this flag set.

  PSH (1 bit) – Push function. Asks to push the buffered data to the receiving application.

  RST (1 bit) – Reset the connection

  SYN (1 bit) – Synchronize sequence numbers. Only the first packet sent from each end should have this
                flag set. Some other flags and fields change meaning based on this flag, and some are only
                valid for when it is set, and others when it is clear.

  FIN (1 bit) – Last packet from sender.

LauraRozier commented 7 years ago

Statistics from one game, 4 players 1 spectator ( Calm night. :P ) Again the same filter applied: tcp.port==56789 and ip.dst==192.168.20.101 Average is: 4724.97 packets per minute

Statistics

Measurement             Displayed
Packets                 295395 (56.3%)
Time span, s            3751.075
Average pps             78.7
Average packet size, B  76.5
Bytes                   22646801 (54.7%)
Average bytes/s         6037
Average bits/s          48 k

NOTE : I have the capture file saved if needed.

EDIT : I was one of the 4 players.

reyandme commented 7 years ago

It would be interesting to see how the number of packets and their sizes (bytes) progress depending on: 1) number of players 2) number of spectators 3) combinations of them: e.g. will the number of packets and their sizes be almost equal for 4 players + 2 specs and for 6 players only?

Also we can use this info to check whether our optimisations succeed or not. I'll study the Wireshark PCap software to check it myself later.

Possible ways of optimisation: 1) use only the needed fields of the TGameInputCommand record.

TGameInputCommand = record
    CommandType: TGameInputCommandType;
    Params: array[1..MAX_PARAMS]of integer;
    TextParam: UnicodeString;
    DateTimeParam: TDateTime;
    HandIndex: TKMHandIndex; 
  end;

Most commands (I believe all of them) do not use all the fields - some need the Params array (or even just 1 or 2 integer parameters), some TextParam, and some DateTimeParam.

But we send and read all of them for every command. So we could send only the needed fields and read them back accordingly, depending on the command type, which we read first anyway.
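Type-dependent serialization could look roughly like this. A Python sketch with invented command types and field layouts (the real code writes Delphi records to a stream; nothing here is the actual wire format):

```python
import struct

# Hypothetical command types and which fields each type actually uses.
FIELDS_BY_TYPE = {
    0: ("params",),          # e.g. a unit order: integer params only
    1: ("text",),            # e.g. a text-carrying command
    2: ("params", "text"),   # a command using both
}

def serialize(cmd_type, params=(), text=""):
    """Write the command type first, then only the fields that type uses."""
    out = struct.pack("<B", cmd_type)
    fields = FIELDS_BY_TYPE[cmd_type]
    if "params" in fields:
        out += struct.pack("<B", len(params))
        out += struct.pack("<%di" % len(params), *params)
    if "text" in fields:
        raw = text.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw
    return out

def deserialize(data):
    """Read the type first, then read back exactly the fields it declares."""
    cmd_type = data[0]
    off = 1
    params, text = (), ""
    fields = FIELDS_BY_TYPE[cmd_type]
    if "params" in fields:
        n = data[off]; off += 1
        params = struct.unpack_from("<%di" % n, data, off); off += 4 * n
    if "text" in fields:
        n = struct.unpack_from("<H", data, off)[0]; off += 2
        text = data[off:off + n].decode("utf-8")
    return cmd_type, params, text
```

Since the command type is read first anyway, the reader always knows which fields follow, so nothing extra needs to be transmitted.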

2) another small thing I found

TCommandsPack = class
  private
    fCount: Integer;
...

and we send its number as integer

aStream.Write(fCount);

Can we really have more than 255 commands per 0.1 second? If not, we could use a Byte here.
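The saving from shrinking fCount is easy to quantify. A small Python check (the 350 pps rate is the one measured later in this thread, used here as an assumption):

```python
import struct

# fCount serialized as a 4-byte Integer vs a 1-byte Byte.
int_size = struct.calcsize("<i")            # 4 bytes
byte_size = struct.calcsize("<B")           # 1 byte
saving_per_packet = int_size - byte_size    # 3 bytes per commands pack

# At roughly 350 command packets per second (assumed; matches a later
# measurement in this thread), the saving adds up:
packets_per_second = 350
saving_per_second = saving_per_packet * packets_per_second  # 1050 bytes/s
```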

reyandme commented 7 years ago

I've done some measurements with Wireshark while spectating an 8+2 game. Here are some statistics:

stats

stats2

98% are small packets, probably empty command packs with no commands. Average rate - 850 packets per second, up to 1250 pps at peak.

So we mainly need to optimise the number of packets rather than packet size

reyandme commented 7 years ago

I've finished the network optimisation feature: packet accumulation on the server for every client. A packet from a client is sent as it was before. But on the server we parse the packet and save it into a buffer (mapped by net client identifier) instead of sending it immediately. Every %delay% ms we pack all messages from the buffer into one message and send it to every client.

Accumulation delay - from 10 to 100ms, currently testing. But even 10ms is quite good. I tested with a dedicated server, 4 players, x3 speed, no player activity:

Stats show network activity between all clients and the server in both directions. Measured on the dedicated server. Before:

img img

after (50 ms accumulating delay):

img img

It's about 40% fewer average packets per second (pps), which causes about 30% less traffic (bytes/s), because most of the traffic is transport service data.

Currently all packets are accumulated. But it would probably be better not to accumulate service packets (connect/disconnect/set_game_info etc.), to avoid small lags in those procedures, and accumulate only mk_commands/mk_ping/mk_pong/mk_fps - the most frequent packet types, which make up 99% of all traffic by packet count and size.

lewinjh commented 7 years ago

Yes accumulating packets is a good way to reduce the total number of packets and the bandwidth. But it is a tradeoff with latency. Everyone's ping will be higher and so the game delay (time between issuing a command and it being executed) will also be higher.

A small accumulation delay like 10ms will probably not be noticeable but could make a difference when you have lots of players spamming packets. 50ms might be ok too, but will probably result in an extra 100ms of game delay.

Accumulation on the client is probably not worthwhile. There are far fewer packets sent, and it would add even more delay/latency when accumulating on both the client and the server.

I doubt changing a single integer to a byte is going to make any noticeable difference. Most of the bandwidth is probably TCP overhead due to the large number of small packets (you reduced the bandwidth by 12KB/s just by accumulating packets!).

Other optimisations that could be made:

reyandme commented 7 years ago

Accumulation delay should be small, to avoid game delay, as you mentioned. Best way - just test it.

I've done some measurements of sent/received packet types on the client recently; it was a 4-player game

img

The record length is about 31 sec, as we can guess (started 3 sec before game start and the first mk_Commands was sent).

Most of these packets were mk_Commands - about 80% for 4 players. For 8 players + 2 specs it should be about 95%. For a sped-up game - even more.

FPS and ping optimisation should help and I will implement it; it's quite easy. But mk_Commands is the main traffic producer. And most of the command packets are just empty, I believe (have to check it later though).

Here we send mk_Commands

for I := aTick + 1 to aTick + fDelay do
  //If the network is not connected then we must send the commands later (fSent will remain false)
  if (not fSent[I mod MAX_SCHEDULE]) and fNetworking.Connected
  and (fNetworking.NetGameState = lgs_Game) then //Don't send commands unless game is running normally
  begin
    if not fCommandIssued[I mod MAX_SCHEDULE] then
      fSchedule[I mod MAX_SCHEDULE, gGame.Networking.MyIndex].Clear; //No one has used it since last time through the ring buffer
    fCommandIssued[I mod MAX_SCHEDULE] := False; //Make it as requiring clearing next time around

    fLastSentTick := I;
    SendCommands(I);
    fSent[I mod MAX_SCHEDULE] := true;
    fRecievedData[I mod MAX_SCHEDULE, gGame.Networking.MyIndex] := True; //Recieved commands from self
  end;

A possible optimisation could be: do not send empty command packs immediately. We can mark them as 'ready to send' but wait until we have fDelay of them (from 2 to 32) or get one commands pack with some command inside. Then send them packed together.

It should not add any extra latency or lag but could significantly reduce the number of packets from all clients.
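The proposed batching of empty packs could be sketched like this. A Python illustration with hypothetical names (the real scheduler is the Delphi loop above):

```python
class EmptyPackBatcher:
    """Holds back empty command packs and flushes the queue in one message,
    either when `max_pending` packs are queued (e.g. fDelay, 2..32) or as
    soon as a non-empty pack arrives (illustrative sketch)."""

    def __init__(self, max_pending, send_fn):
        self.max_pending = max_pending
        self.send_fn = send_fn        # callable(list of (tick, commands))
        self.pending = []             # packs marked 'ready to send'

    def submit(self, tick, commands):
        self.pending.append((tick, commands))
        # Empty pack: keep waiting unless the queue has reached max_pending.
        if not commands and len(self.pending) < self.max_pending:
            return
        # Non-empty pack (or queue full): flush everything as one message.
        self.send_fn(self.pending[:])
        self.pending.clear()
```

Because every tick's pack is still delivered (just grouped), the receiver can still confirm each tick, which keeps the lock-step simulation intact.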

Kromster80 commented 7 years ago

Take note that empty commands are still the same "first class citizens" as non-empty commands and acks - they are all essential for synchronous game simulation across all players. You cannot prioritize one over another.

Kromster80 commented 7 years ago

As for optimizations, we've also discussed that a player could bundle his commands into a single packet, e.g. once every 50ms (that's 1 command + 8 acks + TCP overhead in an 8p x2 game), and let the server unpack it and send it out to destination players (again in packs made every 50ms, that's 8 commands + 8 acks + TCP overhead). That would drastically reduce the packet count. But do we want to put that load on the server?

reyandme commented 7 years ago

Take note that empty commands are still the same "first class citizens" as non-empty commands and acks - they are all essential for synchronous game simulation across all players. You cannot prioritize one over another.

Ok, that basically means we can't do much about it.

As for optimizations, we've also discussed that player could bundle his commands into a single packet e.g. once every 50ms (that's 1 command + 8 acks + TCP overhead in 8p x2 game)

We send acks (called random_check) every 10 ticks, so the proportion between commands and ticks in an 8p2sp game is about 1:1. Bundling commands on the client side will therefore not help much, as @lewinjh mentioned - on average there will be 1 command + 1 ack + 0.1 other packets, but we would get 50ms extra latency and, as you said, extra load on the server. On the server side there can be many more packets to bundle in the same 50ms. As for the load on the server - why should it be so big? The server just keeps packets in memory for 50ms.

Kromster80 commented 7 years ago

We send acks (called random_check) every 10 ticks

No, I was referring to acks (confirmations) for every command (that the player got the commands for the tick and it's safe to simulate the game past it). CRC checks are a different thing. It seems I'm wrong though - we don't send those and let TCP sort it out (right?)

Btw, does the measured packet rate match the theory?

reyandme commented 7 years ago

No, I was referring to acks (confirmations) for every command (that the player got the commands for the tick and it's safe to simulate the game past it). CRC checks are a different thing. It seems I'm wrong though - we don't send those and let TCP sort it out (right?)

No, we do not send acks. Only commands and those CRC checks.

Btw, does the measured packet rate match the theory?

Let's take my stats for a 4-player game, x3: every second every player produces 30 mk_commands + 3 random checks + 1 fps + 1 pong = 35 packets. 4 players = 140 pps. We send them to 3 other players, so all together pps (including sent and received packets) is 140*4 = 560 pps + a few other packets (pinginfo ~0.5 pps per player = 2 pps) ≈ 563 pps.

On the stats it was 576 pps - very close.

But then for the 8+2 game (let's say it was an x2 game) that I measured before, it should be about 2500, when it was actually 850. upd 850 was measured on the client, while 576 - on the server
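The back-of-envelope calculation above can be written out explicitly (Python; all rates are the ones quoted in the comment, and the ≈563 in the comment comes from rounding the ~0.5 pps pinginfo figure):

```python
players = 4
game_speed = 3                       # x3 game => ~30 ticks per second
commands_per_s = 10 * game_speed     # one mk_commands per tick
random_checks_per_s = 3              # one random_check every 10 ticks
fps_per_s = 1
pong_per_s = 1

per_player = commands_per_s + random_checks_per_s + fps_per_s + pong_per_s
# -> 35 packets per player per second; 140 pps produced by 4 players.

# The server receives each packet once and re-sends it to the 3 other
# players, so the server-side total (received + sent) is 140 * 4:
total = per_player * players * 4     # 560 pps
pinginfo = 2                         # ~0.5 pps per player, 4 players
estimate = total + pinginfo          # ~562 pps vs the measured 576
```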

Do we need to take TCP acks into account?

Kromster80 commented 7 years ago

IIRC we send some packets on a p2p basis, but some packets get sent to the server and then broadcast from there to all players. So we cannot just do 35 * 4.

Why I'm focusing on this - we need to be sure that theory matches the facts, so we know the theory is correct (and there are no bugs causing a ppm increase)

reyandme commented 7 years ago

@Kromster80

IIRC we send some packets on a p2p basis, but some packets get sent to the server and then broadcast from there to all players. So we cannot just do 35 * 4.

No, every packet is dispatched by the server. We can only call it p2p in a logical sense; technically all packets go through the server.

@lewinjh

I doubt changing a single integer to a byte is going to make any noticeable difference.

Actually it makes a noticeable difference. I've just tested it and there was a ~1kb/s difference (27kb/s -> 26kb/s).

before: average = 2071285b/74.224s = 27905 b/s

img

after (type of only 1 variable changed from Integer to Byte): average = 1942540b / 72.283s = 26874 b/s

img2

And it's easy to understand - this variable (TCommandsPack.fCount) is in almost every packet. Before it was 4 bytes, after it's 1. We can calculate the difference as 350pps * 3 (bytes per packet) = 1050 b/s.

I doubt we can get 256 commands in 0.1 second, so changing fCount to Byte should not cause any problems.

mk_FPS could be sent only to the server, recorded by the server, and then sent in the existing mk_PingInfo packet. It's inefficient to have the clients broadcast their FPS to all other clients. For extra efficiency you could bundle the FPS measurement inside the mk_Pong command and remove mk_FPS.

I tried both options that you suggested. The first one is good; I haven't noticed any difference from what we have now. But if we update FPS via the mk_Pong command, then sometimes the delay between the actual FPS and its reported value may become a couple of seconds. With mk_pong we can get an ~1s old FPS value, and then it can take up to another ~1s to return with pinginfo - 2s together. Also we are not going to get much out of this improvement, so I think it's better to use the 1st option you suggested.

upd I tried to reduce the header of our packet: it consists of 3 4-byte values: sender ID, recipient ID and message length. I reduced the length to Word, as we have our own restriction of 20kb on packet size. It's also possible to reduce the client IDs from Integer to SmallInt - that's still plenty.

average: ~25628b/s

img3

upd 2 I introduced a new type TKMNetHandleIndex = SmallInt for server indexes (sender/recipient) and replaced it everywhere. That was a lot of replacements. But it saves 4 bytes in the header of absolutely every packet. average: ~24279b/s

img4

Altogether from 39kb/s to 24kb/s - about 40% less traffic!
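The header savings described above add up as follows (Python check; field sizes are the ones named in the comment, the layout itself is as described there):

```python
import struct

# Original header: sender ID, recipient ID, message length - 4-byte Integers.
old_header = struct.calcsize("<iii")        # 12 bytes
# After: IDs as SmallInt (2 bytes each, still plenty of range) and
# length as Word (2 bytes, fine with the 20kb per-packet cap).
new_header = struct.calcsize("<hhH")        # 6 bytes
saved_per_packet = old_header - new_header  # 6 bytes off every single packet
```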

reyandme commented 7 years ago

Yesterday we did testing in a real environment with 11 different players. I tested with server packet accumulation delays from 5 to 50 ms. Players did not notice any difference in gameplay between the different delays. Ping was about ~60-120ms.

Here are the results, x3 game speed, measured on the dedicated server. All tests ran for about 30 sec, because the values do not change much over a longer period of time.

Delay, ms PPS KB/s
5 1575 127
10 1572 126
15 1561 125
20 1053 98
30 1045 97
40 847 85
50 766 81

Here it looks like 20ms is the best option.

Also I was thinking we could set this parameter depending on the lobby - for a smaller number of players/lower game speed we can set the delay to 5ms or just not use accumulation at all, and when more players want to play we can set it up to 20-30ms. 40 to 50ms could still be noticeable for players IMO.

Not sure why 5-15ms give such close numbers, same as 20-30ms. Maybe 5-15 ms is not enough to catch a significant number of packets from different players, so it's almost peer-server-peer for everybody (same as r6720).

Unfortunately I forgot to test r6720 with, let's say, 10 players. But I will probably do it in the future. Even here we can see a huge difference between 5 and 50 ms. Or even 5 and 20. More importantly, the scalability is much better in this topology.