Open reyandme opened 7 years ago
I also encourage playtesting the actual number of packets that get sent (in case we overlooked something).
Got a small Wireshark PCap here with this filter: tcp.port==56789 and ip.dst==192.168.20.101 DropBox link
Not that many tbh
EDIT : The PCap is from me connecting to a server, starting a game with one AI, placing a school and tavern, then waiting one minute-ish
This capture is meaningless - iirc the AI does not send commands through GIP, and 2 players is nothing for the net code. Please try something along the lines of 8 players + 2 spectators on different PCs.
Also please see if you can "condense" the captured results into some easy-to-view form, like "NNNN packets per minute", instead of a binary file that can only be opened with a specific tool.
Sounds like fun. Will see what I can give. With or without the lobby communication?
EDIT : Would you prefer the current release or the most recent commit?
The colors have to do with the flags set in the packets: Red = RST, Grey = SYN or FIN (there are more per color, but these fit this example).
Small copy-paste to explain:
Flags (9 bits) (aka Control bits)
Contains 9 1-bit flags
NS (1 bit) – ECN-nonce concealment protection (experimental: see RFC 3540).
CWR (1 bit) – Congestion Window Reduced (CWR) flag is set by the sending host to indicate that it
received a TCP segment with the ECE flag set and had responded in congestion control
mechanism (added to header by RFC 3168).
ECE (1 bit) – ECN-Echo has a dual role, depending on the value of the SYN flag. It indicates:
If the SYN flag is set (1), that the TCP peer is ECN capable.
If the SYN flag is clear (0), that a packet with Congestion Experienced flag set (ECN=11) in IP
header received during normal transmission (added to header by RFC 3168). This serves as
an indication of network congestion (or impending congestion) to the TCP sender.
URG (1 bit) – indicates that the Urgent pointer field is significant
ACK (1 bit) – indicates that the Acknowledgment field is significant. All packets after the initial SYN
packet sent by the client should have this flag set.
PSH (1 bit) – Push function. Asks to push the buffered data to the receiving application.
RST (1 bit) – Reset the connection
SYN (1 bit) – Synchronize sequence numbers. Only the first packet sent from each end should have this
flag set. Some other flags and fields change meaning based on this flag, and some are only
valid for when it is set, and others when it is clear.
FIN (1 bit) – Last packet from sender.
Statistics from one game, 4 players 1 spectator ( Calm night. :P )
Again the same filter applied: tcp.port==56789 and ip.dst==192.168.20.101
Average is: 4724.97 packets per minute
Measurement | Displayed |
---|---|
Packets | 295395 (56.3%) |
Time span, s | 3751.075 |
Average pps | 78.7 |
Average packet size, B | 76.5 |
Bytes | 22646801 (54.7%) |
Average bytes/s | 6037 |
Average bits/s | 48 k |
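The averages in the table can be cross-checked from the raw totals; a quick sanity check on the numbers above:

```python
# Values taken from the Wireshark statistics table above.
packets = 295395
time_span_s = 3751.075
total_bytes = 22646801

avg_pps = packets / time_span_s               # ~78.7 packets/s
avg_packet_size = total_bytes / packets       # ~76.7 bytes
avg_bytes_per_s = total_bytes / time_span_s   # ~6037 bytes/s
avg_ppm = avg_pps * 60                        # ~4725 packets per minute
```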
NOTE : I have the capture file saved if needed.
EDIT : I was one of the 4 players.
It would be interesting to see how the number of packets and their sizes (bytes) progress depending on: 1) the number of players, 2) the number of spectators, 3) combinations of them: e.g. will the number of packets and their sizes be almost equal for 4 players + 2 spectators and for 6 players only?
Also we can use this info to check whether our optimisations succeed or not. I'll try to study the Wireshark PCap software to check it myself later.
Possible optimisation: 1) use only the needed fields of the TGameInputCommand record.
TGameInputCommand = record
CommandType: TGameInputCommandType;
Params: array[1..MAX_PARAMS]of integer;
TextParam: UnicodeString;
DateTimeParam: TDateTime;
HandIndex: TKMHandIndex;
end;
Most commands (I believe all of them) do not use all the fields - some need only the Params array (or even just 1 or 2 integer parameters), some TextParam and some DateTimeParam.
But we send and read all of them on every command. So we could send only the needed fields and read them accordingly, depending on the command type, which we read first anyway.
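As a sketch of the idea (in Python for brevity; the real record is the Delphi TGameInputCommand above, and the per-command field map here is entirely hypothetical): write the command type first, then serialize only the fields that type actually uses.

```python
import struct

# Hypothetical map of which fields each command type uses.
# In the real codebase this would be derived from TGameInputCommandType.
COMMAND_FIELDS = {
    "gic_BuildHouse": ["params"],          # integer params only
    "gic_Chat":       ["text_param"],      # text only
    "gic_SetSpeed":   ["params", "datetime_param"],
}

def serialize_command(cmd_type, params=(), text_param="", datetime_param=0.0, hand_index=0):
    """Write the command type first, then only the fields that type needs."""
    out = bytearray()
    out += cmd_type.encode() + b"\x00"            # command type, read first on the other side
    fields = COMMAND_FIELDS[cmd_type]
    if "params" in fields:
        out += struct.pack("<B", len(params))     # param count, then 4-byte ints
        out += struct.pack(f"<{len(params)}i", *params)
    if "text_param" in fields:
        text = text_param.encode("utf-8")
        out += struct.pack("<H", len(text)) + text
    if "datetime_param" in fields:
        out += struct.pack("<d", datetime_param)
    out += struct.pack("<b", hand_index)          # hand index is always sent
    return bytes(out)
```

With this layout a two-parameter build command costs a couple of dozen bytes instead of the full record with MAX_PARAMS integers, an empty string and a TDateTime every time.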
2) another small thing I found
TCommandsPack = class
private
fCount: Integer;
...
and we send its count as an Integer:
aStream.Write(fCount);
Can we really have more than 255 commands per 0.1 second? If not, we could use a Byte here.
I've done some measurements with Wireshark while spectating an 8+2 game. Here are some statistics:
98% are small packets, probably empty command packs with no commands. Average rate: 850 packets per second, up to 1250 pps at peak.
So mainly we need to optimise the number of packets, rather than packet size.
I've finished the network optimisation feature: packet accumulation on the server for every client. A packet from a client is sent as before, but on the server we parse it and save it into a buffer (mapped by net client identifier) instead of sending it immediately. Every %delay% ms we pack all messages from the buffer into one message and send it to every client.
The accumulation delay is from 10 to 100 ms, currently being tested. But even 10 ms is quite good. I tested with a dedicated server, 4 players, x3 speed, no player activity:
Stats show network activity between all clients and the server in both directions, measured on the dedicated server. Before:
After (50 ms accumulation delay):
It's about 40% fewer average packets per second (pps), which causes about 30% less traffic (bytes/s), because most of the traffic is transport service data.
Currently all packets are accumulated. But it would probably be better not to accumulate service packets (connect/disconnect/set_game_info etc.) to avoid small lags in those procedures, and to accumulate only mk_Commands/mk_Ping/mk_Pong/mk_FPS - the most frequent packet types, which make up 99% of all traffic by both packet count and size.
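A minimal single-threaded sketch of the accumulation scheme described above (hypothetical names; the real server is written in Delphi and is event-driven):

```python
import time
from collections import defaultdict

ACCUMULATE_MS = 50  # tested range was 10..100 ms

class AccumulatingServer:
    """Buffers per-recipient messages and flushes them as one bundle."""
    def __init__(self, send_func, delay_ms=ACCUMULATE_MS):
        self.send = send_func                # send(client_id, payload)
        self.delay = delay_ms / 1000.0
        self.buffers = defaultdict(list)     # client_id -> pending messages
        self.last_flush = time.monotonic()

    def on_packet(self, recipient_id, payload):
        # Instead of forwarding immediately, store the message.
        self.buffers[recipient_id].append(payload)

    def maybe_flush(self):
        now = time.monotonic()
        if now - self.last_flush < self.delay:
            return
        for client_id, messages in self.buffers.items():
            if messages:
                # Pack all pending messages into one length-framed payload.
                bundle = b"".join(len(m).to_bytes(2, "little") + m for m in messages)
                self.send(client_id, bundle)
        self.buffers.clear()
        self.last_flush = now
```

The key saving is that N small messages to the same client now cost one TCP segment's worth of overhead instead of N.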
Yes accumulating packets is a good way to reduce the total number of packets and the bandwidth. But it is a tradeoff with latency. Everyone's ping will be higher and so the game delay (time between issuing a command and it being executed) will also be higher.
A small accumulation delay like 10ms will probably not be noticeable but could make a difference when you have lots of players spamming packets. 50ms might be ok too, but will probably result in an extra 100ms of game delay.
Accumulation on the client is probably not worthwhile. Far fewer packets are sent from the client, and it would add even more delay/latency when accumulating on both the client and the server.
I doubt changing a single integer to a byte is going to make any noticeable difference. Most of the bandwidth is probably TCP overhead due to the large number of small packets (you reduced the bandwidth by 12KB/s just by accumulating packets!).
Other optimisations that could be made:
Accumulation delay should be small, to avoid game delay, as you mentioned. Best way - just test it.
I've done some measurements of sent/received packet types on the client recently; it was a 4-player game.
The record is about 31 sec long, as we can guess (the recording started 3 sec before the game start and the first mk_Commands was sent).
Most of these packets were mk_Commands - about 80% for 4 players. For 8 players + 2 spectators it should be about 95%. For a sped-up game - even more.
FPS and ping optimisation should help and I will implement it, it's quite easy. But mk_Commands is the main traffic producer. And most of the command packets are just empty, I believe (I have to check it later though).
Here is where we send mk_Commands:
for I := aTick + 1 to aTick + fDelay do
//If the network is not connected then we must send the commands later (fSent will remain false)
if (not fSent[I mod MAX_SCHEDULE]) and fNetworking.Connected
and (fNetworking.NetGameState = lgs_Game) then //Don't send commands unless game is running normally
begin
if not fCommandIssued[I mod MAX_SCHEDULE] then
fSchedule[I mod MAX_SCHEDULE, gGame.Networking.MyIndex].Clear; //No one has used it since last time through the ring buffer
fCommandIssued[I mod MAX_SCHEDULE] := False; //Mark it as requiring clearing next time around
fLastSentTick := I;
SendCommands(I);
fSent[I mod MAX_SCHEDULE] := true;
fRecievedData[I mod MAX_SCHEDULE, gGame.Networking.MyIndex] := True; //Recieved commands from self
end;
A possible optimisation could be: do not send empty commands immediately. We can mark them as 'ready to send' but wait until we have fDelay of them (from 2 to 32) or until we get one command pack with some command inside. Then send them packed together.
It should not add any extra latency or lags but could significantly reduce the number of packets from all clients.
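The "hold empty packs" idea could look roughly like this (hypothetical names; fDelay in the real code is the 2..32-tick send-ahead mentioned above):

```python
class DelayedCommandSender:
    """Holds empty command packs and flushes them in one bundle when either
    delay_ticks of them have accumulated or a non-empty pack arrives."""
    def __init__(self, net_send, delay_ticks=8):
        self.net_send = net_send       # net_send(list_of_(tick, commands))
        self.delay_ticks = delay_ticks
        self.pending = []              # [(tick, commands), ...]

    def send_commands(self, tick, commands):
        self.pending.append((tick, commands))
        # Flush when a real command appears, or enough empty packs piled up.
        if commands or len(self.pending) >= self.delay_ticks:
            self.net_send(self.pending)
            self.pending = []
```

The receiver still sees a pack for every tick (the bundle preserves them in order), so the simulation stays lockstep; only the packet count on the wire drops.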
Take note, that empty commands are still the same "first class citizen" as not empty commands and acks - they all are essential for synchronous game simulation for all players. You can not prioritize them one over another.
As for optimizations, we've also discussed that player could bundle his commands into a single packet e.g. once every 50ms (that's 1 command + 8 acks + TCP overhead in 8p x2 game) and let server unpack it and send out to destination players (again in packs made every 50ms, thats 8 commands + 8 acks + TCP overhead). That would drastically reduce packet count. But do we want to put that load on the server?
> Take note, that empty commands are still the same "first class citizen" as not empty commands and acks - they all are essential for synchronous game simulation for all players. You can not prioritize them one over another.
Ok, that basically means we can't do much about it.
> As for optimizations, we've also discussed that player could bundle his commands into a single packet e.g. once every 50ms (that's 1 command + 8 acks + TCP overhead in 8p x2 game)
We send acks (called random_check) every 10 ticks, so the proportion between commands and ticks in an 8p+2sp game is about 1:1. Bundling commands on the client side will not help much then, as @lewinjh mentioned - on average there will be 1 command + 1 ack + 0.1 other packets, but we would get 50 ms of extra latency and, as you said, extra load on the server. On the server side there would be many more packets to bundle within the same 50 ms. As for the load on the server - why should it be so big? The server just keeps packets in memory for 50 ms.
> We sends acks (called random_check) every 10 ticks
No, I was referring to acks (confirmations) of every command (that the player got the commands for the tick and it's safe to simulate the game past it). CRC checks are a different thing. It seems I'm wrong though - we don't send those and let TCP sort it out (right?)
Btw, does the measured packet rate match the theory?
> No, I was referring to acks (confirmations) on every command (that player got the commands for the tick and it's safe to simulate game past it). CRC checks are a different thing. It seems I'm wrong though - we don't send those and let TCP sort it out (right?)
No, we do not send acks. Only commands and those CRC checks.
> Btw, does the measured packet rate match the theory?
Let's take my stats for a 4-player game at x3: every second each player produces 30 mk_commands + 3 random checks + 1 fps + 1 pong = 35 packets; 4 players = 140 pps. The server relays each of them to the 3 other players, so altogether (sent plus received) that is 140*4 = 560 pps, plus a few other packets (pinginfo, ~0.5 pps per player = 2 pps), ~562 pps in total.
On stats it was 576pps - very close.
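Reconstructing the back-of-envelope sum above as arithmetic:

```python
players = 4
ticks_per_s = 30          # x3 speed: 30 game ticks per second

# Per player per second: 30 mk_commands + 3 random checks + 1 fps + 1 pong
per_player = ticks_per_s + 3 + 1 + 1        # = 35 packets
arriving = per_player * players             # = 140 pps into the server

# Each arriving packet is relayed to the 3 other players, so the server
# sees 1 received + 3 sent copies of each packet:
total = arriving * (1 + (players - 1))      # = 560 pps

ping_info = 0.5 * players                   # ~2 pps of mk_PingInfo
theory = total + ping_info                  # ~562 pps; measured was 576
```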
But then for the 8+2 game (let's say it was an x2 game) that I measured before it should be about 2500, when it was actually 850. upd: 850 was measured on the client, while 576 was measured on the server.
Do we need to take TCP ack's into account?
IIRC we send some packets on p2p basis, but some packets get sent to server and then there get broadcasted to all players. So we can not just do 35 * 4.
Why I'm focusing on this: we need to be sure that theory matches the facts, so we know the theory is correct (and there are no bugs that cause the ppm to increase).
@Kromster80
> IIRC we send some packets on p2p basis, but some packets get sent to server and then there get broadcasted to all players. So we can not just do 35 * 4.
No, every packet is dispatched by the server. We can call it p2p only in a logical sense; technically all packets go through the server.
@lewinjh
> I doubt changing a single integer to a byte is going to make any noticeable difference.
Actually it makes a noticeable difference. I've just tested it and there was a ~1 kb/s difference (27 kb/s -> 26 kb/s).
before: average = 2071285b/74.224s = 27905 b/s
after (type of only 1 variable changed from Integer to Byte): average = 1942540b / 72.283s = 26874 b/s
And it's easy to understand: this variable (TCommandsPack.fCount) is in almost every packet. Before it was 4 bytes, after it's 1. We can calculate the difference as 350 pps * 3 (bytes per packet) = 1050 b/s.
I doubt we can get 256 commands in 0.1 second, so changing fCount to Byte should not cause any problems.
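The estimate in the previous paragraph lines up with the measured figures:

```python
pps = 350                           # command packets per second in that test
bytes_saved = 4 - 1                 # fCount: Integer (4 bytes) -> Byte (1 byte)

estimated_saving = pps * bytes_saved        # 1050 bytes/s

# Measured averages quoted above:
measured_saving = 27905 - 26874             # 1031 bytes/s, close to the estimate
```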
mk_FPS could be sent only to the server, recorded by the server, and then sent in the existing mk_PingInfo packet. It's inefficient to have the clients broadcast their FPS to all other clients. For extra efficiency you could bundle the FPS measurement inside the mk_Pong command and remove mk_FPS.
I tried both options you suggested. The first one is good; I haven't noticed any difference compared to what we have now. But if we update FPS via the mk_Pong command, then the delay between the actual fps and its displayed value may sometimes become a couple of seconds: with mk_Pong we can get an ~1 s old FPS value, and then it takes up to another ~1 s for it to return with pinginfo, 2 s together. Also we are not going to gain much from this improvement, so I think it's better to use the 1st option you suggested.
upd I tried to reduce the header of our packets: it consists of three 4-byte values: sender ID, recipient ID and message length. I reduced the length to Word, as we have our own restriction of 20 kb on packet size. It's also possible to reduce client IDs from Integer to SmallInt - that's still a lot.
average: ~25628b/s
upd 2 I introduced a new type TKMNetHandleIndex = SmallInt for the server indexes (sender/recipient) and replaced it everywhere. That was a lot of replacements, but it saves 4 bytes in the header of absolutely every packet. average: ~24279 b/s
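Putting the two header changes together, the per-packet saving (field layout as described above):

```python
# Original packet header: three 4-byte Integers.
old_header = 4 + 4 + 4   # sender ID + recipient ID + message length = 12 bytes

# Length narrowed to Word (2 bytes): the 20 kb packet-size cap fits easily,
# since 20 * 1024 = 20480 < 65536. Both IDs narrowed to SmallInt
# (TKMNetHandleIndex, 2 bytes each).
new_header = 2 + 2 + 2   # = 6 bytes

saving_per_packet = old_header - new_header   # 6 bytes off every single packet
```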
Altogether from 39kb/s to 24kb/s - about 40% less traffic!
Yesterday we did testing in a real environment with 11 different players. I tested server packet accumulation delays from 5 to 50 ms. Players did not notice any difference in gameplay between the different delays. Ping was about ~60-120 ms.
Here are the results; x3 game speed, measured on the dedicated server. All tests were about 30 sec long, because the values do not change much over a longer period of time.
Delay, ms | PPS | KB/s |
---|---|---|
5 | 1575 | 127 |
10 | 1572 | 126 |
15 | 1561 | 125 |
20 | 1053 | 98 |
30 | 1045 | 97 |
40 | 847 | 85 |
50 | 766 | 81 |
Here it looks like 20ms is the best option.
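Taking the 5 ms row as the baseline, the relative reductions from the table are:

```python
# delay_ms: (pps, kb_per_s), values from the table above
results = {5: (1575, 127), 10: (1572, 126), 15: (1561, 125),
           20: (1053, 98), 30: (1045, 97), 40: (847, 85), 50: (766, 81)}

base_pps, base_kbs = results[5]
pps_cut = {d: round(100 * (1 - p / base_pps), 1) for d, (p, k) in results.items()}
kbs_cut = {d: round(100 * (1 - k / base_kbs), 1) for d, (p, k) in results.items()}

# 20 ms already gives ~33% fewer packets and ~23% less traffic;
# 50 ms gives ~51% / ~36%, but risks noticeable game delay.
```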
Also I was thinking we could set this parameter depending on the lobby: for a smaller number of players / lower game speed we can set the delay to 5 ms or just not use it at all, and when more players want to play we can set it up to 20-30 ms. 40 to 50 ms could still be noticeable to players IMO.
Not sure why 5-15 ms give such close numbers, and the same for 20-30 ms. Maybe 5-15 ms is not enough to catch a significant number of packets from different players, so it's almost peer-server-peer for everybody (same as for r6720).
Unfortunately I forgot to test r6720 with, let's say, 10 players, but I will probably do it in the future. Even here we can see a huge difference between 5 and 50 ms, or even between 5 and 20. More importantly, the scalability is much better in this topology.
Currently the max number of players+spectators is 10. When we tried to add more spectators/players it was unplayable because of lags, constant disconnections etc.
Probable network problems: 1) Spectators also generate traffic for all players: adding 1 spectator means it will send gic commands every tick to every player, even though it is just spectating and cannot do anything on the map. AFAIK these empty command packets are needed to tell whether a spectator has disconnected or not.
If that is true, then we need to rework this mechanism so the server itself can determine who is disconnected by some other special packets (transferred only to the server, not to all players) and then notify the players if someone disconnects. Or maybe by some other mechanism (@lewinjh @Kromster80, your suggestions?). And then do not send mk_Command to all players.
So adding 1 spectator increases the number of packets linearly: +1 spec = +N command packets to him, +2 specs = +2N packets to the specs, etc.
2) The server just transfers commands, but it could do some optimisations.
For example, the server could accumulate packets for every 100 ms from all source players and then transfer them to the destination player.
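A rough scaling model for point 1), assuming (as described above) that every participant, spectators included, sends a command pack each tick and the server relays it to everyone else. The function and numbers are illustrative, not measured:

```python
def command_pps(players, spectators, ticks_per_s=10):
    """Server-side command packets per second: each of N participants sends
    one pack per tick, and the server relays it to the N-1 others."""
    n = players + spectators
    return n * ticks_per_s * (1 + (n - 1))   # received + relayed copies

base = command_pps(8, 0)        # 8 players, no spectators: 640 pps
with_specs = command_pps(8, 2)  # adding 2 spectators: 1000 pps, +56%
```

Note the model also predicts that 4 players + 2 spectators generate the same command traffic as 6 players alone, matching the question raised earlier in the thread.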