factoriolib / flib

A set of high-quality, commonly-used utilities for creating Factorio mods.
https://mods.factorio.com/mod/flib
MIT License

Translations seem to cause trouble in multiplayer, disconnecting joining player #45

Closed Bilka2 closed 2 years ago

Bilka2 commented 2 years ago

See https://forums.factorio.com/viewtopic.php?p=569067#p569067 and possibly the related topic linked from the posts in that topic.

raiguard commented 2 years ago

Yeah, this is a known issue. I have some ideas on how to fix it, but I am unable to replicate the issue myself so I can't properly test it. Once I get my laptop in August I'll be able to properly test it and fix it.

Hornwitser commented 2 years ago

The issue is fairly simple. You translate some 50 strings in one tick by default. This gets sent in a single tick as a 30 kB localized string with a 30 kB result payload:

image

This is then packed into a 60 kB heartbeat, which is split into 125 fragments, each a 516-byte UDP packet, that are all sent at once (i.e. at line rate):

image
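
The fragment numbers above can be checked with a little arithmetic. This is a rough sketch, not a description of Factorio's wire format: it assumes "60 kB" means 60,000 bytes and treats the gap between the 516-byte packet size and the implied payload as per-fragment overhead. The names `HEARTBEAT_BYTES`, `FRAGMENTS`, and `PACKET_BYTES` are made up for illustration.

```python
import math

# Figures quoted in the comment above. Treating "60 kB" as exactly
# 60,000 bytes is an assumption; the real heartbeat size may differ slightly.
HEARTBEAT_BYTES = 60_000
FRAGMENTS = 125
PACKET_BYTES = 516


def fragments_needed(total_bytes: int, payload_per_fragment: int) -> int:
    """Number of fragments needed to carry total_bytes of payload."""
    return math.ceil(total_bytes / payload_per_fragment)


# Payload each fragment would carry if the heartbeat is split evenly.
implied_payload = HEARTBEAT_BYTES // FRAGMENTS

# Bytes per packet not accounted for by payload (headers etc., an assumption).
implied_overhead = PACKET_BYTES - implied_payload

# Total bytes put on the wire for a single heartbeat.
wire_bytes = FRAGMENTS * PACKET_BYTES

print(implied_payload, implied_overhead, wire_bytes)
print(fragments_needed(HEARTBEAT_BYTES, implied_payload))
```

Under these assumptions each fragment carries about 480 bytes of heartbeat data, and one heartbeat puts roughly 64.5 kB on the wire.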

Over my loopback it's going at an instantaneous transfer rate of ~550 Mbit/s, though there's no reason a Gbit/s Ethernet controller couldn't send it at 1 Gbit/s. If this were going over the internet instead of loopback, this packet burst would likely hit a modem with a line rate significantly slower than 1 Gbit/s, overflow its receive/send buffer, and cause one or more of the fragments to be lost. If any part of a fragmented heartbeat is lost, the heartbeat is resent in full, as there is no per-fragment resend mechanism for heartbeats. If every resend overflows the same buffer and drops the same fragments, the client times out, as it will never receive all the parts of the fragmented heartbeat.
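
To make the buffer-overflow argument concrete, here is a minimal sketch using a simple fluid model: bytes arrive at the sender's line rate and drain at the slower link's rate, so the backlog peaks when the burst ends. The 10 Mbit/s uplink and the 32 KiB buffer are hypothetical values chosen for illustration, not measurements from this thread; the 64,500-byte burst is 125 fragments of 516 bytes.

```python
def burst_duration_s(wire_bytes: float, line_rate_bps: float) -> float:
    """Time needed to put wire_bytes on a link running at line_rate_bps."""
    return wire_bytes * 8 / line_rate_bps


def peak_queue_bytes(wire_bytes: float, arrival_bps: float, drain_bps: float) -> float:
    """Peak backlog in a buffer fed at arrival_bps and drained at drain_bps.

    Fluid approximation: while the burst arrives, the queue grows at
    (arrival - drain), so the peak is reached when the burst ends.
    """
    if drain_bps >= arrival_bps:
        return 0.0
    return wire_bytes * (1 - drain_bps / arrival_bps)


WIRE_BYTES = 125 * 516          # one fragmented heartbeat, from the thread
UPLINK_BPS = 10e6               # hypothetical slow link (assumption)
BUFFER_BYTES = 32 * 1024        # hypothetical modem buffer (assumption)

loopback_burst = burst_duration_s(WIRE_BYTES, 550e6)   # burst length at ~550 Mbit/s
backlog = peak_queue_bytes(WIRE_BYTES, 1e9, UPLINK_BPS)
overflows = backlog > BUFFER_BYTES

print(f"burst at 550 Mbit/s lasts {loopback_burst * 1e3:.2f} ms")
print(f"peak backlog {backlog:.0f} bytes, overflows: {overflows}")
```

With these assumed numbers, nearly the whole burst has to sit in the buffer at once, so any buffer smaller than about 64 kB drops fragments, and an identical resend would hit the same limit.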

To put into perspective how grossly out of line it is to send 60 kB of data in one tick: before 0.18.7, all Windows clients would drop out of the game if you sent more than 8 kB of data in two ticks. The current translation request size is 10x higher than what is reasonable for the protocol.

raiguard commented 2 years ago

Thanks for the detailed analysis @Hornwitser. I have decided to remove the nesting entirely, and do a hardcoded five strings per translation. I don't have the tools to test how much network traffic this generates, but I'm going to send it to a few people to see if it solves their issues.
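
The batching idea can be sketched as follows. This is not flib's implementation (flib is written in Lua, and the fix itself is not shown in this thread); it only illustrates splitting a list of strings into fixed groups of five. The names `batches` and `schedule` are hypothetical, and sending one batch per tick is an assumption the thread does not state.

```python
from typing import Dict, Iterator, List, Sequence

BATCH_SIZE = 5  # "five strings per translation", from the comment above


def batches(strings: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` strings."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(strings), size):
        yield list(strings[start:start + size])


def schedule(strings: Sequence[str], first_tick: int = 0) -> Dict[int, List[str]]:
    """Map each batch to a tick, one batch per tick (an assumption)."""
    return {first_tick + i: batch for i, batch in enumerate(batches(strings))}


# Twelve placeholder strings spread over three ticks.
plan = schedule([f"s{i}" for i in range(12)])
for tick, batch in plan.items():
    print(tick, batch)
```

Under that one-batch-per-tick assumption, the roughly 50 strings that were previously requested in a single tick would be spread over 10 ticks.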