godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Websocket dropping data violates TCP reliability and congestion control #70738

Open xix-xeaon opened 1 year ago

xix-xeaon commented 1 year ago

Godot version

v4.0.beta10.official [d0398f62f]

System information

Linux

Issue description

When the socket receives data faster than the game can process it, the WebSocketPeer poll call keeps reading even as the buffer fills up, and is then forced to drop data. This is not correct behavior for any application using a TCP-based protocol.

This is not related to #22496. If you're receiving websocket messages that are very large, then obviously you need a buffer large enough to fit them. Instead, this is about "normal" sized messages: a frame freeze or other performance drop lets messages accumulate, and the websocket buffer fills up when the game eventually gets to the poll call.

Dropping data violates the TCP reliability upon which websocket is built. It is no longer possible to rely on sent data actually being received, so one has to implement data re-transmission mechanisms despite those already being present in TCP.

It also violates the TCP congestion control which is supposed to prevent too much data being sent in the first place, when the receiver (or network) is too slow to handle it. This happens because from the point of view of TCP the data was correctly delivered - the drop is happening in the application layer!

When polling, the WebSocketPeer should only read at most as much data as will actually fit in the remaining buffer. If the game is unable to consume the messages in time, and the buffer becomes full then no more data should be read. No data should ever be dropped at the application layer.
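To illustrate the idea of reading only what fits, here is a minimal Python sketch. It is not Godot's implementation: `BoundedReceiveBuffer` and its methods are hypothetical names, and it works on a raw non-blocking stream socket rather than on websocket frames.

```python
import socket


class BoundedReceiveBuffer:
    """Hypothetical fixed-capacity buffer that never reads more than it can hold."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = bytearray()

    def free_space(self) -> int:
        return self.capacity - len(self.data)

    def poll(self, sock: socket.socket) -> int:
        """Read at most free_space() bytes from a non-blocking socket.

        Returns the number of bytes read.
        """
        room = self.free_space()
        if room == 0:
            # Buffer full: read nothing and leave the rest in the OS socket buffer.
            return 0
        try:
            chunk = sock.recv(room)
        except BlockingIOError:
            # No data available right now.
            return 0
        self.data += chunk
        return len(chunk)

    def consume(self, n: int) -> bytes:
        """Hand up to n buffered bytes to the application and free their space."""
        out = bytes(self.data[:n])
        del self.data[:n]
        return out
```

Whatever `poll` declines to read stays in the operating system's socket buffer, so nothing is discarded at the application layer; the next `poll` after a `consume` picks up where the previous one stopped.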

Now, because no more data is being read, the TCP buffers in the operating system will fill up instead and TCP will be the one to drop the data. This is a good thing, because the dropped data will not be acknowledged to the sender, and the sender will re-transmit it. That prevents data loss if the receiver was slow only temporarily (usually the case). Data is now sent reliably again.

However, if the receiver continues to be "slow" (compared to the amount of data being sent), then the sender's TCP buffers will fill up as well, preventing the sending application from sending more data than the receiver (or network) can handle. This means that the congestion control is working.
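The effect described in the last two paragraphs can be observed with plain TCP sockets on the loopback interface. The following is a rough sketch, not part of the reproduction project: the function names are made up for illustration, and how many bytes the operating system accepts before it pushes back depends on its buffer settings.

```python
import socket


def fill_until_blocked(sender: socket.socket, chunk: bytes) -> int:
    """Send on a non-blocking socket until the OS refuses more data.

    Returns the number of bytes the OS accepted.
    """
    sender.setblocking(False)
    total = 0
    while True:
        try:
            total += sender.send(chunk)
        except BlockingIOError:
            # The OS will not accept more data right now: backpressure
            # has reached the sending application.
            return total


def drain(receiver: socket.socket, expected: int) -> int:
    """Read until `expected` bytes have arrived or the peer closes."""
    received = 0
    while received < expected:
        data = receiver.recv(65536)
        if not data:
            break
        received += len(data)
    return received


def demo() -> tuple[int, int]:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # Port 0: let the OS pick a free port.
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    receiver.settimeout(5.0)  # Safety net so the sketch cannot hang forever.
    try:
        # The receiver is "frozen" here: it never calls recv().
        sent = fill_until_blocked(sender, b"x" * 4096)
        # The receiver recovers and reads everything that was accepted.
        got = drain(receiver, sent)
        return sent, got
    finally:
        for s in (sender, receiver, listener):
            s.close()
```

Calling `demo()` returns the number of bytes the sender managed to hand to the OS and the number the receiver later read. The sender is stopped after a finite amount, and every byte accepted up to that point is still delivered once the receiver starts reading again.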

A naive sender might get stuck in blocking calls, or fill its buffer/RAM and crash. But a properly designed sender will adjust its send rate accordingly. A game might reduce the network tick rate, reduce the view distance, or send updates about only half the players at each tick, etc.
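As one possible shape for such an adjustment, the sketch below lowers a network tick rate as the sender's unsent backlog grows. Everything here is an assumption for illustration: `choose_tick_rate`, its parameters, and the default numbers are hypothetical, and `backlog_bytes` stands for data the application has queued but the socket has not yet accepted.

```python
def choose_tick_rate(
    backlog_bytes: int,
    base_hz: int = 30,
    min_hz: int = 5,
    budget_bytes: int = 64 * 1024,
) -> int:
    """Pick a network tick rate that shrinks as the unsent backlog grows.

    The rate is halved for every doubling of the backlog beyond
    budget_bytes, and never drops below min_hz.
    """
    rate = base_hz
    pending = backlog_bytes
    while pending > budget_bytes and rate > min_hz:
        rate //= 2
        pending //= 2
    return max(rate, min_hz)
```

With the defaults, an empty backlog keeps the full 30 Hz, a 128 KiB backlog halves it to 15 Hz, and a 1 MiB backlog bottoms out at the 5 Hz floor. The same signal could instead drive view distance or the number of players updated per tick.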

Currently, however, the sender can't tell that the receiver is overwhelmed and will continue overloading the receiver, and the receiver will be the one to get stuck waiting for data that was dropped but never resent, or crash because a message is referring to data from a dropped message.

Steps to reproduce

Have something (like rendering) make the game too slow while receiving "a lot" of data at the same time.

I discovered this issue first because the client received spawn messages from the server and froze while instancing entities (shader compilation). During the freeze, more messages accumulated and afterwards the buffer of course filled when the poll-call tried to read all of them at once.

I've also seen it when going full screen with too much rendering quality, or using the movie writer.

Minimal reproduction project

WebDrop.zip

This project relies on a large number of messages being sent at once to simulate the issue. That is only to simplify and reliably demonstrate it. In a real case, rendering and everything else taking up frame time can cause the same issue with far fewer messages.

The websocket server is in Python because apparently (looking at the examples) a websocket server in Godot 4 is quite complicated.

chrisl8 commented 8 months ago

I'm debating whether I need to watch this issue or open a new one, because this brings up another issue:

"reliable" RPCs will just fail to arrive when this issue occurs. Isn't Godot supposed to deal with the acknowledgement and resending of "reliable" RPCs regardless of the underlying network transport layer?

Is Godot relying on TCP for this when using Websocket?

Calinou commented 8 months ago

> Is Godot relying on TCP for this when using Websocket?

It has to, because WebSockets is a TCP-only protocol. There's no UDP connection opened for it, unlike WebRTC.

chrisl8 commented 8 months ago

> > Is Godot relying on TCP for this when using Websocket?
>
> It has to, because WebSockets is a TCP-only protocol. There's no UDP connection opened for it, unlike WebRTC.

Apologies, I was not precise in my language.

Is Godot relying on TCP's reliability instead of sending its own acknowledgement packets above TCP at the application layer when using Websocket?

I ask, because "reliable" RPCs are entirely lost when the mentioned error happens, which I would not expect to happen if Godot engine was already waiting for acknowledgement of sent "reliable" RPCs at the application layer and resending when required.

This issue has caused me to have to create my own code to make RPCs reliable, which seems quite redundant if both the Godot Engine and TCP are supposed to already do that, and yet I must.

agorangetek commented 6 months ago

I think WebSocketPeer.send() should actually return a Vector2: the amount actually sent, and an error if any.