godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
89.4k stars 20.24k forks source link

Mysterious reproducible connection_failed for MultiplayerAPI using WebSocketClient #42210

Closed goatchurchprime closed 3 years ago

goatchurchprime commented 4 years ago

Godot version: 3.2.3-stable

OS/device including version: Windows 10

Issue description: After a few days of having problems with ENet library on the Oculus Quest, I tried to convert to using Websockets.
Unfortunately the client refused to connect even when running on the same PC. However, I discovered that the stripped down demo versions of the client side Websocket networking would connect.

I have bisected my project many times and got it down to this limited project. (Apologies for the few functions that I forgot to delete, but there are now only 2 scripts, one of them is only a few lines.)

The blocker now seems to be deleting a loading environment, or deleting a simplex noise texture, neither of which have anything to do with networking. I can't see how to probe this further.

It would be useful if the connection_failed signal provided some further information as to why it failed.

Steps to reproduce: Open this project and select Spatial, see that Hostipnumber is blank and Enablenetworking and Usewebsockets are both On.

Now run. The messages you get are:

Websocketserverclient listen: 0 networkID: 1

Now run a second copy of this project, setting Hostipnumber: 127.0.0.1

(You also need to do Editor -> Editor Settings -> Network -> Debug and change the Remote port to something other than 6007)

The messages are:

Websocketclient connectto: ws://127.0.0.1:8002 0 networkID: 0 _connection_failed

However if you comment out the line:

var a = load("res://environments/underground_env.tres") in the script attached to the PlayerMe node, then:

Then it works:

networkID: 0 _player_connected 1 _connected_to_server

And meanwhile in the output window of the first copy:

_player_connected 1314515187 _player_disconnected 1314515187

Uncomment the load() line to prove that it doesn't work again. Now also if you delete the SpatialMaterial in the WorldEnvironment->rockwater mesh or the MaterialSystem->bluewater mesh it also works.

Minimal reproduction project: tunnelvrScratch3b.zip

Calinou commented 4 years ago

The blocker now seems to be deleting a loading environment, or deleting a simplex noise texture, neither of which have anything to do with networking. I can't see how to probe this further.

Maybe this is due to a race condition if threads are used down the line?

Faless commented 4 years ago

You are getting a disconnect because the client takes too long to do the handshake (timeout is 1 sec during websocket handshake to limit DOS attacks). Adding print(OS.get_ticks_msec()) to both _ready and _process I see there is a 1.2 second delay between the _ready and the first _process. I can suggest doing the connection after the first _process has run (via a button, or a call_deferred).

goatchurchprime commented 4 years ago

Yikes. At least the pure TCP links get a 30 second timeout. I'm not sure this has anything to do with DOS attacks. Even if it were it's so far down the line before anything like that becomes relevant. An app under development is going to spend on the order of months or years before there are more than a few copies running at the same time on the net.

Is this 1sec the time limit between invoking the poll() function? It still would be nice to have a logged error explaining when this is why the system dropped the connection.

I have moved the websocketserverclient.connect_to_url() function from _ready() into the _process() (initialized once) where it then gets the poll() function called on it.

There are a couple of other irregularities that are preventing the Websockets being used as a drop-in substitution for NetworkedMultiplayerENet. These are:

The pre_configure_game() of https://docs.godotengine.org/en/stable/tutorials/networking/high_level_multiplayer.html has us to rename the player node to the unique_id so that it works as a peer, and getting this number early would tidy things up.

Faless commented 4 years ago

An app under development is going to spend on the order of months or years before there are more than a few copies running at the same time on the net.

This is about a single malicious user, opening a stale TCP connection, sending no request but taking up the space of a pending connection to the WebSocketServer (somehow like the slow R.U.D.Y attack in Mr. Robot if you saw it).

Is this 1sec the time limit between invoking the poll() function?

No, it's the maximum time the server allow to pass before a pending connection that has not yet sent the websocket handshake is considered stale. After the handshake, the regular 30 secs are used.

* You don't get a valid non-zero code from `get_network_unique_id()`

Because the ID is assigned by the server, so if you are not connection, you can't know your ID.

* The signal `_player_connected()` gets called just before `_connected_to_server()`, which seems the wrong way round.

Don't rely too much on _connected_to_server, you will be connected, but you might not yet have the information on all the peers (more peer_connected events will be called after that, signalling the other peers connected to the server).

and getting this number early would tidy things up.

As said, this can't be done, because it's assigned by the server.

goatchurchprime commented 4 years ago

It appears that with the ENet system the unique ID must be created by the client, because if I run this code with no server, then it prints out a number like networkID: 479084525

var networkedmultiplayerenet = NetworkedMultiplayerENet.new()
networkedmultiplayerenet.create_client(hostipnumber, hostportnumber)
get_tree().set_network_peer(networkedmultiplayerenet)
networkID = get_tree().get_network_unique_id()
print("networkID is ", networkID)

The odds are good that there won't be a clash.

The point of the multiplayer system is to use rpc() function calls between peers that are associated by virtue of having the same node names.

What is the recommended time to set these nodepath names for the different player nodes -- in particular the node representing me, the client player, on the client's computer? If it's not after the create_client() call, then it must be at the _connected_to_server() signal when we finally receive the true networkID.

Unfortunately the function calls are out of order. Not only is _player_connected({server's ID}) arriving before _connected_to_server(), but I also get rpc's coming in before I know my networkID as well. This is problematic because the server can be programmed to call-back the with information for the client when it gets its own _player_connected({client's ID}) function called by when it sees the client connecting, and it sets up the node corresponding to it with the name of its ID, and then does something like:

clientnodeonserver.rpc_id({client's ID}, "setplayerposition", {Vector})

This doesn't work because the _connected_to_server() is late, and the client's player-node is not yet set to the client's ID. If the _connected_to_server signal isn't going to arrive at the right time -- ie before anything happens as a result of being connected to the server -- then it can't be used for anything. Instead I have to maintain an uninitialized flag in my code that watches for the first _player_connected or incoming rpc and then calls the code I would have liked _connected_to_server to have done.

Anyway, this is just my opinion for how the order of things should happen (network-ids and client_connected signals). It is far more important that they be the same for NetworkedMultiplayerENet library as they are for the Websockets library so that if I get my code working for one, then I can easily swap it out for the other. From my point of view it is very difficult to know which of these protocols is most appropriate, so being in a position to run my code with one and then the other is the best way to experiment with them.

Faless commented 4 years ago

It appears that with the ENet system the unique ID must be created by the client, because if I run this code with no server, then it prints out a number like ``

Yeah, and that was originally an issue, that was mitigated by adding more checks on the server. Still, it's the server that should assign the ID. So I guess in 4.0 it will be changed to behave like WebSocket

What is the recommended time to set these nodepath names for the different player nodes -- in particular the node representing me, the client player, on the client's computer? If it's not after the create_client() call, then it must be at the _connected_to_server() signal when we finally receive the true networkID.

Every time you received a peer_connected you can configure that peer. After you receive connection_succeeded or peer_connected(1) (ID=1, server), your ID will also be set.

Anyway, this is just my opinion for how the order of things should happen (network-ids and client_connected signals).

Yeah, IDs assignment in ENet should be done by the server (although it's a minor thing, since there are server checks in place anyway, but still, need to do that for consistency). On the other end, the order seems the same for me. First peer_connected(1), then connection_succeeded, then the rest of peer_connected.

Faless commented 3 years ago

Closing via #49964 and #49966