Unity-Technologies / com.unity.netcode.gameobjects

Netcode for GameObjects is a high-level netcode SDK that provides networking capabilities to GameObject/MonoBehaviour workflows within Unity and sits on top of an underlying transport layer.
MIT License
2.15k stars 434 forks

Server NetworkManager does not accept new connections from new clients if ONLY/FIRST client in game closes browser (webgl) #3117

Open kevinstriker opened 5 days ago

kevinstriker commented 5 days ago

Description

If the last/only player in a Multiplay Server setup closes their browser to quit the game, the server no longer accepts new clients joining the game.

Reproduce Steps

  1. Open the project (Unity 2022.3.x; I use x = 8).
  2. Create a project on Unity Services with Multiplay Hosting, Lobby, Relay, Player Authentication, and Matchmaker. In Unity, create a dedicated server build (note the name; you need it in the build configuration later). Create a build in Unity Services -> Multiplayer Hosting -> Builds by uploading the files you built from Unity. Then create a build configuration: use the name of the build, search for xxx.x86_64 as the game server executable, and pass these arguments: -nographics -port $$port$$ -queryport $$query_port$$ -logFile $$log_dir$$/Engine.log. Finally, create a fleet with the build configuration, the build, and a region.
  3. Create a Matchmaker queue named "WebFPSFFA" with a default pool. Choose the fleet, build configuration, etc., and for the region choose the region of the fleet.
  4. Create a WebGL build (upload it to itch or anywhere else; HTTPS is most likely needed).
  5. Spin up a game by starting the WebGL game -> main menu -> Lobby Browser -> Create game
  6. Keep in mind it might take 3-4 minutes when a new server needs to be spun up. You can prevent this by setting the fleet's minimum available servers to 1, although this has (small) costs. Don't forget to set the minimum available servers back to 0 afterwards.
  7. Leave the game by closing the browser. Reopen the browser, re-open the game, and try to join the same game (via lobby join). This will cause an exception on StartClient().
  8. Ensure there are NO other players in the game; otherwise the bug doesn't happen.

Actual Outcome

StartClient() does not succeed and times out.

Expected Outcome

New players should still be able to join the server by .StartClient().

Environment

Screenshot

Here you can see that no new Connection event happens after the last player closes the game by closing the browser. (10-30 seconds later the server kicks him out of the game; connectedClients = 0 afterwards.) After that, joining the game from the Editor, WebGL, or any other instance no longer works.

(Screenshot 2024-11-07 at 09 46 43)

Additional Context

See ZIP project. NGO.zip

NoelStephensUnity commented 4 days ago

@kevinstriker My first step was to just see how everything worked running the dedicated server locally (had to comment out the MultiplayService.Instance.StartServerQueryHandlerAsync to do this) so I could just see it running using standard UDP via UTP.

The one thing that may or may not be causing issues is that the players never seem to be removed from the lobby upon disconnecting (not sure if this is intentional or not).

(image)

The above screenshot was taken after connecting and disconnecting 4 times to the same session. Of course, this is not using WebSockets, but I wanted to make sure that using Relay, Lobby, and a locally running dedicated server functioned as expected prior to digging deeper.

The next test is to enable Websockets and see if I get similar results where the server continues to accept connections and if not then determine if this is a Websocket + Multiplayer services issue (or the like). From an NGO perspective, it looks like the server isn't running into any issues.

The "Failed to connect to server." message on the client side is basically saying that either:

If using Websockets works running a dedicated server locally on my system, then I will run through the dedicated side of things to try and replicate the issue...and if so I will need to get the services folks involved to help troubleshoot/narrow down what the cause could be.

NoelStephensUnity commented 4 days ago

@kevinstriker So, before I get too much further into this... I just noticed something:

The MainMenu (Client) scene contains a NetworkManager instance: image

The Server scene contains a NetworkManager instance: image

Then it appears the PrototypeMap (loaded by both of the above scenes) contains a NetworkManager instance: image

You should only have one NetworkManager instance. Loading a scene with a NetworkManager instance will override NetworkManager.Singleton, and I noticed the one in the PrototypeMap doesn't have WebSockets enabled... this could potentially be the issue.

Can you replicate this issue if you remove the NetworkManager from the PrototypeMap?

kevinstriker commented 4 days ago

Hi @NoelStephensUnity

First of all BIG thanks for getting back to me.

Yes, apologies, I do turn off the NetworkManager GameObject in the PrototypeMap. The server gets its NetworkManager from the Server scene, and the client indeed gets its own only from the MainMenu.

Yes, you're correct, I don't remove the lobby / players from the lobby properly (yet). Sorry about that, I will definitely add it later. Since I'm basically just playtesting the game with a couple of friends, it's not that big of an issue yet.

After digging for days and getting quite hopeless about what might be causing this issue, I am really happy to get your response hahaha. That sounds a bit desperate, but just know it's big time appreciated.

Tonight I dug some more and started adding logging to some NGO / Transport classes in the PackageCache. It really seems that the moment the "only" / "first" player disconnects by closing the browser, something prevents the server from accepting any new connection.

In case you have any questions really feel free to ask!

Kind regards, Kevin

PS: sorry, I hit the wrong button when commenting. I accidentally closed the issue instead of just commenting; it is now re-opened :)

NoelStephensUnity commented 4 days ago

No worries... just wanted to make sure there wasn't something broken within the NGO SDK itself, and so far there doesn't seem to be. I will proceed to the dedicated server hosting side of things then.

One thing that I am going to check on... is how the session is being created: "Spin up a game by starting the WebGL game -> main menu -> Lobby Browser -> Create game"

It seems the 1st player is what gets the server instance to spin up...and so there could be something on the service side that considers the session ended if the owner who started the game completely disconnects (i.e. browser closed).

From an NGO side, there is nothing that I am seeing that would be causing this... so I will most likely need to get someone from the cloud services side to take a look and determine if this is just a settings thing or the like. (Might take a day or two to run through that process... will get back to you by early to mid next week)

kevinstriker commented 4 days ago

Thanks @NoelStephensUnity !

Yeah completely understandable!

The game is a casual first person shooter for web. Games last 10 minutes and once spun up, the server should stay active for 10 minutes (the match time).

Game flow:

  1. The player opens the game and clicks "lobby browser"
  2. The player can join an existing game or create a new game
  3. When creating a new game, I'm using Matchmaker (this is based on the Unity docs) to simply get a ticket and then request that a Multiplay server be spun up.
  4. After having a successful allocation and the server has spun up, the player joins it. The Game server (Multiplay) spins up Lobby and Relay, and these work together. After client joined the just created Lobby / Relay, it will call the StartClient() method on the NetworkManager and it joins the game.
  5. The UI for in-game pause (by hitting "ESC" button in-game, so on the prototype map :) ), shows a "Main Menu" button
  6. Leaving the game by using the Main Menu button will work properly. The game server will keep accepting new connections
  7. Leaving the game (when you're the only player) by closing the browser will cause the connection manager to be confused.

While this flow might not be the final flow, it is very common (and preferred) for web shooter games that the server can exist for 10 minutes without becoming "unjoinable".

People will indeed leave during these sessions quite often, hopping between games etc., which gives this small bug quite some impact.

PS: I did think about closing the game when 0 players are in the game. However, this only partly solves the problem. The bug happens right away after the player closes the browser, before the timeout disconnect happens, so even with the check there is a window of unjoinable games.

kevinstriker commented 4 days ago

Sorry this is the correct order of scenes: Screenshot 2024-11-08 at 23 35 39

One last note that may be interesting: raising HeartbeatTimeoutMs actually makes the bug disappear for the duration that HeartbeatTimeoutMs is set to.

Example: the default is 500ms. If I join about 2 seconds after closing the browser, the bug happens and I can't join. However, if I set an unusually high value such as 20000 (20 seconds), close the browser (which normally triggers the bug), and join from another browser about 2 seconds later, I can join perfectly fine.

This "workaround" caused other bugs down the line and is not a valid solution, but it might give you a good angle on "where" to look.

kevinstriker commented 1 day ago

@NoelStephensUnity I'm working regular office hours the whole week and am also available in the evenings for any questions. So FYI, if I wrote something that is unclear (English is not my native language), really feel free to ask questions when you take a look sometime this week! Hopefully we can track this bug down :)

NoelStephensUnity commented 9 hours ago

"Leaving the game by using the Main Menu button will work properly. The game server will keep accepting new connections. Leaving the game (and you're the only player) with closing the browser, will cause the connection manager to be confused."

Ahhh... so when the last client disconnects by just closing the browser (i.e. non-graceful disconnect) it causes the issue to happen. That helps... let me talk with some of the services folks to see if there are any known issues with that.

kevinstriker commented 6 hours ago

@NoelStephensUnity yes indeed!!! Yesss, "non-graceful" was indeed the term I was looking for!!!

This way of closing the game happens quite often for web games.

I was thinking I could just de-allocate the server when the player count is 0 as a workaround. The problem, however, is that the bug happens right after closing the browser, not after the timeout disconnect event. That is, when I close the browser as the last player on the server, the connection manager/server immediately becomes invalid for new connections, but the "last" player is only kicked out 30 seconds later by the timeout. During this time, the server won't accept new connections, so new players see the timeout and experience the bug. So de-allocation on 0 player count is not really a solution: it still leaves an xx-second window for the bug to happen, besides it being far from ideal to close the game server on 0 players.
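The timing argument above can be made concrete with a small, language-neutral sketch in Python. It is not NGO or Multiplay code: the function names are hypothetical, and the 30-second figure is simply the timeout mentioned in this comment. It only models the reported behaviour, assuming the server stops accepting connections from the moment the browser closes until the timeout fires.

```python
from typing import Tuple


def unjoinable_window_ms(browser_closed_at_ms: int,
                         disconnect_timeout_ms: int) -> Tuple[int, int]:
    """Return the [start, end) interval in which the server still counts the
    departed client as connected, so a "de-allocate at 0 players" check
    cannot have fired yet."""
    return (browser_closed_at_ms, browser_closed_at_ms + disconnect_timeout_ms)


def join_hits_bug(join_attempt_at_ms: int, window: Tuple[int, int]) -> bool:
    """True if a join attempt lands inside the window, i.e. before the server
    has noticed the ungraceful disconnect."""
    start, end = window
    return start <= join_attempt_at_ms < end


# Assumed numbers: browser closed at t = 0, timeout of 30 seconds.
window = unjoinable_window_ms(0, 30_000)
print(join_hits_bug(2_000, window))   # join 2 s after the browser closed
print(join_hits_bug(30_000, window))  # join once the timeout has fired
```

Under these assumptions, any join attempt in the first 30 seconds reaches a server that is still allocated but not accepting connections, which is why de-allocating at 0 players does not close the gap.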

NoelStephensUnity commented 3 hours ago

@kevinstriker Yeah, that is a general issue across the board for ungraceful disconnects... you don't really know if a client has disconnected until it times out... now you can tweak the UnityTransport's "Disconnect Timeout MS" value to something lower, say 10-15 seconds, but with WebGL you could potentially run into issues where the client really didn't time out but is just taking that long... so you might play with that value.
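As a rough illustration of that trade-off, here is a minimal inactivity-timeout model in Python. It is a sketch only: the class and method names are hypothetical and this is not how Unity Transport is implemented internally. It just shows that a silent client is dropped only once the configured timeout has elapsed, and that a lower timeout also drops a client that was merely stalled.

```python
from typing import Dict, List


class InactivityTracker:
    """Hypothetical model: remembers when each client was last heard from and
    reports clients that have been silent for at least the timeout."""

    def __init__(self, disconnect_timeout_ms: int) -> None:
        self.disconnect_timeout_ms = disconnect_timeout_ms
        self.last_seen_ms: Dict[int, int] = {}

    def on_packet(self, client_id: int, now_ms: int) -> None:
        # Any traffic (including heartbeats) refreshes the client's timestamp.
        self.last_seen_ms[client_id] = now_ms

    def expire(self, now_ms: int) -> List[int]:
        # Collect and remove every client whose silence reached the timeout.
        timed_out = [cid for cid, seen in self.last_seen_ms.items()
                     if now_ms - seen >= self.disconnect_timeout_ms]
        for cid in timed_out:
            del self.last_seen_ms[cid]
        return timed_out


# Same silent client, two assumed timeout settings, checked 12 s later.
slow = InactivityTracker(disconnect_timeout_ms=30_000)
fast = InactivityTracker(disconnect_timeout_ms=10_000)
slow.on_packet(client_id=1, now_ms=0)
fast.on_packet(client_id=1, now_ms=0)
print(slow.expire(now_ms=12_000))  # still considered connected
print(fast.expire(now_ms=12_000))  # already dropped
```

With the lower value the server notices a closed browser sooner, but a client that was only stalled for 12 seconds would be dropped as well.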

However, it seems odd that a new player cannot join even if the server still thinks the last player is connected...

Just to check, does this happen if say you join with 1 client and then close the browser or does it require more clients to join?

Also, have you set the NetworkManager Log Level to Developer and if so do you have the dedicated server log file where this scenario happened? (Haven't had a chance to setup the dedicated server on my end yet so just looking to see if we have any additional information we can pull from in order to determine if this is service specific or NGO specific).

The only other thing I could think of would be to try reconnecting (after having ungracefully disconnected) in private browser mode to see if there are any cached values getting in the way of things...

Have raised this issue with the services group and once they have a chance to look over the issue and respond will let you know.