expressobits / steam-multiplayer-peer

Steam Sockets Multiplayer Peer for Godot 4 via GDExtension
MIT License
150 stars 8 forks source link

Game crashes when calling peer.disconnect_peer #12

Open bugbountyguy opened 4 months ago

bugbountyguy commented 4 months ago

I originally posted this over in GodotSteam project, but the more I mess with this, the more I'm sure it's related to this instead of GodotSteam. If it does turn out to be that library instead, I'll close this - but all signs appear to be pointing to this one here.

When I get the lobby chat update, and the player is leaving the lobby (using GodotSteam library), I'm manually calling peer.disconnect_peer, as I want the player to be removed from the game altogether, immediately, rather than waiting for the RPC disconnect to happen after 5-10 seconds (just in case they crash / disconnect, and then join back immediately). Upon doing so, the game crashes completely, with the following inside the stacktrace (only found it when doing a build):

[2] /home/user/godot_builds/libsteam-multiplayer-peer.linux.template_debug.x86_64.so(+0xa9d1) [0x7aa53febb9d1] (??:0)
[3] /home/user/godot_builds/libsteam-multiplayer-peer.linux.template_debug.x86_64.so(+0xff61) [0x7aa53fec0f61] (??:0)

To reproduce, add peer.disconnect_peer somewhere in the game (e.g. kicking the player or something), the game will crash. Most likely something's not being handled properly when calling this method. I noticed some commented out code in this repo on that method, but I didn't quite understand the code, so that's what's leading me to post this bug report.

scriptsengineer commented 4 months ago

Hello bugbountyguy! Sorry delay!

You will check with my example code or bomberman (which is in the demo branch) and also with my personal project.

a question: what is your platform?

Could you provide a code to verify how you are using it? mainly after leave lobby?

bugbountyguy commented 4 months ago

No worries on the delay, totally understand :) To answer "what system" - this is Godot (vanilla, with godotsteam/steam-multiplayer-peer gdextensions) on Linux 64-bit (so all testing/etc. is done through Linux)

So just to provide a little more details so it's clear what the problem is and how I'm trying to solve it - and then the bug that seems to exist from trying to solve it:

The scenario: When a client is connected to a host in a game, and the client crashes or kills their game (ungracefully), there's an obvious disconnect there in the game. In my logs, I can clearly see a disconnect, as the log shows "Player has left the lobby". This comes from the method _on_lobby_chat_update, which is called by the signal: Steam.lobby_chat_update (from GodotSteam). However, the RPC for the player seems to persist for ~10 seconds or so. After that point, the player finally disconnects from the game (character vanishes).

The problem: If the player's game is terminated unexpectedly, and they re-join the lobby before their RPC session is dropped, there is a brief moment where there are "two" of the player in the game. When that 10 second timer hits, both instances are destroyed and the player's dropped from the lobby. The two issues here being: There's a duplicate of the player, and whether or not we find a way to clean that previous session up, the RPC from the previous connection is lost and causes the current connection to also drop.

The solution: As I was able to get the immediate feedback from the player leaving the lobby, I thought I would use that to call peer.disconnect_peer(id) on the player. This would call my _on_player_disconnected method, which is called by the multiplayer.peer_disconnected signal. From there, the player's unregistered and cleaned up from the scene/game properly. This would allow them to re-join the game immediately since the previous RPC connection was gracefully killed by the host, even though the player dropped out of the lobby from a game crash or something.

The bug: Calling the peer.disconnect_peer during that time frame immediately causes the host to crash, with a stack trace (part of the code snippet I posted above, only seen if you're using a build).

For a code snippet, basically it's all centered around these parts in my game (trimmed down for the sake of this issue):

func _ready() -> void:
    Steam.lobby_chat_update.connect(_on_lobby_chat_update)
    multiplayer.peer_disconnected.connect(_player_disconnected)
# ....
func _on_lobby_chat_update(_this_lobby_id: int, change_id: int, _making_change_id: int, chat_state: int) -> void:
    var changer_name: String = Steam.getFriendPersonaName(change_id)

    if chat_state == Steam.CHAT_MEMBER_STATE_CHANGE_LEFT or chat_state == Steam.CHAT_MEMBER_STATE_CHANGE_DISCONNECTED:
        Logger.info("%s has left the lobby." % changer_name, Logger.LogCategory.MULTIPLAYER)
        var client_rpc_id = _get_rpc_id(change_id)
        peer.disconnect_peer(client_rpc_id)
# ....
func _on_player_disconnected(id):
    GlobalSignals.player_left_signal.emit(id)
    players.erase(id)
    player_network_states.erase(id)

I hope this provides a clearer picture to this issue! I should have posted these details above, but I opened the issue at a time where my brain was already melting while trying to find hacky solutions around this lol.

scriptsengineer commented 4 months ago

Very clear now, thank for details! You will study the problem and the logo and I will give you some feedback where we can find it here

Previously I admit two things:

  1. This is always in windows, and we can find a bug with linux related to how the code behaves (mainly that various types behave differently in terms of size and default values)
  2. Try to disconnect and connect logo immediately.

Thank you for your attention!

scriptsengineer commented 4 months ago

@bugbountyguy I think I understand your error, it may be suspicious, but the list of connections in the code is cleaned on two occasions:

1 - In the connection close method (It is made in my game)

2 - When giving disconect_peer to a peer that exists in the peers (Here is possibly a problem, I don't clear it if I don't have a successful connection)

Anyway, I'm going to release update 0.0.6 today by placing some more explanatory warnings within the disconnect_peer method

scriptsengineer commented 4 months ago

@bugbountyguy trying add peer.close() avoid peer.disconnect_peer and report results

bugbountyguy commented 4 months ago

@bugbountyguy trying add peer.close() avoid peer.disconnect_peer and report results

I'm not home to verify this, but I think the problem with calling close() is that it would close the connection to all peers when the host calls this, right? If it was just a 2 player game, this would be okay, but for games that have 3 or more players, this would kill the game/connection for all players. The goal with doing a clean close with disconnect_peer allows to disconnect just one peer from the game and let the rest of the players and host continue playing. This is especially ideal for a player who might have crashed from the game.

That being said... I wonder if it's possible to have an array of all player peer connections? But I'm not sure, will have to investigate when I get home.

bugbountyguy commented 4 months ago

@scriptsengineer okay to update on my last post:

Yes, peer.close() closes all connections, so this would not be ideal. Here's the Godot Docs description:

Immediately close the multiplayer peer returning to the state CONNECTION_DISCONNECTED. Connected peers will be dropped without emitting peer_disconnected.

The method disconnect_peer is the method that's used to specify the specific client you want to disconnect from according to the docs:

Disconnects the given peer from this host. If force is true the peer_disconnected signal will not be emitted for this peer.

I think I read a post somewhere (can't remember if I saw it on reddit, discord, or what) that the regular Godot rpc MultiplayerPeer will disconnect connections like it's supposed to when using the disconnect_peer method.

Would it be possible to find out the flow made when a client 'crashes' (force kills the app) and then the rpc connection is lost after 10 seconds? Perhaps comparing that flow in this library against the flow of calling disconnect_peer directly will help point to the bug?

scriptsengineer commented 4 months ago

Yes, I'm still going to check this flow, one question, did you test disconnecting with p_force = true, the disconnect peer has a second parameter that forces the disconnection.

scriptsengineer commented 4 months ago

@bugbountyguy on godotsteam bug report you say

I did try the pre-compiled version to see what would happen and it does look like it still had the same issue. Really leaning towards some unhandled null pointer or something happening during the disconnect_peer. As that library uses GodotSteam, I'm wondering if it's not handling me doing a forced disconnect peer vs. a timeout (when a player crashes).

Confirming that you used godotsteam's precompiled SteamMultiplayerPeer and the problem is the same there?

bugbountyguy commented 4 months ago

Hi @scriptsengineer

Yes, I've tried with the force param as true (assuming you're talking about the 2nd optional parameter in disconnect_peer() )

I didn't try GodotSteam's precompiled MultiplayerPeer... I wasn't sure how to set that one up as there weren't instructions on it, and "SteamMultiplayerPeer" is not the class name they chose there. So I'm not using that one yet - right now it's just the gdextension of GodotSteam and steam-multiplayer-peer.

I've decided this morning to setup an example project for you: https://github.com/bugbountyguy/broken_multiplayer_peer_example/tree/main

You'll of course need a second steam connection (a friend or if you have a second steam account). There are instructions when you create a broken or working lobby in the game.

"Sorta" Working Lobby: Scenario 1: Host creates the lobby. Client joins the lobby. Client terminates their game (Ctrl+C, or F8 if you're in Godot). Host should see "Steam: Offline" show up. After ~10 seconds, the player is removed from the game.

Scenario 2: Host creates the lobby. Client joins the lobby. Client terminates their game (Ctrl+C, or F8 if you're in Godot). Host should see "Steam: Offline" show up. Player restarts the game and re-joins the lobby very quickly. Host and client will see a "duplicate" session of the same player. After < 10 seconds, the "new session" will be dropped and the player will be returned to the main menu. The old session that was terminated will remain in the game indefinitely.

Broken Lobby: Only Scenario: Host creates the lobby. Client joins the lobby. Client terminates their game (Ctrl+C, or F8 if you're in Godot). Game crashes on host's machine. Here's the line that crashes the game: https://github.com/bugbountyguy/broken_multiplayer_peer_example/blob/main/scripts/multiplayer_manager.gd#L165

316902108 commented 3 months ago

I also encountered the same problem; https://github.com/GodotSteam/GodotSteam/issues/444

ThunderClapD commented 1 week ago

Im getting the same issue on both of my laptops tried changing the renderer but that seems unrelated

disconnect_peer does work the other client disconnects but the host completely crashes

this error is only on file log the editor does not report this im on windows 11

Godot Engine v4.3.stable.official.77dcf97d8 - https://godotengine.org
Vulkan 1.3.278 - Forward+ - Using Device #0: NVIDIA - Quadro P2000

================================================================
CrashHandlerException: Program crashed with signal 11
Engine version: Godot Engine v4.3.stable.official (77dcf97d82cbfe4e4615475fa52ca03da645dbd8)
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] error(-1): no debug info in PE/COFF executable
[2] error(-1): no debug info in PE/COFF executable
[3] error(-1): no debug info in PE/COFF executable
[4] error(-1): no debug info in PE/COFF executable
[5] error(-1): no debug info in PE/COFF executable
[6] error(-1): no debug info in PE/COFF executable
[7] error(-1): no debug info in PE/COFF executable
[8] error(-1): no debug info in PE/COFF executable
[9] error(-1): no debug info in PE/COFF executable
[10] error(-1): no debug info in PE/COFF executable
[11] error(-1): no debug info in PE/COFF executable
[12] error(-1): no debug info in PE/COFF executable
[13] error(-1): no debug info in PE/COFF executable
[14] error(-1): no debug info in PE/COFF executable
[15] error(-1): no debug info in PE/COFF executable
[16] error(-1): no debug info in PE/COFF executable
[17] error(-1): no debug info in PE/COFF executable
[18] error(-1): no debug info in PE/COFF executable
[19] error(-1): no debug info in PE/COFF executable
[20] error(-1): no debug info in PE/COFF executable
[21] error(-1): no debug info in PE/COFF executable
[22] error(-1): no debug info in PE/COFF executable
[23] error(-1): no debug info in PE/COFF executable
[24] error(-1): no debug info in PE/COFF executable
[25] error(-1): no debug info in PE/COFF executable
[26] error(-1): no debug info in PE/COFF executable
[27] error(-1): no debug info in PE/COFF executable
[28] error(-1): no debug info in PE/COFF executable
[29] error(-1): no debug info in PE/COFF executable
[30] error(-1): no debug info in PE/COFF executable
[31] error(-1): no debug info in PE/COFF executable
[32] error(-1): no debug info in PE/COFF executable
[33] error(-1): no debug info in PE/COFF executable
[34] error(-1): no debug info in PE/COFF executable
[35] error(-1): no debug info in PE/COFF executable
[36] error(-1): no debug info in PE/COFF executable
-- END OF BACKTRACE --
================================================================