Unity-Technologies / com.unity.netcode.gameobjects

Netcode for GameObjects is a high-level netcode SDK that provides networking capabilities to GameObject/MonoBehaviour workflows within Unity and sits on top of underlying transport layer.
MIT License
2.12k stars 430 forks source link

NetworkShow() follow by Despawn livelocks the server #3023

Closed SamTheBay closed 2 weeks ago

SamTheBay commented 3 weeks ago

Description

If you call NetworkShow() on a NetworkObject and then Despawn(true) before the network tick runs it gets the server into a livelock state that requires it to be restarted. This is because the NetworkObject is added to a queue in the SpawnManager which will be executed on the next network tick. However, the NetworkObject is destroyed by the time the next network tick occurs. When the SpawnManager tries to show the NetworkObject it attempts to reads the transforms parent, which throws and exception since the NetworkObject has been destroyed already. The SpawnManager will repeat attempting to show the destroyed NetworkObject on every network tick from then on since it is never removed from the queue, thus livelocking the server.

Reproduce Steps

Code like this will reproduce it...

const int c_spreadFrames = 300;
if (testNetItem != null)
{
    if (testStage == 0)
    {
        testNetItem.NetworkObject.NetworkHide(testPlayer.OwnerClientId);
        testStage = 1;
    }
    else if (testStage < c_spreadFrames)
    {
        testStage++;
    }
    else if (testStage == c_spreadFrames)
    {
        testNetItem.NetworkShow(testPlayer.OwnerClientId);
        testNetItem.NetworkObject.Despawn(true);
        testStage = c_spreadFrames + 1;
    }
    else
    {
        testNetItem = null;
        testStage = 0;
    }
}

Note: You must spread out the NetworkHide from the NetworkShow by at least one network tick. Otherwise, it will not repro because there is logic to protect against trying to hide and show in the same frame.

Actual Outcome

Server gets into a stuck state in which no new objects are able to spawn for clients.

Expected Outcome

Server should discard the destroyed item from its spawn queue since it is no longer a valid action.

Environment

Additional Context

I seem to be able to work around the issue by creating my own class SafeNetworkBehaviour and keeping track of the last time a NetworkShow() call was made. Then, in the OnNetworkDespawn() method I check if NetworkShow was called recently. If it was, I go through and call NetworkHide on the recently shown clientIds. This will remove it from the SpawnManager queues before the NetworkObject is destroyed. However, this seems super hacky and it would be better if the networking code handled this by default.

LightPat commented 3 weeks ago

I had this exact same issue on 1.7.1, 1.8.1, 1.9.1, and 1.10.0. I just removed the entire networkshow/networkhide stuff from my codebase, however I would like to add it back eventually

NoelStephensUnity commented 2 weeks ago

@SamTheBay Yeah that does seem like a bug indeed...and we should have some form of protection against that. In order to better understand how you ended up with this bug, could you explain further why you would want to do this?

        testNetItem.NetworkShow(testPlayer.OwnerClientId);
        testNetItem.NetworkObject.Despawn(true);

My confusion with that section of code is:

Looking at the code base I think I can see how this could cause problems, but I am more interested in why you would want to do this/what is it you are trying to accomplish performing these specific actions in that specific order?

Also, could you provide the stack trace to the exception?

NoelStephensUnity commented 2 weeks ago

My first pass thought is something like:

        internal void HandleNetworkObjectShow()
        {
            // Handle NetworkObjects to show
            foreach (var client in ObjectsToShowToClient)
            {
                ulong clientId = client.Key;
                foreach (var networkObject in client.Value)
                {
                    if (networkObject != null && networkObject.IsSpawned)
                    {
                        try
                        {
                            SendSpawnCallForObject(clientId, networkObject);
                        }
                        catch
                        { }
                    }
                }
            }
            ObjectsToShowToClient.Clear();
        }

Which would:

This I could put into place pretty quickly... the test to validate will take way longer than it did to put some protection around that portion of the code. 😸

SamTheBay commented 2 weeks ago

Thanks for the follow up! In game I am not doing exactly what is shown, that is just code to create a simple repro. The game is an MMO style architecture, so the server is hosting many "levels" and the players can move between them freely. I use hide and show to only make the level the player is currently in visible to their client. The levels also have timers on them at which point they expire and are destroyed. So, there is a race condition where a player could choose to join a level right at the moment the timer expires, and the server decides to destroy it. The server logic will eventually detect this and send the player back to "town" but by that time it is too late because it will enter this death loop. I have added special protection against this by keeping track of when the last Show call was and then hiding the object on despawn which looks to be working as a mitigation (but is wasteful with regards to performance).

Yes, I agree your solution looks good. Here is the exception callstack...

NullReferenceException at (wrapper managed-to-native) UnityEngine.Component.get_transform(UnityEngine.Component) at Unity.Netcode.NetworkObject.GetMessageSceneObject (System.UInt64 targetClientId) [0x00082] in .\Library\PackageCache\com.unity.netcode.gameobjects@1.8.1\Runtime\Core\NetworkObject.cs:1765 at Unity.Netcode.NetworkSpawnManager.SendSpawnCallForObject (System.UInt64 clientId, Unity.Netcode.NetworkObject networkObject) [0x0000d] in .\Library\PackageCache\com.unity.netcode.gameobjects@1.8.1\Runtime\Spawning\NetworkSpawnManager.cs:764 at Unity.Netcode.NetworkSpawnManager.HandleNetworkObjectShow () [0x0003b] in .\Library\PackageCache\com.unity.netcode.gameobjects@1.8.1\Runtime\Spawning\NetworkSpawnManager.cs:1124 at Unity.Netcode.NetworkBehaviourUpdater.NetworkBehaviourUpdater_Tick () [0x00008] in .\Library\PackageCache\com.unity.netcode.gameobjects@1.8.1\Runtime\Core\NetworkBehaviourUpdater.cs:128 at (wrapper delegate-invoke) .invoke_void() at Unity.Netcode.NetworkTickSystem.UpdateTick (System.Double localTimeSec, System.Double serverTimeSec) [0x000b8] in .\Library\PackageCache\com.unity.netcode.gameobjects@1.8.1\Runtime\Timing\NetworkTickSystem.cs:102 at Unity.Netcode.NetworkTimeSystem.UpdateTime () [0x0006a] in .\Library\PackageCache\com.unity.netcode.gameobjects@1.8.1\Runtime\Timing\NetworkTimeSystem.cs:139 at Unity.Netcode.NetworkManager.NetworkUpdate (Unity.Netcode.NetworkUpdateStage updateStage) [0x0005e] in .\Library\PackageCache\com.unity.netcode.gameobjects@1.8.1\Runtime\Core\NetworkManager.cs:54 at Unity.Netcode.NetworkUpdateLoop.RunNetworkUpdateStage (Unity.Netcode.NetworkUpdateStage updateStage) [0x0002d] in .\Library\PackageCache\com.unity.netcode.gameobjects@1.8.1\Runtime\Core\NetworkUpdateLoop.cs:185 at Unity.Netcode.NetworkUpdateLoop+NetworkPreUpdate+<>c.b__0_0 () [0x00000] in .\Library\PackageCache\com.unity.netcode.gameobjects@1.8.1\Runtime\Core\NetworkUpdateLoop.cs:232

(Filename: ./Library/PackageCache/com.unity.netcode.gameobjects@1.8.1/Runtime/Core/NetworkObject.cs Line: 1765)

NoelStephensUnity commented 2 weeks ago

I use hide and show to only make the level the player is currently in visible to their client. The levels also have timers on them at which point they expire and are destroyed. So, there is a race condition where a player could choose to join a level right at the moment the timer expires, and the server decides to destroy it. The server logic will eventually detect this and send the player back to "town" but by that time it is too late because it will enter this death loop.

Ahh... I think I understand the scenario better. Based on the full call stack (thank you for that), it does look like the first pass fix should do the trick. I will get a PR up for this by tomorrow.

👍