Placeholder-Software / Dissonance

Unity Voice Chat Asset
69 stars 5 forks source link

[help] Invalid memory access when playing networked Bolt game w/ Dissonance #203

Closed jesse-scam closed 3 years ago

jesse-scam commented 3 years ago

Hello!

We are in the final week before releasing Snapshot VR on Viveport. 30 days later we'll release on Steam. https://store.steampowered.com/app/1133580/Snapshot_VR/

We are supporting peer-hosted rooms of 10 players on Photon Bolt. 2 weeks ago we implemented the Dissonance w/ Bolt integration and early results were fantastic.

I'm happy to send over a copy of the repo as needed, but I'll link a build and rough STR here to get us started. Dissonance setup is an empty top-level gameobject with:

Here, when you join a match for the first time or in between matches you are moved to the SnapshotGlobal room. When you are on a team and a game starts you join team-specific rooms SnapshotPink and SnapshotBlue.

We first noticed issues the first night of testing a version that included Dissonance. It happened fairly randomly. After 5-10 players had joined, while many were talking and firing shots (Bolt events), the game would suddenly crash for the Server player, kicking all other players out of the room. It is an interesting point that the only client that actually has a hard crash is the Server player hosting the room. Since disabling the gameobject containing Dissonance scripts, we haven't seen any crash reports.

Reproducing solo is a bit tricky. I can reliably crash, but the STR isn't consistent. Essentially, I do the following:

  1. Load the build
  2. Create a room
  3. Start another client
  4. Join room with new client
  5. In between rounds, repeat steps 3-4
  6. Continuously talk and occasionally shoot the gun with the newest client to send events over Bolt

Eventually, for me, it crashes. Typically between 5-8 clients loaded simultaneously. At some point of my test cycle it'll crash.

My system info export, a build where I've reproduced the crash, 3 crash reports from a previous build, and a crash report from the current build can be found here: https://drive.google.com/drive/folders/1FUnsSazqrSDrR_MmOhNGC5KEtgP9c2_L?usp=sharing

Unity v2019.1.0f1 Personal Bolt Free Debug v1.2.11b Dissonance v7.0.2 based on last entry in changelog

My exact STR for this new build:

  1. Load the build (if no VR headset is connected, it'll load in a mouse and keyboard mode where you can have multiple instances running)
  2. Create a room
  3. Start another client
  4. Join room with new client
  5. Repeat steps 3-4 until I had 8 players in the room
  6. On the 8th player, talk and shoot the gun a few times (gun is on your hip, can use the janky VRSimulator to pick it up by pressing alt to enable hand movement, and using control to move the hand down to the hip. Click to grab it, right click to shoot)

Voice slowed and lagged, clients dropped out back to the lobby, server client crashed with the dmp file I added to the folder.

I am available any time this week to pair on this or give access to source, builds, etc... Let's figure this thing out and we can launch with Dissonance in-game voice. Thanks!

jesse-scam commented 3 years ago

If you do happen to repro with the build, the advice we've been giving to our players on how to locate relevant files:

If your game crashes - here is what you can do to help us fix it:

  1. Do not restart Snapshot immediately. We need some files from your computer before they get overwritten.
  2. Navigate to C:\Users\<Your User Name>\AppData\LocalLow\Giant Scam Industries\SnapshotVR. Here, copy Player.log and Player-prev.log to a new folder. These are game logs that might give us clues.
  3. Next, navigate to C:\Users\<Your User Name>\AppData\Local\CrashDumps, if it exists. Here, look for any .dmp file starting with SnapshotVR.exe. Definitely send us the most recent one, but sending us all of them doesn't hurt. Copy these to the same folder as the log files.
  4. Next, navigate to where Snapshot is installed. If using Steam, that should be somewhere like C:\Program Files (x86)\Steam\steamapps\common\Snapshot VR. Copy any file ending with .pdb, but most importantly UnityPlayer_Win64_mono_x64.pdb to the same folder as the other files.
  5. Send this folder to Tanner and I either via Discord or to bugs@giantscam.com
martindevans commented 3 years ago

I had a look at this briefly over the weekend and it appears to be an issue within Bolt. The crash happens here:

^^^ Unity player logging
0x000002504EDCC51E (Mono JIT Code) (wrapper managed-to-native) UnityEngine.DebugLogHandler:Internal_Log (UnityEngine.LogType,UnityEngine.LogOption,string,UnityEngine.Object)
0x000002504EDCC47B (Mono JIT Code) UnityEngine.DebugLogHandler:LogFormat (UnityEngine.LogType,UnityEngine.Object,string,object[])
0x000002504EDCC250 (Mono JIT Code) UnityEngine.Logger:Log (UnityEngine.LogType,object)
0x000002504F2B4168 (Mono JIT Code) UnityEngine.Debug:Log (object)
0x00000250514FF903 (Mono JIT Code) BoltLog/UnityWriter:BoltLog.IWriter.Info (string)
0x00000250514D13D8 (Mono JIT Code) BoltLog:Info (string)
0x00000250514F5E0B (Mono JIT Code) BoltInternal.BoltCore:UdpLogWriter (uint,string)
0x00000250514F1D68 (Mono JIT Code) UdpKit.UdpLog:Write (uint,string)
0x00000250514FF2CB (Mono JIT Code) UdpKit.UdpLog:Info (string,object[])
0x0000025066872D93 (Mono JIT Code) UdpKit.UdpConnection:OnStreamQueue (UdpKit.UdpStreamChannel,UdpKit.UdpStreamOp)
0x00000250668728C3 (Mono JIT Code) UdpKit.UdpSocket:OnEventStreamQueue (UdpKit.UdpEvent)
0x00000250515234EB (Mono JIT Code) UdpKit.UdpSocket:ProcessInternalEvents ()
0x00000250514F60B3 (Mono JIT Code) UdpKit.UdpSocket:NetworkLoop ()
0x00000250514F1E7A (Mono JIT Code) UdpKit.Async.Task:Runner ()
vvv Runtime stuff

I shared a copy of the slightly older Dissonance+BOLT integration (which sent the packets in a different way) to see if that avoids triggering the bug. @jesse-scam did you have any luck with this alternative?

jesse-scam commented 3 years ago

@martindevans Will try the stream implementation tonight. I also emailed this report to Ramon at Bolt and he should be following along. Will report back results tonight.

jesse-scam commented 3 years ago

Hey all. Hit a similar crash with the stream implementation. I uploaded a new zip SnapshotVR_StreamBuild... with the exe, pdbs, dmp, and log file.

Looked very similar.

I'll see if Ramon can join us to pair here. I'm happy to give each of you access to the Snapshot github repo if it's helpful. Also curious to hear if you can repro the crash locally.

ramonmelo commented 3 years ago

Hello @jesse-scam @martindevans ,

I'm Ramon Melo, from the Photon Bolt Team.

I will take a look at the files that Jesse sent. Maybe I can spot something on our side that too.

jesse-scam commented 3 years ago

Hey all,

I tried a different repro - having the Server player in Unity Editor and all remote players in builds. At around the same point we get bad behavior but no hard crash. What happens here is we get one or more Bolt ConnectionReset logs as new remote client joins the room:

7/25/2020 1:43:41 PM: 10054: ConnectionReset
UnityEngine.Debug:LogError(Object)
UnityWriter:BoltLog.IWriter.Error(String)
BoltLog:Error(String)
BoltInternal.BoltCore:UdpLogWriter(UInt32, String)
UdpKit.UdpLog:Write(UInt32, String)
UdpKit.UdpLog:Error(String, Object[])
UdpKit.Platform.DotNetSocket:HandleSocketException(SocketException)
UdpKit.Platform.DotNetSocket:RecvFrom(Byte[], Int32, UdpEndPoint&)
UdpKit.Platform.Photon.Puncher.PunchSocket:RecvFrom(Byte[], Int32, UdpEndPoint&)
UdpKit.Platform.UdpPlatformSocket:RecvFrom(Byte[], UdpEndPoint&)
UdpKit.UdpSocket:RecvNetworkData()
UdpKit.UdpSocket:NetworkLoop()
UdpKit.Async.Task:Runner()
System.Threading.ThreadHelper:ThreadStart()

That client then fails to initialize in SceneLoadRemoteDone in an odd way:

NullReferenceException: Object reference not set to an instance of an object
ServerCallbacks.SceneLoadRemoteDone (BoltConnection connection) (at Assets/Lib/Internal/Network/ServerCallbacks.cs:49)
BoltInternal.GlobalEventListenerBase.SceneLoadRemoteDoneInvoke (BoltConnection connection) (at <700d983e76cb4a2c8382c87b066b3956>:0)
UnityEngine.Debug:LogException(Exception)

It's essentially throwing a null reference on a static Dictionary after I check that the Dictionary contains the key. But when trying to access the index it's not happy.

        if (MatchServer.connectTokenCache == null) {
            Debug.LogError("Token cache reset to null.  Will throw exception in SceneLoadRemoteDone");
        }
        if (!MatchServer.connectTokenCache.ContainsKey((int)connection.ConnectionId)) {
            Debug.LogErrorFormat("Token cache doesn't contain key {0} on SceneLoadRemoteDone.  Will throw exception.", (int)connection.ConnectionId);
        }
        // EXCEPTION HERE
        Debug.LogError(MatchServer.connectTokenCache[(int)connection.ConnectionId].ToString());

        BoltEntity newPlayer = BoltNetwork.Instantiate(BoltPrefabs.PlayerPackage, token);

We don't error out in either of the first 2 checks, but on actual access of the key, we throw a null reference exception. Will continue investigating here.

I'm not sure this is causing the hard crash when running the Server player in a build, but it's possible. There are many ConnectionReset logs in the various crash log files I've included in the Drive folder.

Maybe if I add some protection on this static Dictionary AND we figure out the cause of ConnectionReset we'll be on to something? Would be interested to hear your thoughts.

Thanks guys!

jesse-scam commented 3 years ago

It does seem that the network starts really struggling right before the ConnectionReset messages. In-game audio and voice gets extremely delayed. It starts taking longer for players to connect.

Some interesting logs I've seen:

SynchronizerSampleSource: Playback desync (-2594ms) AHEAD beyond recoverable threshold

or similar many many times.

Same with a Bolt Envelope for connection N full logs.

martindevans commented 3 years ago

The synchronizer error probably indicates that there's a problem with the rate that packets are being delivered (e.g. a very large amount of packets getting buffered up and delivered all at once, instead of smoothly delivered over a period of time).

It sounds like bolt may simply be suffering an overload in the number of packets it's trying to send/receive. Could you perhaps try changing Dissonance to large frames, that packs more audio into a single packet so it may at least delay the problem.

jesse-scam commented 3 years ago

Still getting a crash with large frames. I completely removed the static Dictionary I mentioned to be sure that wasn't it. No luck yet.

martindevans commented 3 years ago

Hi @jesse-scam and @ramonmelo, did you ever manage to track this down to anything more specific?

ramonmelo commented 3 years ago

Hello @martindevans ,

I executed some tests using the Bolt integration with the sample project and it's working just fine, so it's not clear what can be happening on the other project.

jesse-scam commented 3 years ago

Yea unfortunately I seem to be unlucky. Could be the versions of Unity and Bolt that I'm using, or something particular to what's happening in my game over the Bolt network. We've swapped to another voice integration for now.

jesse-scam commented 3 years ago

I really appreciate everyone helping debug!

martindevans commented 3 years ago

I'm sorry it didn't work out for you with Dissonance :(