Yellow-Dog-Man / Resonite-Issues

Issue repository for Resonite.
https://resonite.com

[2024.7.15.1359] dotnet8 headless crashes when a specific cone collider is spawned #2588

Closed: ikanimew closed this issue 1 month ago.

ikanimew commented 1 month ago

Describe the bug?

I am running into an issue where my headless works fine under Mono, but under dotnet8, as soon as a user joins the session and interacts with a collider, the headless crashes with variations of the following error:

   at BepuPhysics.Collidables.ConvexHullHelper.ComputeHull(System.Span`1<System.Numerics.Vector3>, BepuUtilities.Memory.BufferPool, BepuPhysics.Collidables.HullData ByRef)
   at FrooxEngine.ConeCollider.CreateShape(FrooxEngine.PhysicsSimulation, Elements.Core.float3 ByRef, Single ByRef, System.Nullable`1<Single>, BepuPhysics.BodyInertia ByRef)
   at FrooxEngine.Collider`1[[BepuPhysics.Collidables.ConvexHull, BepuPhysics, Version=2.4.0.2, Culture=neutral, PublicKeyToken=9345ce38ee48a1cd]].RegisterShape(FrooxEngine.PhysicsSimulation, Single ByRef, System.Nullable`1<Single>, BepuPhysics.BodyInertia ByRef)
   at FrooxEngine.Collider.UpdateCollider()
   at FrooxEngine.UpdateManager.RunQueue[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.Collections.Generic.Queue`1<System.__Canon>, System.Action`1<System.__Canon>)
   at FrooxEngine.UpdateManager.RunChangeApplications()
   at FrooxEngine.World.RefreshStep()
   at FrooxEngine.World.Refresh()
   at FrooxEngine.WorldManager.UpdateStep()
   at FrooxEngine.WorldManager.RunUpdateLoop()
   at FrooxEngine.Engine.UpdateStep()
   at FrooxEngine.Engine.RunUpdateLoop()
   at FrooxEngine.StandaloneFrooxEngineRunner.UpdateLoop()
./launchresodotnet.sh: line 14: 34537 Aborted                 dotnet /home/void/reso-headless/Headless/net8.0/Resonite.dll

To Reproduce

1. Headless install on a Debian 12 system, running from an existing Mono install.
2. Using the same config file, launch under dotnet8.
3. Join the hosted session.
4. Wait, without moving around, for the world to finish loading.
5. Interact with a collider.
6. The headless will begin giving "Engine unresponsive" messages, then the fatal error.

Expected behavior

The session should remain responsive when collider events occur.

Screenshots

No response

Resonite Version Number

2024.7.15.1359

What Platforms does this occur on?

Linux

What headset if any do you use?

Vive Pro Eye

Log Files

DISPLACER - 2024.7.15.1359 - 2024-07-15 20_08_12.log (client log)
resonite - 2024.7.15.1359 - 2024-07-15 20_25_26.log (headless log)
stdout.log (headless stdout log)

Additional Context

I'm unsure if the issue is code, something with my world, or something with the install. I'm happy to swap those around to test that further but I'd like to see if the current logs can provide some insight first.

Reporters

Ikani

stiefeljackal commented 1 month ago

The original issue edited out the following information that is rather important:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory.
This is often an indication that other memory is corrupt.

Source: https://discord.com/channels/1040316820650991766/1154514015721099294/1262617355905794100

I cannot find it in the logs since it is a fatal error, so I will include it in this comment.

This looks like an issue with Convex Hull, which is related to #1908 (not a duplicate) and a regression of #1198.

I also believe this is missing the prerelease label since this is the .NET 8 headless.

Frooxius commented 1 month ago

1) Does this happen with a particular world, or can you replicate it in a gridspace too?
2) If it is world-specific, which world does this happen with?

What is odd here is that you state that this happens with a collider event, but the log doesn't show that; instead, it seems to happen when a convex hull is being computed (which is not a collider event). So this might be specific to a particular collider, or to a collider being modified.

ikanimew commented 1 month ago

Okay, so I did a bunch of testing and was able to narrow it down to a cone collider on my avatar that would cause the crash. I could pretty reliably get the headless to start timing out, then crash if I spawned that avatar out. I then modified the avatar to remove that collider from it, and the crashes stopped. This was confirmed in both my regular world and a grid world, as well as with a fresh install of the headless (moving the install directory out of the way and re-downloading, so everything but the config file was fresh).

Here's the collider in question. [screenshot: 20240716234958_1]

If desired, I can share a copy of that avatar with staff to examine. I believe this is the cause, though.

Frooxius commented 1 month ago

@ikanimew I see that the ConeCollider Radius is being driven. Do you know what range of values it goes through?

ikanimew commented 1 month ago

Doing a quick check, it looks like the driver is a ValueGradientDriver that goes from 0.04 to 0.015 and is linked to the state of a blendshape. The position and rotation drives are similar.

Frooxius commented 1 month ago

Hmm... I checked the code and it seems they are clamped. I couldn't replicate the issue even with nonsensical values (like NaN, Infinity, negative and so on).

Are you able to make a replication item that's able to reproduce this issue?

Xlinka commented 1 month ago

This issue only occurs on the Linux headless with .NET 8. I have tested with @ikanimew, and no issues were observed on the Windows build and release. The crash happens with a specific cone collider on a stinger, which is part of an avatar's tail.

On the Linux headless server, as soon as a user joins the session, the server begins giving "Engine unresponsive" messages, followed by the fatal error: System.AccessViolationException: Attempted to read or write protected memory.

@ikanimew placed the stinger object on a cube, and spawning it in a Linux headless session consistently caused the crash. The cone collider's radius is driven by a ValueGradientDriver ranging from 0.04 to 0.015.

This issue is isolated to the Linux build of the .NET 8 headless. Sometimes the issue can take up to a minute to occur. I am assuming this issue is related to the convex hull calculation.

shiftyscales commented 1 month ago

Is this issue exclusive to running the Linux headless client then, @ikanimew @Xlinka? Can you also not replicate it when running the headless under Windows?

If so, would you be able to cooperate with other users to test the Linux headless client with other systems and see if it still occurs there too?

Would you also mind attaching the replication object to this issue, for ease of access in additional tests, as Frooxius requested above?

Xlinka commented 1 month ago

Earlier today I tested loading Ikani on two of my headless servers running Windows 11 (10.0.22621.3880); both loaded without issues and did not crash. I have asked @ikanimew for the replication object for additional tests and am awaiting a response.

The Linux headless that crashed belongs to @ikanimew. It crashed both in their Hive world and in a normal gridspace, which rules out an item in the world as the cause.

Ikani notes that "So I was able to reproduce by just making an empty slot, adding a cone collider, setting the height to 0.05 and driving the radius with a value gradient driver set to 0.04 for 0 and 0.015 for 1, and the progress at 0."
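To make the drive in this reproduction concrete, here is a minimal sketch of how a two-point gradient maps progress to the cone radius. It assumes the ValueGradientDriver interpolates linearly between its points and clamps to the end values outside them; neither behavior is confirmed in this thread, and `gradient_value` is a hypothetical helper, not a Resonite or FrooxEngine API.

```python
def gradient_value(points, progress):
    """Hypothetical sketch: interpolate a value from (position, value) pairs.

    Assumes linear interpolation between points and clamping to the first/last
    value outside the covered range. This is an illustration only, not
    FrooxEngine's ValueGradientDriver implementation.
    """
    points = sorted(points)
    # Clamp: below the first point or above the last point, hold the end value.
    if progress <= points[0][0]:
        return points[0][1]
    if progress >= points[-1][0]:
        return points[-1][1]
    # Find the segment that contains progress and interpolate linearly within it.
    for (p0, v0), (p1, v1) in zip(points, points[1:]):
        if p0 <= progress <= p1:
            t = (progress - p0) / (p1 - p0)
            return v0 + (v1 - v0) * t
    raise ValueError("progress not covered by any segment")


# Values from the reproduction: radius 0.04 at position 0, 0.015 at position 1.
stinger_radius = [(0.0, 0.04), (1.0, 0.015)]
print(gradient_value(stinger_radius, 0.0))  # 0.04
```

Under these assumptions, holding progress at 0, as in the reproduction, gives a constant radius of 0.04.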

I will carry out a test of this on my own Linux headless.

shiftyscales commented 1 month ago

Interesting. Does it require those specific values in the ValueGradientDriver, or does the issue occur with any arbitrary values so long as the radius is being driven?

TeknoCatron commented 1 month ago

Xlinka attempted the test on one of TheRoxDen's Debian 12 servers running beta 2024.7.17.1173.

Could not reproduce. Log: TRD-42400 - 2024.7.17.1173 - 2024-07-19 01_42_46.log

ikanimew commented 1 month ago

console.log

Alright, so to rule out my existing Debian 12 host, I spun up a new Debian 12 VM, ran through a default OS install (no GUI, standard packages, SSH server), then installed dotnet8 and steamcmd and launched a grid world. I was able to join the grid and saw no stability issues. I then created an Empty Slot and added a cone collider to it. As soon as I did this, the console (line 936) started reporting "Engine unresponsive" for just over a minute. At the end of that (line 1002), the collider component appeared in the inspector. I gave it a moment, then set the collider to Trigger, and the "Engine unresponsive" messages continued until the crash message was logged.

This log is the full console log, so it shows the install process as well. I loosely based my install steps on the Dockerfiles in https://github.com/voxelbonecloud/debian-dotnet and https://github.com/voxelbonecloud/headless-docker, though this is NOT a Docker-based deployment.

For clarification, I no longer have the original collider on my avatar, so I believe that's not what's triggering the hangs and crashes.

ikanimew commented 1 month ago

To add to this, I spun up a new VM in DigitalOcean and followed the same steps as above, and the issue does NOT happen there. I'm going to do a bit more testing, but this just keeps getting weirder.

shiftyscales commented 1 month ago

Thank you @ikanimew. Hopefully you are able to better isolate the source of the issue.

ikanimew commented 1 month ago

After more testing, this is definitely limited to the specific hardware of my datacenter server. I've been unable to reproduce the error anywhere else, including on an identical server at home. I'm going to close this for now and look into replacing the hardware.