forceMatrixRecalculationPerRender causes RenderTransformOverride to cause a Unity crash

Polly-Geist commented 11 months ago

Describe the bug?

Oddly, I can't find any existing issues about a series of crashes that others have been experiencing as well. One theory is that Render Transform Override scaling anything to 0,0,0 crashes random users in a world, even if they do not use the component themselves.

I originally believed it was the fog volume in a session of mine but realised it couldn't be as I wasn't crashing until visitors arrived and were around for a while. I then crashed again after joining and being in someone else's session.

All I have specifically are the Unity Crash Logs as the crashes halt the Resonite logs mid-line. I can provide those logs as well if needed, but this appears to be an issue in the rendering pipeline's handling of something.

To Reproduce

Basically exist in any session with a user who has Render Transform Override forcing a slot to 0,0,0 scale.

Expected behavior

Ideally, it shouldn't randomly crash anyone.

Screenshots

No response

Resonite Version Number

2023.10.17.464

What Platforms does this occur on?

Windows

What headset if any do you use?

Index, Desktop

Log Files

Crash_2023-10-18_051740400.zip Crash_2023-10-18_060702704.zip

Additional Context

I couldn't find any other reports specifically about this and nobody directed me toward any, just the usual "put it on github".

If any additional info is needed, let me know.

Reporters

@PollyGeist on Discord (me)

Nytra commented 11 months ago

There is a discussion about RenderTransformOverride crashing here https://discord.com/channels/1040316820650991766/1162208558419558504

dfgHiatus commented 11 months ago

Copied from the Resonite Discord. AFAIK there appears to be a missing null check internally.

Nytra commented 11 months ago

I think there should also be a null check in the Override() method

mlehmk commented 11 months ago

Instead of randomly adding null checks, it should be debugged why attachedGameObject can even be null during the rendering context switch handler. The handler is unregistered before Destroy is called, and only Destroy sets attachedGameObject to null. The Initialize method ensures that an attachedGameObject exists; if it were null after Initialize, that would be the bug to chase. I wonder if it is possible for ApplyChanges to be called before Initialize.

ticpu commented 10 months ago

I experienced this situation yesterday, with the same kind of crashes as described here. People who were not even aware of the Discord thread about RTO or of this issue were also mentioning that RTO could be the problem. I directed them here, but I don't know what they could do to help get more exposure on this issue.

Maybe the issue title should mention RenderTransformOverride somewhere to bring more attention to it since the generic crash title might not be very interesting to open for the devs.

I think it should be mentioned that Shrike, who linked another issue here, also mentioned that this happens when scale is 0.1. Scale 0 doesn't seem to be the problem, as I don't get people crashing around me and I have been using scale 0 since day 2 of Resonite. I only see people crashing like this in bigger worlds, and it doesn't start with me joining; it has already been ongoing.

Here's the stack trace when this happens, showing a more generic Direct3D error and noting a script error happening seconds before.

Script error: OnPreCull
========== OUTPUTTING STACK TRACE ==================
0x00007FFB13498BF9 (d3d11) CreateDirect3D11SurfaceFromDXGISurface
  ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC27C368)
0x00007FFACC27C368 (UnityPlayer) (function-name not available)
  ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC280524)
0x00007FFACC280524 (UnityPlayer) (function-name not available)
  ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC33A410)
0x00007FFACC33A410 (UnityPlayer) (function-name not available)
  ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC33A9BE)
0x00007FFACC33A9BE (UnityPlayer) (function-name not available)
  ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC33BE9D)
0x00007FFACC33BE9D (UnityPlayer) (function-name not available)
  ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC33BF78)
0x00007FFACC33BF78 (UnityPlayer) (function-name not available)
0x00007FFACC663988 (UnityPlayer) UnityMain
0x00007FFB17C67034 (KERNEL32) BaseThreadInitThunk
0x00007FFB1989CEC1 (ntdll) RtlUserThreadStart
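For anyone comparing several crash logs, a minimal Python sketch like the one below can pull the frames out of a trace in the format above and derive a rough grouping key. It is an illustration only, not part of any Resonite or Unity tooling; the names `parse_frames` and `crash_signature` are made up here, and using the top frame as the key is an assumption about what is useful to compare.

```python
import re
from collections import Counter

# Matches frame lines such as:
#   0x00007FFB13498BF9 (d3d11) CreateDirect3D11SurfaceFromDXGISurface
FRAME_RE = re.compile(r"^(0x[0-9A-Fa-f]{16}) \((\w+)\) (.+)$")

def parse_frames(trace_text):
    """Return (address, module, symbol) tuples, skipping symbol-lookup ERROR lines."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.groups())
    return frames

def crash_signature(frames):
    """Module and symbol of the top frame: a rough key for grouping similar crashes."""
    if not frames:
        return None
    _, module, symbol = frames[0]
    return f"{module}!{symbol}"

# Shortened copy of the trace above.
SAMPLE = """\
0x00007FFB13498BF9 (d3d11) CreateDirect3D11SurfaceFromDXGISurface
  ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC27C368)
0x00007FFACC27C368 (UnityPlayer) (function-name not available)
0x00007FFACC663988 (UnityPlayer) UnityMain
0x00007FFB17C67034 (KERNEL32) BaseThreadInitThunk
0x00007FFB1989CEC1 (ntdll) RtlUserThreadStart
"""

frames = parse_frames(SAMPLE)
signature = crash_signature(frames)
modules = Counter(module for _, module, _ in frames)
print(signature)
print(dict(modules))
```

Two logs that yield the same signature are at least failing in the same place, which is weaker than sharing a root cause but useful for sorting reports.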

hazre commented 10 months ago

I understand that this is a vague name for an issue, but tbh we have zero clue what's causing it. It's the best we have.

These crashes are really bad and I can easily say it's been happening to like 10 other people I know. I hope the Team looks at this soon, because it makes the game unplayable when it starts happening.

Here are 3 of the crashes that happened back to back last night: crashes.zip

Geenz commented 10 months ago

If you can provide some good example content of where you see crashes the most, that would help substantially on this.

hazre commented 10 months ago

> If you can provide some good example content of where you see crashes the most, that would help substantially on this.

I'm pretty sure this is very agnostic, meaning it's not some specific content.

The thing is that it's very inconsistent, like others have said: it can happen now or maybe in 2 hours. It's a lot easier to reproduce when you have a lot of people around, since you're increasing the chances of it happening.

I guess the easiest way to reproduce is to have a bunch of people around in a world, have one person with the RTO component on their avatar, and pray for a crash.
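The "more people, more chances" intuition can be made concrete with a small sketch. It assumes each exposure (for example one affected avatar present for one minute) carries the same small, independent chance of triggering the crash. That independence is an assumption for illustration, and the 1% figure is hypothetical, not a measurement from this thread.

```python
def crash_probability(per_unit_chance, units):
    """Chance of at least one crash across `units` independent exposures."""
    if not 0.0 <= per_unit_chance <= 1.0:
        raise ValueError("per_unit_chance must be between 0 and 1")
    return 1.0 - (1.0 - per_unit_chance) ** units

# Hypothetical figure: a 1% chance per exposure.
P = 0.01
for exposures in (1, 10, 30):
    print(exposures, round(crash_probability(P, exposures), 3))
```

Under this model the chance grows quickly with the number of exposures, which would be consistent with crashes appearing mostly in busy sessions while remaining hard to trigger alone.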

Geenz commented 10 months ago

Looking at some of the logs, it's entirely possible a bad texture could be a culprit for some people. So I'm not entirely convinced that it's RTOs. Additionally- the specific code that handles our connection with Unity shouldn't be resulting in the game object becoming null seemingly at random like this. I'm not ruling this out however, just requesting content that has a higher likelihood of triggering this so we can dig deeper.

hazre commented 10 months ago

The error seems to happen at Script error: OnPreCull and looking at the code, that method name exists in a lot of places but the most relevant one seems to be this one in HeadOutput class

Geenz commented 10 months ago

Let's focus on getting to a more reproducible state rather than debugging snippets of code on the issue tracker.

Enverex commented 10 months ago

Crashed 5 times last night in the space of an hour. Looks like it was 0x00007FFD3B734AA9 (d3d11) CreateDirect3D11SurfaceFromDXGISurface each time. At the time everyone present claimed they were not using the RTO component (I wasn't either).

error.log error.log error.log error.log error.log

The world has been solid normally, so it being something on someone's avatar seems most likely.

ohzee00 commented 10 months ago

I just recently had a session that was having multiple unity crashes, I wasn't able to test all too much due to it being a social setting and the user in question I wasn't too familiar with.

We had a series of four users crashing very close together timewise, with some crashing multiple times. On asking around we found one user who was using RenderTransformOverride, and we asked them to change their avatar to something else. After that, we did not have any more Unity crashes.

Wanting to test it a bit more, we wondered if the same avatar would crash users if RTO was set to 0.001. After doing so, we had two users Unity crash within minutes.

Posted below is a picture of the RTO (before we modified it during testing)

and the player.log I had when crashing, showing the same error as users above, this was before we modified it:

Player-prev (5).log

Given the user was someone I didn't know and their avatar was custom, I did not want to trouble them further with requests. (Plus I wanted to just hang with friends and not turn it into a debug session)

Geenz commented 10 months ago

Cross posting one item from Discord as a potential repro: resdb:///3ddaada6d1c6bc2fe4e907ce5339c1474e6e383ea0374e92d5dc5bed23c9c4b4.brson

Also as a general reminder, please post all logs, not just the crash logs and dumps. This helps us better find holes in our logging, and can lead to uncovering consistent symptoms.

shiftyscales commented 10 months ago

As Geenz had indicated above- please include all of the crash logs, including the Resonite logs from the Logs folder where Resonite is installed, the Player.log and Crash.log if available from the crashes folders %TEMP%\Yellow Dog Man Studios\Resonite\Crashes.

Additionally there seem to be a wide number of variables to this issue, and it would help immensely if some replication conditions/commonalities could be established across all of the cases observed:

Was the crash experienced on a headless host or a client host?
Was the crashing user in desktop only mode, or VR? If VR, were they in desktop or VR at the time of crashing?
Do the affected users have any hardware in common, e.g. GPU vendor (AMD/Nvidia), same GPU, etc.
If RTO is involved at all- there must be some common replication condition, e.g. the context it is set to, what values are used for the overrides, etc. A replication condition ideally needs to be found.
The default platform head and hands avatar also makes use of RTO, have users crashed when a user without a favourited avatar loads in? Otherwise, it seems likely it either is not exclusively that component causing issues, or as mentioned above- the component is used in a different configuration

The more commonality that can be identified between all known cases, and the simpler the replication conditions can be made- the easier this will be for us to solve.
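Gathering the requested files by hand is easy to get wrong, so a small script along these lines could bundle them into one archive. This is a sketch under assumptions: the crashes folder comes from the path quoted above, the install location is not stated in this thread and must be supplied by the user, and the function names and the `.log`/`.dmp` filter are choices made here for illustration.

```python
import os
import zipfile
from pathlib import Path

def collect_logs(source_dirs, archive_path, suffixes=(".log", ".dmp")):
    """Zip every file with a matching suffix found under the given folders."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, source in enumerate(source_dirs):
            root = Path(source)
            if not root.is_dir():
                continue  # Skip folders that do not exist on this machine.
            for path in sorted(root.rglob("*")):
                if path.is_file() and path.suffix.lower() in suffixes:
                    # Prefix with the folder index so same-named files do not collide.
                    name = f"{index}_{root.name}/{path.relative_to(root).as_posix()}"
                    archive.write(path, name)
                    added.append(name)
    return added

def default_sources(resonite_install_dir):
    """The two folders named in the request; the install path is user-supplied."""
    temp = os.environ.get("TEMP", "")
    return [
        Path(temp) / "Yellow Dog Man Studios" / "Resonite" / "Crashes",
        Path(resonite_install_dir) / "Logs",
    ]
```

Calling `collect_logs(default_sources(install_dir), "resonite_logs.zip")` with your own install directory would produce a single file to attach. Review the archive before posting, since logs can contain machine names and session details.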

shiftyscales commented 10 months ago

GPU driver version could also potentially be a relevant factor to identify/nail down. E.g. for Nvidia users, are all affected users using GeForce Experience/updated to the latest driver? If so, is there an earlier driver version where the issue does not occur?

Nytra commented 10 months ago

Crash experienced after spawning the repro item a bunch of times (resdb:///3ddaada6d1c6bc2fe4e907ce5339c1474e6e383ea0374e92d5dc5bed23c9c4b4.brson) and then spawning the default avatar from Resonite Essentials a bunch of times, and doing other stuff like destroying the avatars, undo destroy, equipping the avatars, respawning... Not sure exactly how to reproduce the crash.

Happened in my local home with nobody else in there. Launched with SteamVR but was in desktop mode. Nvidia GTX 1080Ti (537.13 driver). Geforce experience 3.27.0.112. DESKTOP-H976HO2 - 2023.10.20.831 - 2023-10-23 19_44_20.log error.log Player.log

(Editing to also include the dmp file) crash.dmp

Intel i7 8700k, 32GB RAM, Windows 11

Nytra commented 10 months ago

It happened again and I was just doing the same thing. Spawned both types of avatars a bunch of times and then started destroying them and undo destroy. In my local home with nobody else in there. crash.dmp DESKTOP-H976HO2 - 2023.10.20.831 - 2023-10-23 20_32_59.log error.log Player.log

ShrikeAlvaron commented 10 months ago

So I actually just managed to reproduce this using the same steps as Nytra, also mixing in my own Avalis. At first it didn't seem to do anything, but once I started equipping the avatars and jumping around between them the Unity crash came relatively fast after that. Maybe equipping an RTO quickens the crash, or is a root cause? Oh, also this was in my cloud home, which is mostly still the default though I do have a video player in it playing a Youtube stream.

This was in SteamVR mode but in desktop via hotswitching, AMD 5800X, 32GB RAM, and Nvidia RTX 3080 with drivers 545.84. Unfortunately I forgot to launch without mods, but here's my logs nonetheless. I might try again being sure to disable mods later.

BLACKLIGHT - 2023.10.20.831 - 2023-10-23 14_45_40.log crash.dmp error.log Player.log

Nytra commented 10 months ago

I got it to happen again this time launching in screen mode without SteamVR running. Again just spawning the default avatar and repro avatar a bunch of times and destroying, undo destroy and equipping them. crash.dmp DESKTOP-H976HO2 - 2023.10.20.831 - 2023-10-23 21_03_09.log error.log Player.log

Nytra commented 10 months ago

I managed to record a video with OBS of a Unity crash happening. This one happened very quickly which I was surprised about. This time I was just duplicating the avatars. https://youtu.be/z6VgBFhnCzQ?si=dNZdmKfGxFg5KRKM

crash.dmp DESKTOP-H976HO2 - 2023.10.20.831 - 2023-10-23 21_17_27.log error.log Player.log

shiftyscales commented 10 months ago

I seem to have replicated it under the same conditions:

Paste resdb:///3ddaada6d1c6bc2fe4e907ce5339c1474e6e383ea0374e92d5dc5bed23c9c4b4.brson into the world
Grab and duplicate it a number of times (I had to do so substantially more-so than you did in the video, however)

I could seemingly not replicate it with the platform head and hands avatar, suggesting some fundamental difference between the two. I will try isolating the issue further, e.g. applying the RTO to a cube and performing the same test.

Based on your most recent log file, and the log file I produced though, I am not confident that RTO is the smoking gun.

SHIFTY-LAPTOP - 2023.10.20.831 - 2023-10-23 13_44_25.log error.log Player.log

shiftyscales commented 10 months ago

As a sanity check, I tried removing the RTO, and duplicated considerably more copies of that avatar without issue. I will now try the reverse, and create a box with just the RTO, followed by a simplified skinned mesh/avatar (no ProtoFlux) to also test against.

shiftyscales commented 10 months ago

Applied RTO to a box on a skinned mesh renderer, and seemingly haven't been able to replicate the issue under the same condition. So results are inconclusive. But the use of the linked replication object resdb:///3ddaada6d1c6bc2fe4e907ce5339c1474e6e383ea0374e92d5dc5bed23c9c4b4.brson does seem to fairly reliably cause the crash to occur.

Could you try to simplify the replication object by stripping off components, ProtoFlux, etc. down to the absolute bare essentials and see if it still produces the issue, @Nytra?

shiftyscales commented 10 months ago

Remove as much from the object as you can while it is still able to replicate the issue- then if possible, try to create a new object that meets those same conditions to see if it can be simplified further still.
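The stripping-down procedure described here is essentially a greedy minimization, which can be sketched as follows. The part names and the `simulated_crash` predicate are placeholders invented for the example; in practice the predicate is a manual test (strip the object, duplicate it, watch for a crash) that should be repeated several times, because an intermittent crash can make a part look harmless by luck.

```python
def minimize(components, still_crashes):
    """Greedily drop components one at a time while the crash keeps reproducing."""
    kept = list(components)
    changed = True
    while changed:
        changed = False
        for item in list(kept):
            trial = [c for c in kept if c != item]
            if still_crashes(trial):
                kept = trial      # The crash survived without this item, so drop it.
                changed = True
    return kept

# SIMULATED predicate for illustration only: pretend the crash needs both
# placeholder parts "B" and "D" to be present.
def simulated_crash(parts):
    return {"B", "D"} <= set(parts)

parts = ["A", "B", "C", "D", "E"]
minimal = minimize(parts, simulated_crash)
print(minimal)
```

Whatever remains at the end is a set where removing any single part stops the crash, which is the "absolute bare essentials" being asked for.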

Nytra commented 10 months ago

I removed every slot and component on the avatar except for the RTO and the skinned mesh renderers connected to it. I then tried duplicating but was unable to make the crash happen. I will try some more things like maybe keeping the IK and avatar components intact.

shiftyscales commented 10 months ago

Thank you for testing. If we can get the replication case isolated/narrowed down, it'd definitely help a lot- but just having that relatively consistent replication case already helps a ton. :)

Nytra commented 10 months ago

Removing the EyeManager component from the replication avatar seems to stop the crash from happening. So maybe the RTO and EyeManager components are together causing the crash?

ohzee00 commented 10 months ago

> Applied RTO to a box on a skinned mesh renderer, and seemingly haven't been able to replicate the issue under the same condition. So results are inconclusive. But the use of the linked replication object resdb:///3ddaada6d1c6bc2fe4e907ce5339c1474e6e383ea0374e92d5dc5bed23c9c4b4.brson does seem to fairly reliably cause the crash to occur.
>
> Could you try to simplify the replication object by stripping off components, ProtoFlux, etc. down to the absolute bare essentials and see if it still produces the issue, @Nytra?

I'm getting an almost consistent crash when using this asset and duplicating it roughly 30 times. I'll see if narrowing it down on my end helps at all.

(Stripping down protoflux without my own avatar in the world)

ohzee00 commented 10 months ago

> Removing the EyeManager component from the replication avatar seems to stop the crash from happening. So maybe the RTO and EyeManager components are together causing the crash?

I'm seeing something similar. I did my replication case of duplicating the stripped-down avatar 30 times without the eye manager and saw no crash. When respawning the avatar and doing it normally right after, I got a Unity crash.

ohzee00 commented 10 months ago

@Nytra Can you try confirming on your side if removing the Blink and Eye rotation drivers on the Eye manager stops the crashing from happening? I wonder if it's the bone rotation or the blink itself causing it.

Nytra commented 10 months ago

@ohzee00 removing those drivers does seem to stop the crash from happening.

Nytra commented 10 months ago

[Screenshot: 2023-10-23 22 07 51]

Removing the OpenCloseTarget drive alone seems to stop the crashing. So the problem is likely related to blendshapes on SkinnedMeshRenderers.

ohzee00 commented 10 months ago

> Removing the OpenCloseTarget drive alone seems to stop the crashing. So the problem is likely related to blendshapes on SkinnedMeshRenderers.

I can confirm this too with my replication case of duplicating the avatar. Removing the pivots on the bones does not affect things; only removing the blendshape affects it.

Going to do a log dump and video of this in case that helps in replicating what I'm doing exactly.

ohzee00 commented 10 months ago

My replication case is as follows:

Using a stripped down version (literally removing the protoflux and that's it) of the submitted model above I can cause a crash semi-reliably within a minute.

Load into a grid space
Spawn out the mentioned model roughly 30 times
Try walking back to spawn (Don't know if spinning or such helps with the crashes but I do it anyways)
You will most likely crash within a minute

DESKTOP-V75BHJO - 2023.10.20.831 - 2023-10-23 18_14_44.log

Player.log

Video of removing the blendshape, duplicating the avatar then doing a test without removing the blendshape and crashing shortly after:

https://streamable.com/klpkf3

shiftyscales commented 10 months ago

"You will most likely crash within a minute"

Is this issue time-based, @ohzee00 @Nytra? Can it be replicated with just one copy of the avatar in a world, and enough time passed?

If the blendshape is driven by means other than the eye manager, e.g. if it's driven/continually being changed does the issue still occur as well?

Nytra commented 10 months ago

I tried driving that blendshape directly with the Random Float ProtoFlux node and then duplicated the avatar again but the crash didn't happen.

ohzee00 commented 10 months ago

I currently have a test going with the avatar's blink being ping-ponged at 50x the speed. I'll tell you how that goes as I just leave it running.

Before this test I did have a crash happen while pannering the blendshape, but I'm having issues replicating it as fast as in the duplicate case.

https://github.com/Yellow-Dog-Man/Resonite-Issues/assets/33611545/4454fc0a-0ca6-473f-a00d-8ab4e9693f5c

Though the time seems to be random, sometimes as soon as I'm done duplicating the avatars, it crashes, or other times I have to wait a minute to crash.

shiftyscales commented 10 months ago

As a sanity check, could you monitor your resource usage, e.g. memory on your system to check that you aren't running out of memory/crashing because of that, @ohzee00?

ohzee00 commented 10 months ago

> As a sanity check, could you monitor your resource usage, e.g. memory on your system to check that you aren't running out of memory/crashing because of that, @ohzee00?

Doin' all fine when having a crash, barely goes over 2gb in ram, vram is fine and GPU is doing its best.

shiftyscales commented 10 months ago

Thank you.

ohzee00 commented 10 months ago

So I did more testing with the same replication case as above, and it seems that neither the scale nor any other transform on the RTO component matters.

Position override causes the crash, rotation causes the crash, and scale, even at .999, still does it. I even tried it at a scale of 1 on all axes and it still does it!

However, when removing the Mesh_face skinned mesh renderer reference in the RTO component (the mesh that has the blink), I'm not getting any crashes. At least, in my testing in the course of roughly 30 minutes.

My brain is kinda going numb at doing constant restarts at this point so unless someone has more exact testing I'm probably going to stop here for the night.

I really hope this is enough to pin down what it could be.

ohzee00 commented 10 months ago

Did a bit more testing and thanks to Shrike, I was able to find the base asset.

resrec:///U-ohzee/R-a54d6f53-cc5f-4c53-9516-f57f78b0700f

This is a very simple world with the avatar above imported raw, with its VRIK removed and only the blink being driven externally by an eye manager, with of course the RTO still being on the head itself. I was still able to get a crash with just this when duplicating the avatar 30 times.

Frooxius commented 10 months ago

I have managed to narrow this down with the above. This seems to be a bug with Unity specifically unfortunately.

This seems to occur when forceMatrixRecalculationPerRender on a SkinnedMeshRenderer is set to true, which RenderTransformOverride will enable.

I have made a temp change to the code, where this would be set to true for every single SkinnedMeshRenderer. I have then duplicated the skinned mesh renderers without the RenderTransformOverride and still got a crash.

Conversely, clearing the skinned mesh renderer in the list on the component and duplicating does not result in a crash.

This makes things a bit more difficult unfortunately, since it's a bit out of our control. The first thing to try would be to update Unity version to a newer one, since we're currently on an older 19f, but there were some issues to resolve with the new version.

Nytra commented 10 months ago

So it is not specifically related to the eye blinking blendshape being driven by the EyeLinearDriver? It seemed as though removing that field drive prevented the crashing, even though presumably that SkinnedMeshRenderer still had forceMatrixRecalculationPerRender enabled.

hazre commented 10 months ago

Can there be a temporary solution for this until the Unity upgrade happens? Like completely disabling the component's functionality in the next build and enabling it once everything is sorted? That would fix the crashing.

Frooxius commented 10 months ago

@Nytra Yes, it's unrelated. Disabling forceMatrixRecalculationPerRender and still duplicating the mesh failed to produce a crash.

@hazre Not without disabling or crippling existing functionality with the RenderTransformOverride and breaking some other things. However, I do want to add RenderMaterialOverride, which might provide an alternate solution in a lot of cases.

shiftyscales commented 9 months ago

This will hopefully be resolved by #585- but for now, this issue is blocked.

Frooxius commented 8 months ago

There should be an improvement in 2024.1.12.1336. For RenderTransformOverride, the meshes won't get flagged when the transform doesn't actually get overridden on the user's end. This should reduce the number of meshes that are actually flagged like this and thus improve stability.

It's still not a fix, because the underlying issue is there, but it should help hopefully.

Yellow-Dog-Man / Resonite-Issues