Open Polly-Geist opened 11 months ago
There is a discussion about RenderTransformOverride crashing here https://discord.com/channels/1040316820650991766/1162208558419558504
Copied from the Resonite Discord. There appears to be a missing null check AFAIK internally
I think there should also be a null check in the Override() method
Instead of randomly adding null checks, it would be better to debug why attachedGameObject can even be null during the rendering context switch handler. The handler is unregistered before Destroy is called, and only Destroy sets attachedGameObject to null. The Initialize method ensures that an attachedGameObject exists; if it were null after Initialize, that would be the bug to chase. I wonder if it is possible for ApplyChanges to be called before Initialize has been called.
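To make the ordering question above concrete, here is a minimal sketch in Python. It is not the actual Resonite code: the `Connector` class and its method names are hypothetical stand-ins for the lifecycle described in this comment (Initialize attaches the object, Destroy unregisters the handler and clears it). It only shows that, under those assumptions, a null object can be observed when ApplyChanges runs before Initialize or after Destroy.

```python
class Connector:
    """Hypothetical stand-in for the component's Unity-side connector."""

    def __init__(self):
        self.attached_game_object = None  # nothing attached until initialize()
        self.handler_registered = False

    def initialize(self):
        # Assumption from the comment: Initialize guarantees an attached object.
        self.attached_game_object = object()
        self.handler_registered = True

    def apply_changes(self):
        if self.attached_game_object is None:
            raise RuntimeError("apply_changes ran without an attached game object")
        return "applied"

    def destroy(self):
        # Assumption from the comment: the handler is unregistered first,
        # and only then is the attached object cleared.
        self.handler_registered = False
        self.attached_game_object = None


def run(order):
    """Call the named lifecycle steps in order and report what happened."""
    connector = Connector()
    try:
        for step in order:
            getattr(connector, step)()
    except RuntimeError as error:
        return f"bug: {error}"
    return "ok"
```

With this model, `run(["initialize", "apply_changes", "destroy"])` completes normally, while `run(["apply_changes", "initialize"])` reports the missing object. That is why the call order seems worth checking before a null check is added.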
I ran into this situation yesterday, with the same kind of crashes as described here. People who were not even aware of the Discord thread about RTO or of this issue were also mentioning that RTO could be the problem. I directed them here, but I don't know what they could do to help get more exposure on this issue.
Maybe the issue title should mention RenderTransformOverride somewhere to bring more attention to it since the generic crash title might not be very interesting to open for the devs.
I think it should be mentioned that Shrike, who linked another issue here, also mentioned that this happens when scale is 0.1. Scale 0 doesn't seem to be the problem here, as I don't get people crashing around me and I have been using scale 0 since day 2 of Resonite. I only see people crashing like this in bigger worlds, and it doesn't start with me joining; it has already been ongoing.
Here's the stack trace when this happens, showing a more generic Direct3D error and noting a script error happening seconds before.
Script error: OnPreCull
========== OUTPUTTING STACK TRACE ==================
0x00007FFB13498BF9 (d3d11) CreateDirect3D11SurfaceFromDXGISurface
ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC27C368)
0x00007FFACC27C368 (UnityPlayer) (function-name not available)
ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC280524)
0x00007FFACC280524 (UnityPlayer) (function-name not available)
ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC33A410)
0x00007FFACC33A410 (UnityPlayer) (function-name not available)
ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC33A9BE)
0x00007FFACC33A9BE (UnityPlayer) (function-name not available)
ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC33BE9D)
0x00007FFACC33BE9D (UnityPlayer) (function-name not available)
ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC33BF78)
0x00007FFACC33BF78 (UnityPlayer) (function-name not available)
0x00007FFACC663988 (UnityPlayer) UnityMain
0x00007FFB17C67034 (KERNEL32) BaseThreadInitThunk
0x00007FFB1989CEC1 (ntdll) RtlUserThreadStart
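For anyone comparing several of these traces, a small Python sketch like the one below can pull out the frames so the top frame and the modules involved are easy to compare across crashes. The regular expression is an assumption based only on the line format shown in the trace above (address, module in parentheses, symbol); other Unity versions may format frames differently.

```python
import re
from collections import Counter

# Matches lines such as: 0x00007FFB13498BF9 (d3d11) CreateDirect3D11SurfaceFromDXGISurface
FRAME = re.compile(r"^(0x[0-9A-Fa-f]{16}) \((\w+)\) (.+)$")


def parse_frames(text):
    """Return (address, module, symbol) for every frame line; other lines are skipped."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append(match.groups())
    return frames


SAMPLE = """\
0x00007FFB13498BF9 (d3d11) CreateDirect3D11SurfaceFromDXGISurface
ERROR: SymGetSymFromAddr64, GetLastError: 'Attempt to access invalid address.' (Address: 00007FFACC27C368)
0x00007FFACC27C368 (UnityPlayer) (function-name not available)
0x00007FFACC663988 (UnityPlayer) UnityMain
"""

frames = parse_frames(SAMPLE)
modules = Counter(module for _, module, _ in frames)
print(frames[0])      # top frame of the crash
print(dict(modules))  # how many frames each module contributes
```

The symbol lookup errors are ignored on purpose, since each one only repeats the address of the frame that follows it.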
I understand that this is a vague name for an issue, but tbh we have zero clue what is causing it. It's the best we have.
These crashes are really bad and I can easily say it's been happening to like 10 other people I know. I hope the Team looks at this soon, because it makes the game unplayable when it starts happening.
Here are 3 of the crashes that happened back to back last night: crashes.zip
If you can provide some good example content of where you see crashes the most, that would help substantially on this.
I'm pretty sure this is very agnostic, meaning it's not tied to some specific content.
The thing is that it's very inconsistent, like others have said: it can happen now or maybe in 2 hours. It's a lot easier to reproduce when you have a lot of people around, since you're increasing the chances of it happening.
I guess the easiest way to reproduce is to have a bunch of people around in a world, have just one person with the RTO component on their avatar, and pray for a crash.
Looking at some of the logs, it's entirely possible a bad texture could be a culprit for some people. So I'm not entirely convinced that it's RTOs. Additionally, the specific code that handles our connection with Unity shouldn't be resulting in the game object becoming null seemingly at random like this. I'm not ruling this out however, just requesting content that has a higher likelihood of triggering this so we can dig deeper.
The error seems to happen at Script error: OnPreCull
and looking at the code, that method name exists in a lot of places but the most relevant one seems to be this one in HeadOutput class
Let's focus on getting to a more reproducible state rather than debugging snippets of code on the issue tracker.
Crashed 5 times last night in the space of an hour. Looks like it was 0x00007FFD3B734AA9 (d3d11) CreateDirect3D11SurfaceFromDXGISurface each time. At the time everyone present claimed they were not using the RTO component (I wasn't either).
error.log error.log error.log error.log error.log
The world has been solid normally, so it being something on someone's avatar seems most likely.
I just recently had a session that was having multiple unity crashes, I wasn't able to test all too much due to it being a social setting and the user in question I wasn't too familiar with.
We had a series of four users crashing very close together timewise, with some crashing multiple times. On asking around we found one user who was using RenderTransformOverride, and we asked them to change their avatar to something else. After that, we did not have anymore Unity crashes.
Wanting to test it a bit more, we wondered if the same avatar would crash users if RTO was set to 0.001. After doing so, we had two users get a Unity crash within minutes.
Posted below is a picture of the RTO (before we modified it during testing)
and the player.log I had when crashing, showing the same error as users above, this was before we modified it:
Given the user was someone I didn't know and their avatar was custom, I did not want to trouble them further with requests. (Plus I wanted to just hang with friends and not turn it into a debug session)
Cross posting one item from Discord as a potential repro: resdb:///3ddaada6d1c6bc2fe4e907ce5339c1474e6e383ea0374e92d5dc5bed23c9c4b4.brson
Also as a general reminder, please post all logs, not just the crash logs and dumps. This helps us better find holes in our logging, and can lead to uncovering consistent symptoms.
As Geenz had indicated above, please include all of the crash logs, including the Resonite logs from the Logs folder where Resonite is installed, and the Player.log and Crash.log if available from the crashes folders in %TEMP%\Yellow Dog Man Studios\Resonite\Crashes.
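If gathering these by hand gets tedious, something along the lines of this Python sketch could bundle both folders into one archive to attach. The crash folder path is the one quoted above; the install location is not stated anywhere in this thread, so it is left as a parameter, and the function name `bundle_logs` is just an illustrative choice.

```python
import os
import zipfile
from pathlib import Path


def default_crash_dir():
    """Crash folder as quoted above: %TEMP%\\Yellow Dog Man Studios\\Resonite\\Crashes."""
    temp = os.environ.get("TEMP", "")
    return Path(temp) / "Yellow Dog Man Studios" / "Resonite" / "Crashes"


def bundle_logs(folders, archive_path):
    """Zip every file under each existing folder; return the archive member names."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder in map(Path, folders):
            if not folder.is_dir():
                continue  # skip folders that do not exist on this machine
            for file in sorted(folder.rglob("*")):
                if file.is_file():
                    name = f"{folder.name}/{file.relative_to(folder).as_posix()}"
                    archive.write(file, name)
                    added.append(name)
    return added


# Example call (the install path is an assumption, adjust it to your setup):
# bundle_logs([Path(r"C:\path\to\Resonite") / "Logs", default_crash_dir()], "resonite_logs.zip")
```

Keeping each folder name as a prefix inside the archive makes it clear which files came from the Logs folder and which from the crash folders.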
Additionally there seem to be a wide number of variables to this issue, and it would help immensely if some replication conditions/commonalities could be established across all of the cases observed:
The more commonality that can be identified between all known cases, and the simpler the replication conditions can be made- the easier this will be for us to solve.
GPU driver version could also potentially be a relevant factor to identify/nail down. E.g. for Nvidia users, are all affected users using GeForce Experience/updated to the latest driver? If so, is there an earlier driver version where the issue does not occur?
Crash experienced after spawning the repro item a bunch of times (resdb:///3ddaada6d1c6bc2fe4e907ce5339c1474e6e383ea0374e92d5dc5bed23c9c4b4.brson) and then spawning the default avatar from Resonite Essentials a bunch of times, and doing other stuff like destroying the avatars, undo destroy, equipping the avatars, respawning... Not sure exactly how to reproduce the crash.
Happened in my local home with nobody else in there. Launched with SteamVR but was in desktop mode. Nvidia GTX 1080Ti (537.13 driver). Geforce experience 3.27.0.112. DESKTOP-H976HO2 - 2023.10.20.831 - 2023-10-23 19_44_20.log error.log Player.log
(Editing to also include the dmp file) crash.dmp
Intel i7 8700k, 32GB RAM, Windows 11
It happened again and I was just doing the same thing. Spawned both types of avatars a bunch of times and then started destroying them and undo destroy. In my local home with nobody else in there. crash.dmp DESKTOP-H976HO2 - 2023.10.20.831 - 2023-10-23 20_32_59.log error.log Player.log
So I actually just managed to reproduce this using the same steps as Nytra, also mixing in my own Avalis. At first it didn't seem to do anything, but once I started equipping the avatars and jumping around between them the Unity crash came relatively fast after that. Maybe equipping an RTO quickens the crash, or is a root cause? Oh, also this was in my cloud home, which is mostly still the default though I do have a video player in it playing a Youtube stream.
This was in SteamVR mode but in desktop via hotswitching, AMD 5800X, 32GB RAM, and Nvidia RTX 3080 with drivers 545.84. Unfortunately I forgot to launch without mods, but here's my logs nonetheless. I might try again being sure to disable mods later.
BLACKLIGHT - 2023.10.20.831 - 2023-10-23 14_45_40.log crash.dmp error.log Player.log
I got it to happen again this time launching in screen mode without SteamVR running. Again just spawning the default avatar and repro avatar a bunch of times and destroying, undo destroy and equipping them. crash.dmp DESKTOP-H976HO2 - 2023.10.20.831 - 2023-10-23 21_03_09.log error.log Player.log
I managed to record a video with OBS of a Unity crash happening. This one happened very quickly which I was surprised about. This time I was just duplicating the avatars. https://youtu.be/z6VgBFhnCzQ?si=dNZdmKfGxFg5KRKM
crash.dmp DESKTOP-H976HO2 - 2023.10.20.831 - 2023-10-23 21_17_27.log error.log Player.log
I seem to have replicated it under the same conditions:
resdb:///3ddaada6d1c6bc2fe4e907ce5339c1474e6e383ea0374e92d5dc5bed23c9c4b4.brson
into the world
I could seemingly not replicate it with the platform head and hands avatar, suggesting some fundamental difference between the two. I will try isolating the issue further, e.g. applying the RTO to a cube and performing the same test.
Based on your most recent log file, and the log file I produced though, I am not confident that RTO is the smoking gun.
SHIFTY-LAPTOP - 2023.10.20.831 - 2023-10-23 13_44_25.log error.log Player.log
As a sanity check, I tried removing the RTO, and duplicated considerably more copies of that avatar without issue. I will now try the reverse, and create a box with just the RTO, followed by a simplified skinned mesh/avatar (no ProtoFlux) to also test against.
Applied RTO to a box on a skinned mesh renderer, and seemingly haven't been able to replicate the issue under the same condition. So results are inconclusive. But the use of the linked replication object resdb:///3ddaada6d1c6bc2fe4e907ce5339c1474e6e383ea0374e92d5dc5bed23c9c4b4.brson does seem to fairly reliably cause the crash to occur.
Could you try to simplify the replication object by stripping off components, ProtoFlux, etc. down to the absolute bare essentials and see if it still produces the issue, @Nytra?
Remove as much from the object as you can while it is still able to replicate the issue- then if possible, try to create a new object that meets those same conditions to see if it can be simplified further still.
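The stripping-down process described above is essentially a greedy reduction: remove one part, retest, and keep the removal only if the crash still happens. A rough Python sketch of that loop follows. The component names and the `still_crashes` check are placeholders, not findings from this issue. In practice each check means duplicating the object in-world, and since the crash is intermittent, each check would need several attempts before counting as "no crash".

```python
def minimise(components, still_crashes):
    """Greedily drop every component whose removal keeps the crash reproducible."""
    kept = list(components)
    for candidate in list(kept):
        trial = [component for component in kept if component != candidate]
        if still_crashes(trial):
            kept = trial  # the crash survives without it, so it is not essential
    return kept


def example_check(parts):
    """Placeholder oracle: pretend the crash needs parts "A" and "C" together."""
    return {"A", "C"} <= set(parts)


print(minimise(["A", "B", "C", "D"], example_check))
```

A single greedy pass like this is not guaranteed to find the smallest possible set, but it gives a much smaller object to hand over for debugging.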
I removed every slot and component on the avatar except for the RTO and the skinned mesh renderers connected to it. I then tried duplicating but was unable to make the crash happen. I will try some more things like maybe keeping the IK and avatar components intact.
Thank you for testing. If we can get the replication case isolated/narrowed down, it'd definitely help a lot- but just having that relatively consistent replication case already helps a ton. :)
Removing the EyeManager component from the replication avatar seems to stop the crash from happening. So maybe the RTO and EyeManager components are together causing the crash?
I'm getting an almost consistent crash using this asset and duplicating it roughly 30 times. I'll see if narrowing it down on my end helps at all.
(Stripping down protoflux without my own avatar in the world)
I'm seeing something similar, I did my replication case of duplicating the stripped down avatar 30 times without the eye manager and saw no crash. When respawning the avatar and doing it normally right after, I got a unity crash
@Nytra Can you try confirming on your side whether removing the Blink and Eye rotation drivers on the Eye manager stops the crashing from happening? I wonder if it's the bone rotation or the blink itself causing it.
@ohzee00 removing those drivers does seem to stop the crash from happening.
Removing the OpenCloseTarget drive alone seems to stop the crashing. So the problem is likely related to blendshapes on SkinnedMeshRenderers.
Can confirm this too with my replication case of duplicating the avatar, removing the pivots on the bones does not affect things. Only removing the blendshape affects it.
Going to do a log dump and video of this in case that helps in replicating what I'm doing exactly.
My replication case is as follows:
Using a stripped down version (literally removing the protoflux and that's it) of the submitted model above I can cause a crash semi-reliably within a minute.
DESKTOP-V75BHJO - 2023.10.20.831 - 2023-10-23 18_14_44.log
Video of removing the blendshape, duplicating the avatar then doing a test without removing the blendshape and crashing shortly after:
"You will most likely crash within a minute"
Is this issue time-based, @ohzee00 @Nytra? Can it be replicated with just one copy of the avatar in a world, and enough time passed?
If the blendshape is driven by means other than the eye manager, e.g. if it's driven/continually being changed does the issue still occur as well?
I tried driving that blendshape directly with the Random Float ProtoFlux node and then duplicated the avatar again but the crash didn't happen.
I currently have a test going with the avatar's blink being ping ponged 50x the speed. I'll tell you how that goes as I just leave it.
Before this test I did have a crash happen with pannering the blendshape but I'm having issues replicating as fast as the duplicate case
Though the time seems to be random, sometimes as soon as I'm done duplicating the avatars, it crashes, or other times I have to wait a minute to crash.
As a sanity check, could you monitor your resource usage, e.g. memory on your system to check that you aren't running out of memory/crashing because of that, @ohzee00?
Doin' all fine when having a crash, barely goes over 2gb in ram, vram is fine and GPU is doing its best.
Thank you.
So I did more testing with the same replication case as above, and it seems that neither the scale nor any other transform on the RTO component matters.
Position override causes the crash, rotation causes the crash, and scale still does it even at .999. I even tried a scale of 1 on all axes and it still does it!
However, when removing the Mesh_face skinned mesh renderer reference in the RTO component (the mesh that has the blink), I'm not getting any crashes. At least, in my testing in the course of roughly 30 minutes.
My brain is kinda going numb at doing constant restarts at this point so unless someone has more exact testing I'm probably going to stop here for the night.
I really hope this is enough to pin it down what it could be.
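Since the crash timing looks random, it may help to put a rough number on how convincing a crash-free run is. The Python sketch below assumes each duplication trial crashes independently with a fixed probability. That is a simplification, and the probability itself is a guess that would have to come from the runs that do crash. Under that assumption, the chance of n clean trials in a row is (1 - p)^n.

```python
import math


def clean_run_probability(p_crash, trials):
    """Chance of seeing no crash in `trials` independent attempts."""
    return (1.0 - p_crash) ** trials


def trials_needed(p_crash, confidence):
    """Clean trials needed before 'no crash' is unlikely to be luck."""
    if not 0.0 < p_crash < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p_crash and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_crash))


# Illustrative numbers only: if the unmodified avatar crashed in about half
# of the attempts, this many clean attempts would be needed for 95% confidence.
print(trials_needed(0.5, 0.95))
```

The lower the crash rate of the unmodified case, the more clean runs a "this change fixes it" claim needs behind it.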
Did a bit more testing and thanks to Shrike, I was able to find the base asset.
resrec:///U-ohzee/R-a54d6f53-cc5f-4c53-9516-f57f78b0700f
This is a very simple world with the avatar above imported raw. Its VRIK has been removed and only the blink is being driven externally by an eye manager, with the RTO of course still on the head itself. I was still able to get a crash with just this when duplicating the avatar 30 times.
I have managed to narrow this down with the above. This seems to be a bug with Unity specifically unfortunately.
This seems to occur when forceMatrixRecalculationPerRender on a SkinnedMeshRenderer is set to true, which RenderTransformOverride will enable.
I have made a temp change to the code, where this would be set to true for every single SkinnedMeshRenderer. I have then duplicated the skinned mesh renderers without the RenderTransformOverride and still got a crash.
Conversely, clearing the skinned mesh renderer in the list on the component and duplicating does not result in a crash.
This makes things a bit more difficult unfortunately, since it's a bit out of our control. The first thing to try would be to update Unity version to a newer one, since we're currently on an older 19f, but there were some issues to resolve with the new version.
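The reasoning behind the tests above can be laid out as a small table of observations. The Python sketch below only encodes the three outcomes reported in this comment; the field names are illustrative. It then checks which single factor lines up with the crash in every observation.

```python
# Each entry is one experiment described above.
OBSERVATIONS = [
    # RenderTransformOverride present, so the flag is enabled through it
    {"has_rto": True, "force_recalc": True, "crashed": True},
    # Temporary code change: flag forced on, RenderTransformOverride removed
    {"has_rto": False, "force_recalc": True, "crashed": True},
    # Renderer cleared from the component's list, so the flag stays off
    {"has_rto": True, "force_recalc": False, "crashed": False},
]


def explains(factor, observations):
    """True if the factor is on exactly when a crash was observed."""
    return all(entry[factor] == entry["crashed"] for entry in observations)


for factor in ("has_rto", "force_recalc"):
    print(factor, explains(factor, OBSERVATIONS))
```

Only the flag matches all three outcomes, which is why the component itself looks like a trigger rather than the cause.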
So it is not specifically related to the eye blinking blendshape being driven by the EyeLinearDriver? It seemed as though removing that field drive prevented the crashing, even though presumably that SkinnedMeshRenderer still had forceMatrixRecalculationPerRender enabled.
Can there be a temporary solution for this until the Unity upgrade happens? For example, completely disabling the component's functionality in the next build and re-enabling it once everything is sorted? That would fix the crashing.
@Nytra Yes, it's unrelated. Disabling forceMatrixRecalculationPerRender and still duplicating the mesh failed to produce a crash.
@hazre Not without disabling or crippling existing functionality with the RenderTransformOverride and breaking some other things. However, I do want to add RenderMaterialOverride, which might provide an alternate solution in a lot of cases.
This will hopefully be resolved by #585, but for now, this issue is blocked.
There should be an improvement in 2024.1.12.1336. For RenderTransformOverride, the meshes won't get flagged when the transform doesn't actually get overridden on the user's end. This should reduce the number of meshes that are actually flagged like this and thus improve stability.
It's still not a fix, because the underlying issue is there, but it should help hopefully.
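As a rough illustration of the change described above (not the actual implementation), the Python sketch below compares flagging every listed mesh with flagging only when the override really applies for the local user. The `Mesh` class, the `flag_meshes` function, and the `override_applies_locally` parameter are all hypothetical; `force_recalculation` stands in for forceMatrixRecalculationPerRender.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    name: str
    force_recalculation: bool = False  # stands in for forceMatrixRecalculationPerRender


def flag_meshes(meshes, override_applies_locally, only_when_needed):
    """Set the flag on each mesh and return how many meshes end up flagged."""
    for mesh in meshes:
        # Old behaviour (only_when_needed=False): always flag listed meshes.
        # New behaviour: flag only if the transform is overridden for this user.
        mesh.force_recalculation = override_applies_locally or not only_when_needed
    return sum(mesh.force_recalculation for mesh in meshes)


meshes = [Mesh("face"), Mesh("body")]
print(flag_meshes(meshes, override_applies_locally=False, only_when_needed=False))
print(flag_meshes(meshes, override_applies_locally=False, only_when_needed=True))
```

Fewer flagged meshes means fewer chances to hit the underlying Unity bug, which fits the note that this reduces the crashes without fixing them.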
Describe the bug?
Oddly, I can't find any ongoing issues related to a series of crashes that others have been experiencing as well. The current theory is that a Render Transform Override scaling anything to 0,0,0 can crash random users in a world, even if they themselves do not use one.
I originally believed it was the fog volume in a session of mine but realised it couldn't be as I wasn't crashing until visitors arrived and were around for a while. I then crashed again after joining and being in someone else's session.
All I have specifically are the Unity Crash Logs as the crashes halt the Resonite logs mid-line. I can provide those logs as well if needed, but this appears to be an issue in the rendering pipeline's handling of something.
To Reproduce
Basically exist in any session with a user who has Render Transform Override forcing a slot to 0,0,0 scale.
Expected behavior
Ideally, it shouldn't randomly crash anyone.
Screenshots
No response
Resonite Version Number
2023.10.17.464
What Platforms does this occur on?
Windows
What headset if any do you use?
Index, Desktop
Log Files
Crash_2023-10-18_051740400.zip Crash_2023-10-18_060702704.zip
Additional Context
I couldn't find any other reports specifically about this and nobody directed me toward any, just the usual "put it on github".
If any additional info is needed, let me know.
Reporters
@PollyGeist on Discord (me)