Interkarma / daggerfall-unity

Open source recreation of Daggerfall in the Unity engine
http://www.dfworkshop.net
MIT License
2.7k stars 327 forks source link

Added "ActiveGameObjectDatabase" to replace FindObjectsOfType #2605

Closed KABoissonneault closed 5 months ago

KABoissonneault commented 6 months ago

Intro

My solution for https://github.com/Interkarma/daggerfall-unity/issues/2355 crashes.

I have personally observed some users struggling with DFU crashes multiple times a day. The callstacks are almost always somewhere in the middle of a specific Unity function: FindObjectsOfType. DFU uses these in many places, often on every frame, for each object in the scene (see EnemySenses).

From the Unity docs: https://docs.unity3d.com/ScriptReference/Object.FindObjectsOfType.html

Note: This function is very slow. It is not recommended to use this function every frame. In most cases you can use the singleton pattern instead.

And this is precisely what this changelist does.

I have personally never reproduced these crashes. They seem to require extensive gameplay sessions on a very specific mod setup. Avoiding calls to FindObjectsOfType should for sure avoid crashes in that function - we'll see if it doesn't end up crashing somewhere else.

One thing is for sure, the performance should be better no matter what. Doing a lookup on a limited list of objects should always be faster than doing a lookup on the entire scene.

The Database

ActiveGameObjectDatabase is the namespace where we store the GameObject caches for DFU objects. I have created a new utility class, GameObjectCache, which is designed with specific performance characteristics and safety concerns. I have made this class public, since mods might want to reuse it for similar lookups on their own objects (see the Mods section below).

GameObjectCache

I don't actually suspect that the crashes we've been seeing are caused by multiple threads touching the scene at the same time. DFU does use parallel threads for jobs in the terrain system, for example, and while Unity does not actually support creating GameObjects from such parallel jobs, I have seen mods like Interesting Eroded Terrains try to do GameObject lookups from terrain jobs. Therefore, I decided to be safe and find the minimal overhead solution to make such lookups safe for any code.

GameObjectCache has two data members: a List<WeakReference<GameObject>>, and a ReaderWriterLockSlim.

For the List part, the WeakReference is there to ensure that 1) the cache does not keep objects alive for longer than necessary 2) we can safely detect destroyed objects long after Unity has cleaned them. I suspect this is not actually necessary for Unity GameObjects, maybe I went overkill on this aspect, but I don't think the cost has been an issue so far.

For the Lock part, here's the C# documentation: https://learn.microsoft.com/en-us/dotnet/api/system.threading.readerwriterlockslim.

The Cache is used in two ways: multiple systems do frequent reads to the cache, and a few system do infrequent additions to the cache. A "Reader Writer Lock" allows multiple readers to read at the same time without blocking each other, while a single writer can still claim exclusive ownership for actual modifications. This means that while we're not spawning objects, the cache has no interference. The RWLockSlim also has an "upgradable" lock feature, which I somewhat used for reading whether an object was already there before committing to the write. I still kept this check only in Editor, we shouldn't have this in real code.

For Registration, I first experimented with having factory functions like GameObjectHelper handle registration. This is because registering multiple objects at once would be faster than doing it once per object. However, I could not find a way to do it in a way that would guarantee that mods that spawn objects are also covered. So I went with the per-object registration in one of their component's Awake.

One detail of note here is that I don't "unregister" objects. I could easily add an OnDestroy event on all components where I do the register on Awake and remove them from the cache there. However, I find that this would cause many unnecessary write locks, so I decided not to do that. Instead, whenever we write to the cache, we take the opportunity to remove any destroyed objects.

Whenever a gameplay system needs to lookup all the GameObjects of a certain type, the cache will not only avoid returning destroyed objects, but it will also not return inactive objects. Similarly, when trying to get a specific component type on the objects, objects where the component is disabled will also be skipped. Both of these behaviors reproduce how FindObjectsOfType works.

DFU's caches

In the end, ActiveGameObjectCache currently has 6 caches: Enemy, Civilian Mobile NPC, Loot, Foe Spawner, Action Door, and Static NPC. These correspond to six GameObject types where FindObjectsOfType lookups are frequently made, and happen to match six different prefabs in Assets/Prefabs/Scene. For each of these GameObject types, we identify a primary component (where the registration is made) and some secondary components which are frequently used in lookups.

Enemy:

Two cases stand out here. Some DaggerfallEntityBehaviour lookups can be intended to grab more than one GameObject type. EnemySenses wants the behaviour from both other enemies and the player. PlayerGPS wants the behaviours of both enemies and static NPCs. Fortunately, it's pretty easy to concat two IEnumerable<DaggerfallEntityBehaviour> together, the changes were minimal. QuestResourceBehaviour can also be either for enemies or for static NPCs.

Ultimately, the flow looks like one of two things. Registration:

// ObjectComponent.cs
+ void Awake()
+ {
+   ActiveGameObjectDatabase.RegisterThing(gameObject);
+ }

Lookups:

// PlayerObjectComponentHandler.cs
- foreach(ObjectComponent component in FindObjectsOfType<ObjectComponent>())
+ foreach(ObjectComponent component in ActivateGameObjectDatabase.GetActiveThingObjectComponents())
{
  if(component.Type == ComponentType.Thing)
    ...
}

This is done all over for our 6 object types.

Performance

I wanna update this PR with more specific benchmarks for key DFU cases of lookups and object registration. In practice, I'm pretty confident all lookups with be faster, though object creation has extra overhead. I estimate the impact to be minimal.

Anecdotally, I made a quest that spawns hundreds of harpies, and I found the performance better with my changes (due to the many EnemySenses lookups).

Key tests:

Debugging

Such a change requires some form of debugging for when we suspect issues. Two primary tools I've implemented are:

image

Some more reference numbers from my tests running around: Entering Scourg Barrow image

Leaving Scourg Barrow image

Running around Daggerfall City during the day image

Inside a house image

Daggerfall City at night image

Daggerfall Palace image

Mod

Mod compatibility is always a concern in post 1.0. I had to reject my initial approach to object registration because mods can easily spawn DFU GameObjects in any way (Location Loader from WoD sure does). I believe the current solution should be safe from any sort of weird spawning patterns mods might have.

The crash and performance benefits on lookups will need be applied manually by mods. For example, I had users report crashes with Horrible Hordes when entering main quest dungeons, and this was due to a StaticNPC lookup. In 1.1, we will be able to use ActiveGameObjectDatabase.GetActiveStaticNPCs() instead.

One thing I did not take into account yet are DFU objects that mods might do lookups on but not DFU itself. RDBBlocks, Interiors, RMBBlocks, Lights... I did not want to overreach into object types that I couldn't test properly. Mods will have to manifest their interest if they need DFU to track more of its object types.

Improvements

Garbage collection

While I mentioned that I deliberately don't clear objects as they are destroyed, I do think we could use some points where we clear objects before a new one is added. For example, leaving a large dungeon will leave lots of dead enemies, loot, action doors, and potentially static NPCs. Lookups in PlayerGPS will still traverse these big lists to check if the object is alive, which could be avoided if we cleared the list.

@numidium Do you have suggestions for clear places where we can "collect garbage" on this system? My current idea is to do it on dungeon transitions, but I'm not sure when GameObjects are all properly destroyed after a dungeon exit.

Remaining FindObjectsOfType

I left a few FindObjectsOfType lookups.

I ignored all the FindObjectOfType (singular) lookups for DaggerfallUnity scene objects. These are often just done once on scene load for one object and cached. We don't need a separate cache.

I did not cover DaggerfallStaticDoors, which is a component on Interior and RDB objects. I was mostly concerned with having to introduce two new caches for one lookup, but maybe the caches would be useful for certain mods. I also did not cover DaggerfallMarker. It's an obscure object, I'm not sure how to test it. There is one lookup for QuestResourceBehaviour that uses Resources.FindObjectsOfTypeAll, which covers inactive objects. It's a weird unique case, I did not cover it for now. There is one lookup for SongPlayer in DefaultCommands. Commands run while the game is paused, not really an issue for crashes.

Conclusion

This is a big change to DFU, but given that the crashes really ruin some users, I believe it is necessary to integrate as soon as possible. I will block version 1.1 on this. In addition, the earlier we deploy this, the more mods can benefit from the improvements.

I will probably deploy a test build with this change (and some others) before it gets approved. We can check for regressions, and see if users get other forms of weird crashes.

KABoissonneault commented 6 months ago

I've added the "RDB" cache. Might as well cover the DaggerfallInventoryWindow check, even if I don't think it was problematic. For static doors on Interiors, well, there's only one Interior object, you can get it on the PlayerEnterExit.

I finally ran the profiling on a dev build. I added a new mark for dungeon layout. I did this one specifically because main quest dungeons do the most spawning, and it's they add objects of many types (enemies, loot, action doors, rdbs, static NPCs).

The results seem clear that this change has little impact on the creation of GameObjects.

Daggerfall.LayoutDungeon
    Master
        Scourg Barrow
            4714.68ms
            3938.70ms
            4258.01ms
        Daggerfall Palace
            3581.92ms
            3050.95ms
            3303.34ms

    New
        Scourg Barrow
            4659.25ms
            4045.06ms
            4175.31ms
        Daggerfall Palace
            3211.11ms
            3277.56ms
            3405.08ms

The numbers vary a lot, but do not diverge significantly from the average on the current master branch.

On another note, doing this, I noticed that while the automap creates a second set of RDBs for its view, those RDBs do not get destroyed when you leave the dungeon. They'll stay inactive until the player enters a new dungeon. Maybe that's something for #2528 . But yeah, another perk of having this new GameObject tracker in tdbg.

numidium commented 6 months ago

Do you have suggestions for clear places where we can "collect garbage" on this system? My current idea is to do it on dungeon transitions, but I'm not sure when GameObjects are all properly destroyed after a dungeon exit.

In #2528 I run a full sweep when exiting dungeons so that could be one place to do it (admittedly this is because it was the only way I could figure out how to get rid of every single mesh left dangling by the dungeon).

I noticed that MaterialReader uses temporal caching i.e. it clears its cached resource if it hasn't been used after a specified period of time. Here's the pruning method for reference: https://github.com/Interkarma/daggerfall-unity/blob/5b28ea0272369f81fdcd0271de9e1a09b015e4a8/Assets/Scripts/MaterialReader.cs#L943

If you go that route then beware of expiring references. I don't remember exactly which mod it was but I think Carademono was showing me resource references going null after a period of time and causing things to disappear.

KABoissonneault commented 5 months ago

This PR has been tested by quite a few people with no known regressions. It's pretty easy to test on the Test 2 build I released on Lysandus' Tomb, with the tdbg command to track the state of the data in real time.

I've reached out to a user who was experiencing frequent DFU crashes in 1.0, and they have not had any since I sent them the test build. This change is pretty essential for these users, I would not deploy a new DFU version without this.

To maintainers concerned with the size of this change, note that most changes are just changing FindObjectsOfType<Thing>() to ActiveGameObjectDatabase.GetActiveThing(). I also change some index-based for loops to foreach-loops. Once you see the pattern, it's pretty easy to get through. I know the main file is more difficult though, but I'm confident in its stability now.

I need to insist that DFU cannot move forward without two reviewers on this. All the other PRs are blocked by primarily this.