X2CommunityCore / X2WOTCCommunityHighlander

https://steamcommunity.com/workshop/filedetails/?id=1134256495
MIT License
60 stars 68 forks source link

Lag spikes caused by triggering events in DrawDebugLabels() #1245

Closed Iridar closed 1 year ago

Iridar commented 1 year ago

490 added a TriggerEvent() into the DrawDebugLabels() function, which causes noticeable lag spikes in tactical under certain conditions.

According to tests done by @remcoros with the profiler, it doesn't matter how many event triggers there are or if any EventData / EventSource are being passed. Event name or if there are any listeners to it does not appear to matter either.

This bug is currently being investigated further, but if no better solution is found, the final option is to gate it behind a config variable, disabled by default. It would be a BC break, but considering it (A) causes noticeable lag and (B) is purely for debug anyway, I feel perfectly fine about.

Iridar commented 1 year ago

Potential avenues of investigation:

remcoros commented 1 year ago

So far:

It's not `XEVENTMGR / GetEventManager()

`define XEVENTMGR class'X2EventManager'.static.GetEventManager()

image

Using a timer:

        SetTimer(0.001, false, nameof(TriggerDrawDebugLabelsDelayed));

...

function TriggerDrawDebugLabelsDelayed()
{
    `XEVENTMGR.TriggerEvent('DrawDebugLabels');
}

image

Iridar commented 1 year ago

Try using a longer timer, like 1 second. You could also set it up so that timer isn't set if it's already active. That way you won't get 625 event triggers in one frame. You could also set it up so that if the timer is already active, it is removed and added again, that way it will be continuously getting reset until DrawDebugLabels stops getting called every microsecond, and then trigger once.

remcoros commented 1 year ago

A small update.

This is without the TriggerEvent in CHL: image

The same behaviour (lag spikes as soon as gamestate is building up) can be replicated using either:

In all cases (CHL raising the event from DrawDebugLabels (part of the drawing loop)) and the above two cases, the result is:

image

The only thing I can think of is that it's running into the limits of X2EventManager, some kind of deadlock or lock contention.

robojumber:

It's certainly possible that the event manager holds a lock on the game state thread while invoking event triggers during game state building and the main thread as a result can't do its event triggering until that lock is released, which essentially moves game state building lag back to the main thread I just feel like we'd have seen it earlier... but maybe most UI code triggers its events before the game state thread gets a chance to do anything while DrawDebugLabels is triggered every single frame so it has a high chance of contending the lock

The main problem I have with this interpretation is that it'd mean triggering events for anything not game state related is a Really Bad Idea, but Firaxis does it all the time in visualization and I don't think we've observed it causing problems nearly as much

Hopefully, we can get some source of TriggerEvent/EventManager, but it seems like it might indeed not be a good idea to use EventManager to raise A LOT of events. At least removing it from the main drawing loop.

Iridar commented 1 year ago

Ahem https://github.com/X2CommunityCore/X2WOTCCommunityHighlander/issues/425

robojumper commented 1 year ago

Ahem #425

Should be noted that if we assume it's waiting for the game state thread to release a lock, then it's a tactical-only problem.

remcoros commented 1 year ago

So I tried both:

        if (!`XCOMGAME.GameRuleset.IsDoingLatentSubmission())
        {
            `XEVENTMGR.TriggerEvent('DrawDebugLabels', kCanvas, self);
        }

and

        if (!`XCOMGAME.GameRuleset.BuildingLatentGameState)
        {
            `XEVENTMGR.TriggerEvent('DrawDebugLabels', kCanvas, self);
        }

This gets rid of the spikes. I don't see a difference in behavior between those checks. Both methods temporarily stop and then start triggering the event on the exact same frames.

The side-effect of this, is that no handlers are called (so no debug drawing from mods) during that time. In my profile that was during frame 78 (Right_Mouse_Button_Release): image

To frame 98 (two frames before it starts building the visualization): image

So I think we could leave the TriggerEvent as is, wrapping it in that check to keep backward compatibility, and live with the fact mod debug draw calls are temporarily paused during state submission.

About the difference of those checks, this is in X2GameRuleset:

image

So it looks like the BuildingLatentGameState is like a cached result of 'IsDoingLatentSubmission()', probably faster.

(and it looks like BuildingLatentGameState is set to 'true' from native code, since I can't find it)

Iridar commented 1 year ago

The takeaway is that it seems to be crucial to do a if (!`GAMERULES.BuildingLatentGameState) check before triggering events in certain places.

Speculatively, triggering events while a game state is being built in a latent manner locks up the Event Manager, causing freezes or lag spikes. Perhaps, maybe even crashing the game.

Speculatively, the more often an event is triggered, the more likely it is to get triggered during latent game state building, causing the game to lock up.

So it might be beneficial to find other cases of events that get triggered a lot and add the check before them, but only in cases where it wouldn't harm the event's purpose, i.e. if an event allows to override state changes, then we can't afford to not trigger it every time.

But for the purposes of UI overrides, it should be fine.

remcoros commented 1 year ago

I looked around a bit to see if I could find any other places where triggering events cause the same 'lockup' effect. For example the Geoscape tick event. But I didn't see any difference both in-game and in the profiler when disabling that. As Robojumper mentioned, this might be a tactical-only problem.

I recently found a 'bIsLocked' private variable on X2EventManager, which may or may not be related. But this is a private var and needs to be exposed (through a public method). I didn't pursue this.

From what I saw so far, all the other events are gameplay/mod-related and not triggering them could have detrimental effects.