beyond-all-reason / spring

A powerful free cross-platform RTS game engine
https://beyond-all-reason.github.io/spring/

Multithreaded Game Loading #1210

Open bigbluejay9 opened 9 months ago

bigbluejay9 commented 9 months ago

Goals

Speed up load times.

Requirements

No changes to the engine Lua API. Engine should be a drop in replacement for both BAR and Zero-K.

Major Concerns

Approach

Unlike LoadingMT, which offloads the entirety of CGame::Load onto a separate CGameLoadThread, we approach this process incrementally.

  1. At the beginning, we assume that all of CGame::Load must be executed on the main thread, i.e. everything is on the 'critical loading path'.
  2. Identify a portion of the loading routine which is safe to be offloaded onto an auxiliary thread, call this the thread-safe loading subroutine.
  3. If necessary, subdivide CGame::Load into additional stages to facilitate parallelism.
  4. Move the thread-safe loading subroutine onto an auxiliary thread. Go back to step 2.

Loading Stages

We subdivide CGame::Load into multiple loading stages. By grouping loading subroutines in CGame::Load into stages, each subroutine only needs to consider other subroutines in the same stage when evaluating thread safety. The end of each stage forms a barrier for all ongoing subroutines to complete before starting the next stage.
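The stage-and-barrier idea can be sketched roughly as follows. This is a minimal illustration, not engine code: `LoadTask`, `LoadStage` and `RunStages` are hypothetical names, and the sketch assumes every task in a stage is independent of the others and is allowed to run off the main thread (which will not hold for subroutines that touch OpenGL or Lua state).

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical types for illustration; none of these names exist in the engine.
using LoadTask = std::function<void()>;

struct LoadStage {
    std::string name;
    std::vector<LoadTask> tasks;  // assumed to be mutually independent
};

// Runs the stages in order. Tasks inside one stage run concurrently; the end
// of each stage acts as a barrier, so no task of the next stage starts before
// every task of the current stage has returned.
inline void RunStages(const std::vector<LoadStage>& stages) {
    for (const LoadStage& stage : stages) {
        std::vector<std::future<void>> pending;
        pending.reserve(stage.tasks.size());

        for (const LoadTask& task : stage.tasks) {
            assert(task && "stage contains an empty task");
            pending.push_back(std::async(std::launch::async, task));
        }

        // Barrier: get() blocks until the task is done and rethrows any
        // exception the task raised.
        for (std::future<void>& result : pending)
            result.get();
    }
}
```

With this shape, thread-safety analysis stays local: a subroutine only has to be checked against the other tasks listed in the same `LoadStage`.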

Below is an initial set of stages, proposed after analyzing the flow of CGame::Load:

graphviz

load_data_flow_viz.txt

CGameLoadScreen::SetLoadMessage

CGameLoadScreen::SetLoadMessage triggers a CGameLoadScreen::Update and a CGameLoadScreen::Draw call whenever it is called, unless LoadingMT is set. We need to be able to call CGameLoadScreen::SetLoadMessage without triggering Update and Draw, so that off-main-thread loading subroutines can update the loading message without having to execute on the main thread. However, we don't want to set LoadingMT, since that also triggers a lot of other unwanted behavior. We will need to extend CGameLoadScreen::SetLoadMessage with a new argument that skips the Update and Draw calls.
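One possible shape for that extension, as a sketch only: `LoadScreenSketch` is a stand-in class, and the `redraw` parameter name, its default, and the mutex are assumptions rather than the real CGameLoadScreen interface.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Stand-in for CGameLoadScreen; the real class has a different interface.
class LoadScreenSketch {
public:
    // `redraw` is the hypothetical new argument. Worker threads pass false so
    // that only the message is stored and Update/Draw are not run by them.
    void SetLoadMessage(std::string text, bool redraw = true) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            message_ = std::move(text);
        }
        if (redraw) {
            Update();
            Draw();
        }
    }

    // Main thread only. Counts calls so the behavior can be observed.
    void Update() { ++updateCalls_; }

    // Main thread only. Picks up whatever message was stored last.
    void Draw() {
        std::lock_guard<std::mutex> lock(mutex_);
        lastDrawn_ = message_;
        ++drawCalls_;
    }

    int UpdateCalls() const { return updateCalls_; }
    int DrawCalls() const { return drawCalls_; }
    const std::string& LastDrawn() const { return lastDrawn_; }

private:
    std::mutex mutex_;  // guards message_, which workers may write
    std::string message_;
    std::string lastDrawn_;
    int updateCalls_ = 0;
    int drawCalls_ = 0;
};
```

Defaulting `redraw` to true keeps existing call sites unchanged; only the new off-main-thread callers would pass false and rely on the main thread's next Draw to show the message.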

First CGameLoadScreen::SetLoadMessage call

The first SetLoadMessage call also triggers two lengthy load processes:

In profiling, these two assets alone cost about 600ms of the ~14s load time (about 5%). We may want to offload these two loading processes. Once the font and loading screen assets are available, CGameLoadScreen::Draw will render the load screen and load text as normal. Deferred font rendering may be particularly complex, as CglFont is directly available from both the LuaMenu and LuaIntro environments.
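A rough sketch of what "draw as normal once the asset is available" could look like. `FontSketch` and `DeferredFont` are hypothetical names, and the sketch deliberately ignores the hard parts mentioned above: it assumes the asset can be built entirely off the main thread, with no OpenGL upload and no Lua-side access.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-in for a loaded font; not the engine's CglFont.
struct FontSketch {
    std::string name;
};

class DeferredFont {
public:
    // Starts the (simulated) load on a background thread immediately.
    explicit DeferredFont(std::string name)
        : pending_(std::async(std::launch::async,
                              [n = std::move(name)] { return FontSketch{n}; })) {}

    // Non-blocking: returns nullptr until the background load has finished,
    // so a Draw call can skip text rendering for that frame.
    const FontSketch* TryGet() {
        if (!ready_ &&
            pending_.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            font_ = pending_.get();
            ready_ = true;
        }
        return ready_ ? &font_ : nullptr;
    }

    // Blocking: waits for the load if it is still running.
    const FontSketch& Get() {
        if (!ready_)
            pending_.wait();
        return *TryGet();
    }

private:
    std::future<FontSketch> pending_;
    FontSketch font_;
    bool ready_ = false;
};
```

A draw routine would call `TryGet()` each frame and render text only when it returns non-null; callers that cannot proceed without the font would use `Get()`.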

Limit

If we apply this optimization technique repeatedly without any changes to the Lua API, we will run into the following limitations:

Assuming the Lua loading process takes the most time, the theoretical fastest load time for BAR with no Lua API changes is ~8s (1s load defs, 1s post-process defs, 6s load Lua), down from ~13s currently.
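Spelled out, the estimate is just the sum of the three phases it lists, assuming they still have to run one after another; the ratio to the current ~13s gives the best-case speedup:

```latex
T_{\min} \approx 1\,\mathrm{s} + 1\,\mathrm{s} + 6\,\mathrm{s} = 8\,\mathrm{s},
\qquad
\frac{T_{\mathrm{now}}}{T_{\min}} \approx \frac{13\,\mathrm{s}}{8\,\mathrm{s}} \approx 1.6
```

That is a saving of about 5s, or roughly 38% of the current load time.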

lhog commented 9 months ago

Thanks for the excellent report, @bigbluejay9. A pleasure to read, as always!

> LoadingMT since that also triggers a lot of other unwanted behavior.

Your writeup seems to imply that LoadingMT still doesn't work correctly. Is that the case? I thought I had put the necessary context switches everywhere they were needed. If not, it's low-hanging fruit to pick.

As for the rest, the load process is quite intertwined: it mixes calls to Lua, synced operations, and OpenGL operations, so I expect it's going to be quite hard to streamline these pieces.

Ruwetuin commented 9 months ago

BAR always sets Spring.SetConfigInt("LoadingMT", 0) in luaintro.

lhog commented 9 months ago

Good to know, but are there related issues to LoadingMT?

Ruwetuin commented 9 months ago

There used to be; see the comment: https://github.com/beyond-all-reason/Beyond-All-Reason/blob/master/luaintro/springconfig.lua#L81

lhog commented 9 months ago

I think it's fixed by now.

Ruwetuin commented 9 months ago

I have now removed the line that disabled LoadingMT, but in testing I saw no real difference between LoadingMT = 1 and 0 (neither did Beherith).

lhog commented 9 months ago

Well, the real difference is that the main thread can process messages, so you can interact with the load screen and alt-tab freely.

As for a speedup, I don't think there is any. With LoadingMT=0 the main thread is 99.99% busy with loading, and only the leftover scraps of time go to updating the loading window. With LoadingMT=1 the same load process is offloaded to another thread; we just free up the main thread to process window events.
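To make that distinction concrete, here is a minimal sketch of the LoadingMT=1 pattern as described: one worker runs the whole (still sequential) load while the calling thread keeps servicing window events. `LoadWithResponsiveMainThread` and both callbacks are hypothetical names, not engine functions, and the sketch assumes neither callback throws.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Runs `load` on a single worker thread while the calling thread repeatedly
// invokes `pumpEvents`. Returns how many times events were pumped.
// Total load time is unchanged: `load` itself is not parallelized.
inline int LoadWithResponsiveMainThread(const std::function<void()>& load,
                                        const std::function<void()>& pumpEvents) {
    std::atomic<bool> done{false};

    std::thread worker([&] {
        load();
        done.store(true, std::memory_order_release);
    });

    int pumps = 0;
    do {
        pumpEvents();               // e.g. handle window messages, redraw
        ++pumps;
        std::this_thread::yield();  // do not monopolize a core while waiting
    } while (!done.load(std::memory_order_acquire));

    worker.join();
    return pumps;
}
```

The wall-clock time is still dominated by `load`; what changes is that the calling thread is never blocked for longer than one `pumpEvents` iteration.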

bigbluejay9 commented 9 months ago

> Your writeup seems to imply that LoadingMT still doesn't work correctly. Is that the case? I thought I had put the necessary context switches everywhere they were needed. If not, it's low-hanging fruit to pick.

I was mainly commenting based on https://github.com/beyond-all-reason/spring/blob/a2283085aaa1e5505fa6aafaa6f2165e2a640139/doc/site/_data/configs.json#L1025, which may no longer be true.

Regardless, I think this effort is tangential to LoadingMT. As you mentioned, in LoadingMT the entire process is offloaded onto another thread, but that loading process is still sequential. This effort is meant to capture the analysis of the various data dependencies during the loading process, the result of which can then be used to parallelize the loading process (no matter if LoadingMT is 0 or 1).

As a quick example, as far as I can tell CGame::LoadMap and CGame::LoadDefs do not touch the same data structures at all (other than a call to loadscreen->SetLoadMessage) and could be run concurrently. There are more examples in the dot graph in the initial comment - all lines that run in parallel within a stage are what I believe to be entirely independent loading subroutines.

lhog commented 6 months ago

@bigbluejay9 long time no see.

The team is hunting a desync bug we introduced recently, so we have eased off the gas pedal a little on pending PRs and issues like this one, but I would still like to follow up: how is your investigation progressing?