bigbluejay9 opened this issue 9 months ago
Thx for the excellent report, @bigbluejay9. Pleasure to read as always!
"However, we don't want to set LoadingMT since that also triggers a lot of other unwanted behavior."
Your write-up seems to imply that LoadingMT still doesn't work correctly. Is that the case? I thought I had put the necessary context switches everywhere they were needed. If not, it's low-hanging fruit to pick.
As for the rest, the load process is quite intertwined: it mixes Lua calls, synced operations and OpenGL operations, so I expect it will be quite hard to streamline these pieces.
BAR always sets Spring.SetConfigInt("LoadingMT", 0) in luaintro.
Good to know, but are there any issues related to LoadingMT?
There used to be; see this comment: https://github.com/beyond-all-reason/Beyond-All-Reason/blob/master/luaintro/springconfig.lua#L81
I think it's fixed by now.
I have now removed the code that disables the LoadingMT setting, but in testing I saw no real difference between LoadingMT = 1 and LoadingMT = 0 (neither did Beherith).
Well, the real difference is that the main thread can process messages, so the load screen stays responsive and you can alt-tab freely.
As for speedup, I don't think there is any. With LoadingMT=0, the main thread is 99.99% busy with loading and only the leftover peanuts go to updating the loading window. With LoadingMT=1, the same load process is offloaded to another thread; we just free up the main thread to process window events.
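A minimal sketch of the distinction described above, assuming plain C++11 threads. `RunLoadOffMainThread`, `sequentialLoad` and `pumpEvents` are hypothetical names, not engine code. The load itself stays strictly sequential on one worker thread, so total load time does not change; the only gain is that the main thread keeps handling window events.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch of LoadingMT=1: the whole load still runs sequentially,
// just on a worker thread, while the main thread keeps pumping window events.
// `pumpEvents` stands in for event handling plus load screen redraws.
// Returns how many times the main thread pumped events during the load.
int RunLoadOffMainThread(const std::function<void()>& sequentialLoad,
                         const std::function<void()>& pumpEvents) {
    std::atomic<bool> done{false};
    std::thread loader([&] {
        sequentialLoad();  // identical work to LoadingMT=0, hence no speedup
        done.store(true, std::memory_order_release);
    });

    int pumps = 0;
    while (!done.load(std::memory_order_acquire)) {
        pumpEvents();  // window stays responsive (alt-tab, moving, etc.)
        ++pumps;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    loader.join();
    return pumps;
}
```

Polling with a short sleep is the simplest option here; a condition variable would avoid the 1 ms polling granularity.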
"Your write-up seems to imply that LoadingMT still doesn't work correctly. Is that the case?"
I was mainly commenting based on https://github.com/beyond-all-reason/spring/blob/a2283085aaa1e5505fa6aafaa6f2165e2a640139/doc/site/_data/configs.json#L1025, which may no longer be true.
Regardless, I think this effort is tangential to LoadingMT. As you mentioned, with LoadingMT the entire process is offloaded onto another thread, but that loading process is still sequential. This effort is meant to capture an analysis of the data dependencies within the loading process, the result of which can then be used to parallelize loading (whether LoadingMT is 0 or 1).
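To illustrate how such a dependency analysis could feed parallel loading, here is a sketch that groups subroutines into stages from a hand-written dependency map. Each subroutine goes into the first stage after all of its dependencies, so subroutines in the same stage are independent of each other. `DepGraph` and `GroupIntoStages` are hypothetical, and any real edges would have to come from an analysis like the one in this issue.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation: subroutine name -> names it depends on.
// Every dependency must also appear as a key.
using DepGraph = std::map<std::string, std::set<std::string>>;

// Group subroutines into stages: each subroutine is placed in the first
// stage after all of its dependencies, so subroutines sharing a stage do
// not depend on each other and could run concurrently.
std::vector<std::vector<std::string>> GroupIntoStages(const DepGraph& deps) {
    std::vector<std::vector<std::string>> stages;
    std::set<std::string> placed;  // subroutines assigned to earlier stages

    while (placed.size() < deps.size()) {
        std::vector<std::string> stage;
        for (const auto& node : deps) {
            if (placed.count(node.first) != 0)
                continue;
            bool ready = true;
            for (const auto& dep : node.second) {
                if (placed.count(dep) == 0) { ready = false; break; }
            }
            if (ready)
                stage.push_back(node.first);
        }
        if (stage.empty())
            throw std::runtime_error("dependency cycle or unknown dependency");
        placed.insert(stage.begin(), stage.end());  // only after the whole stage
        stages.push_back(stage);
    }
    return stages;
}
```

For example, with made-up edges where LoadLua depends on LoadMap and on a PostProcessDefs step, and PostProcessDefs depends on LoadDefs, LoadDefs and LoadMap end up sharing the first stage.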
As a quick example, as far as I can tell CGame::LoadMap and CGame::LoadDefs do not touch the same data structures at all (other than a call to loadscreen->SetLoadMessage) and could run concurrently. There are more examples in the dot graph in the initial comment: all lines that run in parallel within a stage are what I believe to be entirely independent loading subroutines.
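If CGame::LoadMap and CGame::LoadDefs really are independent, running them concurrently could be as simple as the sketch below. `MapData`, `DefsData`, `LoadMapStub` and `LoadDefsStub` are hypothetical placeholders, not the real functions; the real ones also call loadscreen->SetLoadMessage, which would first need to be safe to call off the main thread.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical placeholders, NOT the real CGame::LoadMap / CGame::LoadDefs.
struct MapData  { std::string name; };
struct DefsData { std::vector<std::string> unitDefs; };

MapData LoadMapStub() { return MapData{"example_map"}; }

DefsData LoadDefsStub() {
    DefsData defs;
    defs.unitDefs = {"unit_a", "unit_b"};
    return defs;
}

// Runs both subroutines concurrently and joins before returning. Only valid
// if the two share no mutable state (the real ones would also need a
// thread-safe way to set the load message).
std::pair<MapData, DefsData> LoadMapAndDefsConcurrently() {
    std::future<MapData> map = std::async(std::launch::async, LoadMapStub);
    DefsData defs = LoadDefsStub();        // meanwhile, do the other half here
    return {map.get(), std::move(defs)};   // get() waits and rethrows errors
}
```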
@bigbluejay9, long time no see.
The team is hunting a desync bug we introduced recently, so we have eased off the gas pedal a bit on pending PRs and issues like this one, but I would still like to follow up: how is your investigation progressing?
Goals
Speed up load times.
Requirements
No changes to the engine Lua API. The engine should be a drop-in replacement for both BAR and Zero-K.
Major Concerns
CGame::LoadLua must be evaluated on the main thread. When loading assets outside of Lua, we must ensure that textures and shaders are uploaded on the main thread.
Approach
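Because CGame::LoadLua and texture/shader uploads must stay on the main thread (see Major Concerns), any subroutine moved off the main thread needs a way to hand that kind of work back. One common pattern is a main-thread task queue, sketched below. `MainThreadQueue` is a hypothetical helper, not an existing engine class.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: worker threads post work that must run on the main
// thread (e.g. texture or shader uploads); the main thread drains the queue.
class MainThreadQueue {
public:
    explicit MainThreadQueue(std::thread::id mainId) : mainId_(mainId) {}

    // Safe to call from any thread.
    void Post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Must only be called on the main thread; returns how many tasks ran.
    int Drain() {
        assert(std::this_thread::get_id() == mainId_);
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(pending, tasks_);  // run tasks without holding the lock
        }
        int ran = 0;
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
            ++ran;
        }
        return ran;
    }

private:
    std::thread::id mainId_;
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};
```

Tasks are swapped out before running, so a task can itself post follow-up work without deadlocking on the mutex.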
Unlike LoadingMT, which offloads the entirety of CGame::Load onto a separate CGameLoadThread, we approach this process incrementally.
CGame::Load must be executed on the main thread, i.e. everything is on the 'critical loading path'.
… thread-safe loading subroutine.
… CGame::Load into additional stages to facilitate parallelism.
Loading Stages
We subdivide CGame::Load into multiple loading stages. By grouping the loading subroutines in CGame::Load into stages, each subroutine only needs to consider other subroutines in the same stage when evaluating thread safety. The end of each stage forms a barrier: all ongoing subroutines must complete before the next stage starts.
Below is an initial set of stages, proposed by analyzing the flow of CGame::Load:
load_data_flow_viz.txt
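The stage and barrier scheme described above could be sketched as follows, assuming each loading subroutine can be wrapped as a `std::function<void()>`. `Subroutine`, `Stage` and `RunStages` are hypothetical names. Everything within a stage is launched with `std::async`, and a stage only ends once every future has been joined.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical types: a loading subroutine, and a stage of subroutines that
// are assumed to be thread-safe with respect to each other.
using Subroutine = std::function<void()>;
using Stage = std::vector<Subroutine>;

// Runs stages in order. Subroutines within a stage run concurrently; the end
// of each stage is a barrier that waits for all of them to finish.
void RunStages(const std::vector<Stage>& stages) {
    for (const Stage& stage : stages) {
        std::vector<std::future<void>> running;
        running.reserve(stage.size());
        for (const Subroutine& sub : stage)
            running.push_back(std::async(std::launch::async, sub));

        // Barrier: get() blocks until the subroutine is done and rethrows
        // any exception it threw.
        for (std::future<void>& f : running)
            f.get();
    }
}
```

Launching one thread per subroutine keeps the sketch short; a real implementation would more likely reuse a thread pool.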
CGameLoadScreen::SetLoadMessage
CGameLoadScreen::SetLoadMessage triggers CGameLoadScreen::Update and CGameLoadScreen::Draw when called, unless LoadingMT is set. We need to be able to call CGameLoadScreen::SetLoadMessage without triggering the Update and Draw calls, so that loading subroutines running off the main thread can still update the loading message. However, we don't want to set LoadingMT, since that also triggers a lot of other unwanted behavior. Instead, we will need to extend CGameLoadScreen::SetLoadMessage with a new argument that skips the Update and Draw calls.
First CGameLoadScreen::SetLoadMessage call
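The new SetLoadMessage argument proposed in the previous section might look roughly like the sketch below. `LoadScreenSketch` and its `skipUpdateAndDraw` parameter are hypothetical and do not reflect the engine's actual CGameLoadScreen interface. A caller off the main thread only records the message under a mutex, and the main thread shows it on its next Draw.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical sketch, not the engine's CGameLoadScreen. The extra
// `skipUpdateAndDraw` flag lets callers off the main thread only record
// the message; the main thread displays it on its next Draw.
class LoadScreenSketch {
public:
    void SetLoadMessage(const std::string& text, bool skipUpdateAndDraw = false) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            message_ = text;
        }
        if (!skipUpdateAndDraw) {
            // Current behavior: only valid on the main thread.
            Update();
            Draw();
        }
    }

    void Update() { ++updates_; }  // main thread only

    void Draw() {  // main thread only; a real Draw issues OpenGL calls
        std::lock_guard<std::mutex> lock(mutex_);
        lastDrawn_ = message_;
    }

    std::string LastDrawn() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return lastDrawn_;
    }

    int Updates() const { return updates_; }

private:
    mutable std::mutex mutex_;
    std::string message_;    // latest message, may be written from any thread
    std::string lastDrawn_;  // what the last Draw showed
    int updates_ = 0;
};
```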
The first SetLoadMessage call also triggers two lengthy load processes, which load the font and the loading screen assets.
In profiling, these two assets alone cost about 600ms out of a ~14s load time (roughly 4%). We may want to offload these two loading processes. Once the font and loading screen assets are available, CGameLoadScreen::Draw will render the load screen and load text as normal. Deferred font rendering may be particularly complex, as CglFont is directly available from both the LuaMenu and LuaIntro environments.
Limit
If we applied this optimization technique repeatedly without any changes to the Lua API, we would run into the following limitations:
Assuming the Lua loading process takes the most time, the theoretical fastest load time for BAR with no Lua API changes is ~8s (1s to load defs, 1s to post-process defs, 6s to load Lua), down from ~13s currently.
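Written out, the estimate above treats the three remaining steps as a strictly sequential critical path and assumes that everything else can overlap with them:

```latex
\[
T_{\min} \approx \underbrace{1\,\mathrm{s}}_{\text{load defs}}
  + \underbrace{1\,\mathrm{s}}_{\text{post-process defs}}
  + \underbrace{6\,\mathrm{s}}_{\text{load Lua}} = 8\,\mathrm{s},
\qquad
\frac{13\,\mathrm{s} - 8\,\mathrm{s}}{13\,\mathrm{s}} \approx 38\%
\]
```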