DaemonEngine / Daemon

The Dæmon game engine. With some bits of ioq3 and XreaL.
https://unvanquished.net
BSD 3-Clause "New" or "Revised" License

long map loading times #567

Open ghost opened 2 years ago

ghost commented 2 years ago

I took a shot at profiling loading times in debug mode (things are obviously better when the code is optimized aggressively, but one of my main concerns is faster debugging of AIs, which requires me to stare^W look at them and restart the game a lot between minor changes). The method is probably not optimal, for various reasons, but I think it pinpointed one big cause: images are loaded sequentially. Before reading further, please keep in mind that this is just a preliminary look and that I don't know much about Daemon's code and architecture, not to mention crunchlib.

According to google-perftools, ~20% of the time is spent in CRN decompression, in this place. After a (very) quick glance at the code, I think multi-threading it could improve the situation (notably, the first-level loop seems a good candidate). But then it would only help a single file format, so maybe better results could be achieved by multi-threading things "above" this place, to allow loading multiple images at a time.

Notably, two big candidates seem to be the calls to ParseNormalMap and ParseDiffuseMap, children of ParseStage, itself a child of ParseShader. I do not know how hard it would be to create one thread for each of the children, plus possibly another one to load other stuff in the meantime, but I think it may considerably reduce the client's loading times.
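To make the idea of loading multiple images at a time concrete, here is a minimal sketch using `std::async`. It is not Daemon code: `DecodedImage`, `DecodeImage`, and `DecodeAll` are hypothetical names, and the decode step only fakes some CPU work where the CRN decompression would go. It assumes the decode step is pure CPU work with no shared state; anything that touches the graphics API would presumably still have to happen on the main thread afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoded image; not a Daemon type.
struct DecodedImage {
    std::string name;
    std::vector<std::uint8_t> pixels;
};

// Hypothetical placeholder for the expensive, CPU-bound decode step
// (CRN decompression in the profile above). It only fakes some work.
DecodedImage DecodeImage(const std::string& name) {
    DecodedImage image{name, std::vector<std::uint8_t>(64)};
    for (std::size_t i = 0; i < image.pixels.size(); ++i) {
        image.pixels[i] = static_cast<std::uint8_t>((i + name.size()) & 0xFF);
    }
    return image;
}

// Start one asynchronous decode per image, then collect the results
// in the same order as the input list.
std::vector<DecodedImage> DecodeAll(const std::vector<std::string>& names) {
    std::vector<std::future<DecodedImage>> pending;
    pending.reserve(names.size());
    for (const std::string& name : names) {
        pending.push_back(std::async(std::launch::async, DecodeImage, name));
    }

    std::vector<DecodedImage> images;
    images.reserve(pending.size());
    for (std::future<DecodedImage>& result : pending) {
        images.push_back(result.get());  // blocks until that decode is done
    }
    return images;
}
```

One task per image is the simplest form; with thousands of images, a bounded number of workers would likely be a better fit than one thread each.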

necessarily-equal commented 2 years ago

Last time I looked, I noted that uncrunching was indeed the first contender, followed by decompressing the opus/ogg files, (static!) UI rendering, and various other asset tasks such as zip deflate()-ing.

My other observation was that Unvanquished essentially does one thing at a time: loading is single-threaded. This means there is quite a lot of parallelization potential.

I was rather looking at loading different shaders in parallel instead of parallelizing the load of just one. There are some functions like CG_RegisterSounds that are a priori heavily parallelizable.
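As an illustration of why a registration loop over independent assets is easy to parallelize, here is a rough sketch of a fixed-size worker pool. It makes no claim about how CG_RegisterSounds actually works: `LoadSound` and `LoadSoundsParallel` are hypothetical names, and the sketch assumes each asset can be decoded without touching shared state.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for decoding one sound file; not a Daemon function.
std::size_t LoadSound(const std::string& path) {
    return path.size();  // pretend this is the decoded sample count
}

// Decode every sound using a fixed number of worker threads.
// Workers claim the next index from a shared atomic counter, and each
// writes only to its own slot of the result vector, so no lock is needed.
std::vector<std::size_t> LoadSoundsParallel(const std::vector<std::string>& paths,
                                            unsigned workerCount) {
    std::vector<std::size_t> results(paths.size());
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= paths.size()) {
                return;  // nothing left to claim
            }
            results[i] = LoadSound(paths[i]);
        }
    };

    std::vector<std::thread> threads;
    const unsigned count = std::max(1u, workerCount);
    for (unsigned t = 0; t < count; ++t) {
        threads.emplace_back(worker);
    }
    for (std::thread& thread : threads) {
        thread.join();
    }
    return results;
}
```

In practice the worker count would probably be derived from `std::thread::hardware_concurrency()`, which may return 0 when the value is unknown, hence the clamp to at least one worker.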

illwieckz commented 2 years ago

> I was rather looking at loading different shaders in parallel instead of parallelizing the load of just one.

This is a good idea because, if I'm right, some code may need images to be actually loaded and their format known in order to build the material's data. For example, you may need to know that a normal map has an alpha channel to know it embeds a height map, but you may also need to know whether it uses a specific variant of the DXT format in which the alpha channel is abused to store another color channel, in which case it is a component of the normal map, not a height map. This means you may have to wait for an image load to succeed before doing the next step in shader loading and parsing. For example, we cannot just parse all shaders, store the list of images somewhere, and then load all the images afterwards with an aggressive parallel loader.

But we can very probably parse shaders in parallel: each shader's steps stay sequential, and several shaders run through them at the same time.
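The shape described here, sequential steps inside one shader and several shaders in flight at once, could look roughly like the sketch below. Every name in it is hypothetical (`ImageFormat`, `ShaderSource`, `Material`, `BuildMaterial`, and so on); it only models the dependency where the second step needs the image format discovered by the first, and does not reflect Daemon's actual shader parser.

```cpp
#include <future>
#include <string>
#include <vector>

// All names below are hypothetical; none are Daemon types or functions.
enum class ImageFormat { NoAlpha, PlainWithAlpha, SwizzledDxt };

struct ShaderSource {
    std::string name;
    ImageFormat normalMapFormat;  // what loading the image would reveal
};

struct Material {
    std::string name;
    bool hasHeightMap;
};

// Step 1: pretend to load the normal map and report its real format.
ImageFormat LoadNormalMap(const ShaderSource& source) {
    return source.normalMapFormat;
}

// Step 2: depends on step 1. Alpha is a height map only when the format
// does not reuse the alpha channel for a normal map component.
bool AlphaIsHeightMap(ImageFormat format) {
    return format == ImageFormat::PlainWithAlpha;
}

// The steps for ONE shader stay strictly sequential.
Material BuildMaterial(const ShaderSource& source) {
    const ImageFormat format = LoadNormalMap(source);
    return Material{source.name, AlphaIsHeightMap(format)};
}

// Different shaders are independent, so each one runs as its own task.
std::vector<Material> BuildAll(const std::vector<ShaderSource>& sources) {
    std::vector<std::future<Material>> pending;
    pending.reserve(sources.size());
    for (const ShaderSource& source : sources) {
        pending.push_back(std::async(std::launch::async,
                                     [&source] { return BuildMaterial(source); }));
    }

    std::vector<Material> materials;
    materials.reserve(pending.size());
    for (std::future<Material>& result : pending) {
        materials.push_back(result.get());  // all tasks finish before returning
    }
    return materials;
}
```

The sketch assumes shaders do not share mutable state; if two shaders reference the same image, a real implementation would need some form of shared, thread-safe image cache.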