Open ITotalJustice opened 2 years ago
another big speedup can come from rendering the bg from highest priority first to lowest.
as seen here, all 4 bgs are rendered in full. some pixels in bg2 are transparent so that bg3 shows through, but bg3 still has to have every single tile parsed and every pixel checked to see if it is transparent.
i could instead render bg2 first, then when rendering bg3, at the top of the loop, check if pixel[x] != transparent, if true then continue.
another example: about half of bg0 and bg1 rendering can be skipped.
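a minimal sketch of the front-to-back idea, assuming each layer is already available as a scanline of colours. `Layer`, `render_front_to_back`, `kTransparent` and the `fetched` counter are hypothetical names for illustration, not taken from the emulator:

```cpp
#include <array>
#include <cstdint>
#include <vector>

// all names here are hypothetical, chosen for illustration only.
constexpr int kWidth = 240;                     // visible pixels per scanline
constexpr std::uint16_t kTransparent = 0x8000;  // sentinel: bit 15 is unused by BGR555

using Line = std::array<std::uint16_t, kWidth>;

// stand-in for a background: one scanline of colours.
struct Layer {
    Line pixels;
};

// renders layers given in priority order (highest first) into one buffer.
// returns how many pixels actually had to be fetched.
inline int render_front_to_back(const std::vector<Layer>& layers, Line& out) {
    out.fill(kTransparent);
    int fetched = 0;
    for (const Layer& layer : layers) {
        for (int x = 0; x < kWidth; ++x) {
            if (out[x] != kTransparent) {
                continue;  // already covered by a higher-priority layer
            }
            ++fetched;  // in a real renderer this is the tile fetch + decode
            out[x] = layer.pixels[x];
        }
    }
    return fetched;
}
```

with a top layer that is opaque for half the line, the layer below only pays for the other half, and no separate merge pass is needed.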
this introduces a problem however for blending. what happens if a pixel with a higher priority wants to blend with the pixel below? well thats simple:
    if (pixel[x].opaque()) {
        // skip only when the pixel above cannot blend with this layer
        if (!pixel[x].can_blend_with_layer(layer_num)) {
            continue;
        }
    }
of course, a layer can be enabled to blend with multiple layers, although it can only blend with 1 at a time, so an extra check is needed to see if that pixel has already been blended.
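a rough sketch of that extra check, assuming each pixel carries a bitmask of layers it may blend with plus a "blended" flag. `Pixel`, `can_skip`, `plot` and `average_bgr555` are made-up names, and the plain average stands in for the real alpha formula. it follows the rule as described here (skip a covered pixel unless it may still blend with the current layer) and has not been checked against hardware behaviour:

```cpp
#include <cstdint>

// all names here are hypothetical, chosen for illustration only.
struct Pixel {
    std::uint16_t colour = 0;        // BGR555
    std::uint8_t blend_targets = 0;  // bit n set: this pixel may blend with layer n
    bool opaque = false;
    bool blended = false;            // a pixel blends with at most one layer
};

// stand-in for the real alpha formula: per-channel average of two BGR555 colours.
inline std::uint16_t average_bgr555(std::uint16_t a, std::uint16_t b) {
    std::uint16_t out = 0;
    for (int shift = 0; shift < 15; shift += 5) {
        const int ca = (a >> shift) & 31;
        const int cb = (b >> shift) & 31;
        out = static_cast<std::uint16_t>(out | (((ca + cb) / 2) << shift));
    }
    return out;
}

// true when a lower layer's pixel cannot affect the final colour.
inline bool can_skip(const Pixel& p, int layer_num) {
    if (!p.opaque) return false;  // nothing drawn yet, must render
    if (p.blended) return true;   // already blended once, nothing more to do
    return (p.blend_targets & (1u << layer_num)) == 0;
}

// plots one opaque pixel of layer `layer_num`; `targets` is that layer's blend mask.
inline void plot(Pixel& p, int layer_num, std::uint16_t colour, std::uint8_t targets) {
    if (can_skip(p, layer_num)) return;
    if (p.opaque) {
        // the pixel above wants to blend with this layer
        p.colour = average_bgr555(p.colour, colour);
        p.blended = true;
    } else {
        p.colour = colour;
        p.blend_targets = targets;
        p.opaque = true;
    }
}
```

in a real renderer `can_skip` would run before the tile fetch, so skipped pixels cost nothing beyond the check.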
by doing all of this, it removes that merge() function that i have (very slow) and lots of needless tile fetching and decoding. it also means i can work on 1 pixel buffer, rather than 5 (1 obj, 4 bg) and then merging them. also, i can do the blending within the render function itself as either:
the good thing is all of these can be templated like so
    enum class Blend
    {
        None,  // no blending
        Alpha, // blend 2 layers
        White, // fade to white
        Black, // fade to black
    };
i would need to test if templating is worth it for this, but i predict that it would be.
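a sketch of what the templated version could look like, assuming C++17 (`if constexpr`) and coefficients in 1/16 steps. `Coeffs`, `blend_channel`, `blend_pixel` and `blend_dispatch` are hypothetical names. whether this is actually faster than a runtime branch would still need measuring:

```cpp
#include <cstdint>

enum class Blend { None, Alpha, White, Black };

// hypothetical coefficient bundle, each value in 1/16 steps (0..16).
struct Coeffs {
    int eva = 16;  // weight of the top pixel (Alpha)
    int evb = 0;   // weight of the bottom pixel (Alpha)
    int evy = 0;   // fade amount (White / Black)
};

// blends a single 5-bit channel; the mode is resolved at compile time.
template <Blend mode>
inline int blend_channel(int top, [[maybe_unused]] int bottom,
                         [[maybe_unused]] const Coeffs& c) {
    if constexpr (mode == Blend::Alpha) {
        const int v = (top * c.eva + bottom * c.evb) / 16;
        return v > 31 ? 31 : v;
    } else if constexpr (mode == Blend::White) {
        return top + ((31 - top) * c.evy) / 16;
    } else if constexpr (mode == Blend::Black) {
        return top - (top * c.evy) / 16;
    } else {
        return top;  // Blend::None
    }
}

// applies the chosen mode to all three channels of a BGR555 colour.
template <Blend mode>
inline std::uint16_t blend_pixel(std::uint16_t top, std::uint16_t bottom, const Coeffs& c) {
    std::uint16_t out = 0;
    for (int shift = 0; shift < 15; shift += 5) {
        const int t = (top >> shift) & 31;
        const int b = (bottom >> shift) & 31;
        out = static_cast<std::uint16_t>(out | (blend_channel<mode>(t, b, c) << shift));
    }
    return out;
}

// runtime-to-template dispatch; in a real renderer this switch would sit
// outside the per-pixel loop so it runs once per scanline.
inline std::uint16_t blend_dispatch(Blend mode, std::uint16_t top, std::uint16_t bottom,
                                    const Coeffs& c) {
    switch (mode) {
        case Blend::Alpha: return blend_pixel<Blend::Alpha>(top, bottom, c);
        case Blend::White: return blend_pixel<Blend::White>(top, bottom, c);
        case Blend::Black: return blend_pixel<Blend::Black>(top, bottom, c);
        case Blend::None:  break;
    }
    return blend_pixel<Blend::None>(top, bottom, c);
}
```

the `Blend::None` instantiation compiles down to a plain copy, so scenes without blending do not pay for the blend maths.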
i would like to optimise for very common cases like this example:
where every bg is enabled, but bgX (bg0 in example 1, bg1 in example 2) is entirely empty, yet i still have to fetch and decode 240 tiles! i think the only way to solve this is tile caching.
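one possible shape for such a cache, as a sketch only: remember per tile whether it is fully transparent, and recompute that only after a write touches the tile. this assumes 4bpp tiles (32 bytes each, colour index 0 transparent); `TileCache`, `kTileCount` and the `scans` counter are invented for the example:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// all names here are hypothetical, chosen for illustration only.
constexpr std::size_t kTileBytes = 32;    // one 8x8 tile at 4 bits per pixel
constexpr std::size_t kTileCount = 1024;  // assumed number of tiles tracked

struct TileCache {
    std::array<std::uint8_t, kTileBytes * kTileCount> vram{};
    std::array<bool, kTileCount> dirty{};
    std::array<bool, kTileCount> empty{};
    int scans = 0;  // how often a tile really had to be inspected

    TileCache() { dirty.fill(true); }

    // every write marks the owning tile as needing a rescan.
    void write(std::size_t addr, std::uint8_t value) {
        vram[addr] = value;
        dirty[addr / kTileBytes] = true;
    }

    // true if every pixel in the tile uses colour index 0 (transparent).
    bool is_empty(std::size_t tile) {
        if (dirty[tile]) {
            ++scans;
            bool all_zero = true;
            for (std::size_t i = 0; i < kTileBytes; ++i) {
                if (vram[tile * kTileBytes + i] != 0) {
                    all_zero = false;
                    break;
                }
            }
            empty[tile] = all_zero;
            dirty[tile] = false;
        }
        return empty[tile];
    }
};
```

the renderer could then skip a whole tile when `is_empty()` is true, so an entirely empty bg costs one lookup per tile instead of a fetch and decode.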
in emerald, without rendering: 1k fps
with rendering: 450-460 fps
thats just over half of my fps gone, just from rendering 160 scanlines a frame.
1: this should give a decent speed up, but still not by much.
2: while this will speed up scenes that dont use windowing and blending, this still isn't ideal because many scenes do use both windowing and blending. so those scenes will still be super slow.
3: decoding is very fast already.
4: this will be a decent speed up.