Closed MatteoRaso closed 1 year ago
Where does this 100MB figure come from? An SDL build folder (tested on macOS with Clang), before applying this change, takes about 2MB. Can you share the output of `du -ha build | sort -h` before and after this change?
As for speed, I indeed see a 3.7% speed boost when "downgrading" from O3 to O2, but I'd like to investigate it further and pinpoint the specific optimization that causes it.
> Where does this 100MB figure come from?

It came from me playing around with the code on Linux, using the release CONF flag.
> Can you share the output of `du -ha build | sort -h` before and after this change?

Okay, I've added the full output as a file. The TL;DR is that I got 1.7 MB before, and 1.6 MB after changing the CFLAG to O2.
output.txt
Oh, that's 100KB, not 100MB. That makes much more sense now. I still want to investigate which specific optimization causes the slowdown, as speed is a much higher priority than a 6% size increase or build times.
> Oh, that's 100KB, not 100MB.

You're right, no idea how I made that mistake. Sorry about that.
> I still want to investigate which specific optimization causes the slowdown, as speed is a much higher priority than a 6% size increase or build times.

I suspect the performance is going to be platform-, architecture- and/or compiler-dependent in this case, although I think that `-O2` is a generally safer default than `-O3`, and certainly a more common one.
I took a better look at this. A few points:
- The majority of the size increase introduced by O3 is caused by aggressive loop unrolling. In some cases it makes sense and improves speed without a notable size increase; in other cases the code compiles into an awful mess of nested ifs in non-timing-critical code.
- The speed differences are mostly caused by slightly slower warm-up periods introduced by the size increase, but on longer runs the O3 builds were still faster than the O2 ones.
I just pushed the following changes to the code that, for the most part, will get the best of both worlds:
- Some functions (`GB_debugger_run` and `GB_apply_cheat`) are no longer inlined, reducing code size and making better use of the cache.
- `-ffast-math` was enabled, which greatly improved speed despite the very minor use of floating point numbers in the core.

Overall, the file size was reduced by over 10% (a greater improvement than switching to O2) while speed actually improved slightly, by roughly 2%. Thanks for pointing this out and making me dig deeper!
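For anyone following along, here is a minimal sketch of what keeping a rarely taken path out of line can look like with GCC or Clang. It is not the actual SameBoy change: `rare_debug_hook`, `run_steps`, `slow_path_calls` and the `NOINLINE` macro are hypothetical names made up for illustration, and the assumption is simply that the hook is called too rarely for inlining to pay off.

```c
#include <stdio.h>

/* Hypothetical macro: GCC and Clang both accept the noinline attribute.
   On other compilers it expands to nothing and the hint is simply lost. */
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Counts how often the slow path ran, so the behaviour can be checked. */
static unsigned slow_path_calls = 0;

/* Stand-in for a rarely called hook (illustration only). Keeping it out
   of line stops the optimizer from copying its body into the hot loop. */
NOINLINE static void rare_debug_hook(unsigned pc)
{
    slow_path_calls++;
    fprintf(stderr, "breakpoint hit at %04x\n", pc);
}

/* Hot loop: the common case is a single compare, and the rare case is
   an ordinary function call. Returns the final program counter. */
static unsigned run_steps(unsigned steps, unsigned breakpoint)
{
    unsigned pc = 0;
    for (unsigned i = 0; i < steps; i++) {
        if (pc == breakpoint) {
            rare_debug_hook(pc);
        }
        pc = (pc + 1) & 0xFFFF;
    }
    return pc;
}
```

Calling `run_steps(10, 3)` walks `pc` from 0 to 9, takes the slow path once at `pc == 3`, and returns 10. Whether this helps size or speed in a real build depends on the compiler and target, so it is worth measuring rather than assuming.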
According to the Gentoo Wiki, using `-O3` is unlikely to give any performance boost and can actually degrade performance. From testing the code, it seems that using `-O3` lengthens installation time and increases the build size by ~100 MB.