Ghabry opened this issue 4 years ago
This semi-fits here because the fonts are large: I analyzed the Player using bloaty.
tl;dr: Disabling everything and removing the built-in fonts and encoding support brings us down from 12 MB to around 5-6 MB. "Everything" includes audio, which makes the Player pretty much useless. Removing encoding support will break Chinese/Japanese/Korean games. Beyond that there is no simple way to get even smaller; we just have too much functionality in it.
Saving 1 MB means the game can load about 3.4 more pictures of size 320x240, so this can help a bit in low-memory conditions.
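The ~3.4 figure follows if a 320x240 picture is held in memory as a 32-bit bitmap (4 bytes per pixel) and "1 MB" is read as 1 MiB. A minimal sketch of that arithmetic; the 4-bytes-per-pixel default is an assumption and the function names are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one uncompressed picture in memory.
// Assumption: 4 bytes per pixel (a 32-bit bitmap format).
constexpr std::size_t PictureBytes(std::size_t width, std::size_t height,
                                   std::size_t bytes_per_pixel = 4) {
    return width * height * bytes_per_pixel;
}

// How many additional pictures fit into `saved_bytes` of freed memory.
double ExtraPictures(std::size_t saved_bytes, std::size_t width,
                     std::size_t height, std::size_t bytes_per_pixel = 4) {
    const std::size_t per_picture = PictureBytes(width, height, bytes_per_pixel);
    assert(per_picture > 0);
    return static_cast<double>(saved_bytes) / static_cast<double>(per_picture);
}

// 320 * 240 * 4 = 307200 bytes (300 KiB) per picture.
static_assert(PictureBytes(320, 240) == 307200, "320x240 at 32 bpp");
```

`ExtraPictures(1024 * 1024, 320, 240)` gives about 3.41. With a 16-bit format (`bytes_per_pixel = 2`) the same megabyte holds about 6.8 pictures.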
Note that some bloaty features require debug symbols, which increase the size of the binary. Debug symbols are not part of the final binary, but they are included in the calculation.
The "symbols" view breaks the size down per function (or variable), while "compileunits" lists the size per source file (though this can be inaccurate). Both are just different ways to present the same information. I output 10 entries per section (sections being the different regions of the executable where data is stored).
I couldn't get rid of .eh_frame (exception-handling unwind tables); it was generated even with -fno-exceptions. It is usually disabled to save space (it takes around 0.5 MB).
With everything enabled the executable is 12.1 MB; with audio and FreeType disabled it is 7.30 MB. Having all audio disabled is of course not desirable, but it shows the "base size" of a minimal Player, which is still pretty large.
With everything enabled, the largest offender appears to be FluidSynth with this weird rand_table. rand_table is a dithering table in FluidSynth. No idea what its exact purpose is, but it is very large: it lives in .bss, so it takes no space in the file (hence the NAN% in the file-size column) but still occupies 375 KiB of RAM. Maybe it can be patched out without impacting quality too much?
  FILE SIZE        VM SIZE
  0.0%       0    6.0%   744Ki   .bss
    NAN%       0   50.3%   375Ki     rand_table
    NAN%       0   18.3%   136Ki     midisynth::(anonymous namespace)::envelope_table
    NAN%       0    8.6%  64.0Ki     midisynth::(anonymous namespace)::vibrato_table
    NAN%       0    6.1%  45.1Ki     [862 Others]
    NAN%       0    4.4%  33.0Ki     palette
    NAN%       0    4.2%  31.3Ki     Window_Keyboard::layouts
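If rand_table is laid out as two channels of 48000 floats (2 x 48000 x 4 bytes = 384000 bytes, which matches the 375 KiB above) and is filled once at startup with the first difference of uniform random noise, the same kind of dither could be generated per sample instead. This is a hypothetical sketch, not FluidSynth code: the class name and the xorshift32 generator are my own choices, and the output is not bit-identical to the table (which also repeats with a fixed period).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical replacement for a precomputed 2 x 48000 float dither table:
// produces first-difference (high-pass) uniform noise on the fly.
class OnTheFlyDither {
public:
    explicit OnTheFlyDither(std::uint32_t seed = 1) : state_(seed ? seed : 1) {}

    // Next dither value for `channel` (0 or 1); always in (-1, 1).
    float Next(int channel) {
        assert(channel == 0 || channel == 1);
        const float d = Uniform() - 0.5f;      // uniform in [-0.5, 0.5)
        const float out = d - prev_[channel];  // difference with previous value
        prev_[channel] = d;
        return out;
    }

private:
    // xorshift32: a tiny PRNG with 4 bytes of state (never reaches zero).
    float Uniform() {
        state_ ^= state_ << 13;
        state_ ^= state_ >> 17;
        state_ ^= state_ << 5;
        // Top 24 bits scaled to [0, 1); exactly representable as float.
        return static_cast<float>(state_ >> 8) * (1.0f / 16777216.0f);
    }

    std::uint32_t state_;
    float prev_[2] = {0.0f, 0.0f};
};
```

The state is 12 bytes instead of 375 KiB of .bss, at the cost of a few integer operations per sample. Whether the audible result is acceptable would still need to be checked by ear.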
Compile units (everything enabled):
  FILE SIZE        VM SIZE
 52.4%  5.94Mi   49.2%  5.94Mi   .text
   74.5%  4.43Mi   74.5%  4.43Mi     [881 Others]
   13.3%   811Ki   13.3%   811Ki     easyrpg-buildscripts/linux-static/freetype-2.13.0/src/lzw/ftlzw.c
    2.0%   120Ki    2.0%   120Ki     pixman-mmx.c
FreeType takes up almost 1 MB. pixman-mmx.c is x86-specific, so this will differ depending on the platform.
 26.2%  2.98Mi   24.7%  2.98Mi   .rodata
   64.7%  1.93Mi   64.7%  1.93Mi     [section .rodata]
   32.3%   985Ki   32.3%   985Ki     player/src/font.cpp
    2.1%  64.3Ki    2.1%  64.3Ki     player/src/bitmap.cpp
    0.7%  20.5Ki    0.7%  20.5Ki     player/src/scene_logo.cpp
Symbols after disabling all optional dependencies:
  FILE SIZE        VM SIZE
 32.8%  2.36Mi   32.3%  2.36Mi   .rodata
   26.0%   628Ki   26.0%   628Ki     icudt69_dat
   23.7%   572Ki   23.7%   572Ki     BITMAPFONT_WQY
   14.8%   358Ki   14.8%   358Ki     [section .rodata]
   12.5%   302Ki   12.5%   302Ki     SHINONOME_GOTHIC
    4.7%   114Ki    4.7%   114Ki     [622 Others]
    3.5%  84.1Ki    3.5%  84.1Ki     BITMAPFONT_TTYP0
    2.6%  64.0Ki    2.6%  64.0Ki     hard_light
- icudt69_dat: the ICU encoding table. Most of its size comes from the CJK encodings. Removing support for them would get us a lot of hate but should save around 600 KB :P (we still need the table for European encodings and normalisation).
- BITMAPFONT_WQY: the Chinese pixel font. Another 600 KB.
- SHINONOME_GOTHIC: mostly the Japanese font, but it also contains some extended Latin glyphs etc.
- BITMAPFONT_TTYP0: another font with lots of Latin glyphs.
- hard_light: lookup table for the Tone blit. Cannot be removed.

Getting rid of some fonts (and breaking many games around the world) will likely save 1.2 MB. The problem is that we cannot say "use a TTF font" instead, because enabling FreeType costs 1 MB...
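The 64.0 KiB of hard_light is exactly what a 256 x 256 table of one-byte results takes. Assuming that is its layout, the table could be filled at startup from the usual hard-light formula instead of being stored in .rodata; the operand order and rounding below are assumptions and may not match the Player's Tone blit. This only shrinks the file, though: when the whole ELF is loaded into RAM, a 64 KiB table costs the same memory whether it comes from .rodata or is built at runtime.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 256 x 256 one-byte entries = 65536 bytes = 64 KiB.
using HardLightTable = std::array<std::array<std::uint8_t, 256>, 256>;

// Fills table[base][blend] with the standard hard-light blend of two
// 8-bit channel values (hypothetical layout, see above).
void BuildHardLightTable(HardLightTable& table) {
    for (int base = 0; base < 256; ++base) {
        for (int blend = 0; blend < 256; ++blend) {
            int v;
            if (blend < 128) {
                v = 2 * base * blend / 255;                        // multiply half
            } else {
                v = 255 - 2 * (255 - base) * (255 - blend) / 255;  // screen half
            }
            assert(v >= 0 && v <= 255);
            table[base][blend] = static_cast<std::uint8_t>(v);
        }
    }
}
```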
  7.5%   554Ki    7.4%   554Ki   .rela.dyn
   28.9%   160Ki   28.9%   160Ki     [680 Others]
   21.1%   117Ki   21.1%   117Ki     RTP::rtp_table_2k3
   12.4%  68.9Ki   12.4%  68.9Ki     RTP::rtp_table_2k
    7.0%  38.9Ki    7.0%  38.9Ki     lcf::TypedField<>
    5.1%  28.5Ki    5.1%  28.5Ki     lcf::Struct<>::fields
  3.4%   247Ki    3.3%   247Ki   .data.rel.ro
   31.6%  78.4Ki   31.6%  78.4Ki     [623 Others]
   17.1%  42.3Ki   17.1%  42.3Ki     RTP::rtp_table_2k3
   15.9%  39.3Ki   15.9%  39.3Ki     RTP::rtp_table_2k
The lookup tables used for RTP detection take about 280 KB in total.
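Part of that RTP cost shows up in .rela.dyn, i.e. dynamic relocations. Assuming the RTP tables are arrays of `const char*` in a position-independent build like the one measured here, every pointer needs its own load-time relocation (24 bytes per entry in the x86-64 RELA format) on top of the pointer itself in .data.rel.ro; a non-PIE build does not pay for these relocations. A common workaround is one character blob plus small integer offsets, which need no relocations. A hypothetical sketch; the names and table contents are illustrative only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Pointer table: in a PIE build, each entry needs a load-time relocation.
//   static const char* const kNames[] = {"Actor1", "Actor2", "Chara1"};

// Relocation-free alternative: one NUL-separated blob plus 16-bit offsets.
constexpr char kNameBlob[] = "Actor1\0Actor2\0Chara1";
constexpr std::uint16_t kNameOffsets[] = {0, 7, 14};
constexpr std::size_t kNameCount = sizeof(kNameOffsets) / sizeof(kNameOffsets[0]);

std::string_view GetName(std::size_t index) {
    assert(index < kNameCount);
    // The string_view constructor stops at the NUL that ends each name.
    return std::string_view(kNameBlob + kNameOffsets[index]);
}
```

The offsets could be generated by the same script that produces the tables, so the source would not have to be maintained by hand.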
I ported EasyRPG to the Dreamcast, and there's quite a lot more that can be done. (Note that my observations were made with LTO enabled.)
For your information, the Dreamcast has 16MB of RAM along with 8MB of VRAM and 2MB of SPU RAM. However, because EasyRPG only has a software renderer available, the VRAM only ends up being used for the framebuffer and the SPU RAM also sits unused.
The first thing I did was remove the span-lite dependency in liblcf (using std::span directly instead) and partially switch from libfmt to std::format. Doing both saves an additional 600 KB or so. (It may be possible to save more by keeping libfmt and following https://vitaut.net/posts/2024/binary-size/ while still using std::span, but I didn't have much luck with that.)
Disabling the RTP routines saved a bit, but it was a minor improvement. Disabling most built-in fonts except RMG2000 yielded quite a big improvement; I believe it was more than 1 MB for the Japanese fonts alone. (Removing TTYP0 gave me a very small benefit by comparison.)
Disabling translations yielded a similar improvement to disabling RTP stuff, nothing huge.
Enabling MP3 decoding with mpg123 only adds about 100 KB, but I had to use the following configure flags:
--disable-components --disable-libsyn123 --enable-libmpg123 --disable-32bit --disable-largefile --disable-feature_report --disable-messages --disable-moreinfo --disable-id3v2 --disable-debug --disable-icy --with-cpu=generic_float --disable-real --disable-int-quality --enable-lfs-alias
Note that on some platforms you may need large-file support (or you may have to enable 32-bit output support), but the rest doesn't really matter in the context of EasyRPG. Enabling portable mode in mpg123 unfortunately leads to linking errors; I don't know if that can be avoided.
Pixman is a huge dependency; unfortunately there's no real alternative in EasyRPG right now. It used to be very slow with 16-bit color in particular, but thanks to ghabry's 16-bit changes it's a lot better (even faster than 32bpp was), albeit still not perfect. 16-bit support not only helps with performance but also saves memory. (On the Dreamcast it works by keeping an internal A1R5G5B5 buffer that is uploaded to VRAM with Store Queues and displayed by the PVR.)
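To illustrate the memory side of the 16-bit path: an A1R5G5B5 pixel keeps 1 alpha bit and 5 bits per color channel in 2 bytes, so a 320x240 buffer takes 150 KiB instead of 300 KiB. A minimal conversion sketch, assuming ARGB8888 input and a plain alpha threshold; the function names are hypothetical and the actual Dreamcast backend may convert differently:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs 8-bit channels into A1R5G5B5 (1 alpha bit, 5 bits per color).
constexpr std::uint16_t PackA1R5G5B5(std::uint8_t a, std::uint8_t r,
                                     std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((a >= 128 ? 1u : 0u) << 15) |
                                      ((r >> 3) << 10) |
                                      ((g >> 3) << 5) |
                                      (b >> 3));
}

// Converts one row of ARGB8888 pixels into an A1R5G5B5 buffer.
void ConvertRow(const std::uint32_t* src, std::uint16_t* dst, std::size_t count) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t p = src[i];
        dst[i] = PackA1R5G5B5(static_cast<std::uint8_t>(p >> 24),
                              static_cast<std::uint8_t>(p >> 16),
                              static_cast<std::uint8_t>(p >> 8),
                              static_cast<std::uint8_t>(p));
    }
}

// Buffer sizes for a 320x240 screen.
constexpr std::size_t kBytes16 = 320 * 240 * sizeof(std::uint16_t);  // 153600
constexpr std::size_t kBytes32 = 320 * 240 * sizeof(std::uint32_t);  // 307200
```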
For audio, I wasn't able to do much except use a stripped-down version of SDL with only audio enabled, because I ran into issues trying to make it work with sndstream from KOS. If it weren't for MIDI support, I would have bypassed dr_wav and the rest altogether and just used the native audio functions... except that won't work either, because there's a limit of 64 KB on audio samples! (I do believe it could be possible to stream them, but then it becomes rather annoying.) In any case, audio mostly works fine, but I'm still not 100% happy with it. And because we use SDL, we cannot use hardware ADPCM either.
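Streaming around the 64 KB limit starts with cutting a decoded PCM buffer into pieces that stay under the limit without splitting a sample frame. A hypothetical sketch of only that step; treating the limit as 65536 bytes, and all names, are assumptions, and the KOS-side queueing and refill logic is not shown:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Chunk {
    std::size_t offset;  // byte offset into the PCM buffer
    std::size_t size;    // chunk size in bytes
};

// Splits `total_bytes` of PCM into chunks of at most `max_bytes`, keeping each
// chunk a whole number of frames (frame = channels * bytes per sample).
std::vector<Chunk> SplitIntoChunks(std::size_t total_bytes, std::size_t frame_bytes,
                                   std::size_t max_bytes = 64 * 1024) {
    assert(frame_bytes > 0 && max_bytes >= frame_bytes);
    const std::size_t step = max_bytes - max_bytes % frame_bytes;  // frame-aligned
    std::vector<Chunk> chunks;
    for (std::size_t off = 0; off < total_bytes; off += step) {
        const std::size_t remaining = total_bytes - off;
        chunks.push_back({off, remaining < step ? remaining : step});
    }
    return chunks;
}
```

For 16-bit stereo (4-byte frames) every chunk except the last is exactly 64 KiB; with 6-byte frames the step drops to 65532 bytes so no frame straddles two chunks.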
KallistiOS on the Dreamcast currently lacks multi-sector DMA support for the CD drive (it only transfers one sector at a time), so loading a file causes high CPU usage, which is quite bad. To be clear, that's not an issue with EasyRPG. I hope this will be addressed in the future.
I compared the MIDI implementations on the Dreamcast, and overall FMMidi is still the best when it comes to CPU and memory usage. FluidLite, even with the smallest SoundFont available (4.7 KB), is still slower than FMMidi. A while back I had the opposite experience on the RS-90, where FluidLite was faster by a significant margin, but at least on the Dreamcast that's no longer the case.
Disabling the built-in EasyRPG logo saves a few KB, which is not negligible in a low-memory situation like the Dreamcast.
What EasyRPG could do:
These would be nice to have but obviously require a lot more work:
On my branch right now, with dr_wav/mpg123, my custom Dreamcast backend and some of the modifications above, -Oz brings the binary down to 3.1 MB. However, -Oz has a significant performance penalty. -O3/-Ofast increases the size to 3.7 MB, but the games are much faster (3 times faster, in fact). In my experience, drawing/pixman still appears to be a significant performance bottleneck.
The performance bottleneck is probably there because the rendering code relies heavily on inlining.
At low optimisation levels, fetching the RGB bit masks requires 3 function calls and 3 registers. At -O2 the compiler replaces the calls with constants, so there are more free registers and less function-call overhead.
The same likely happens with our getters/setters, just not as badly in terms of performance.
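A hypothetical illustration of that difference (not the Player's actual pixel format code): masks fetched through getters on a runtime format object are real function calls unless the compiler inlines them, while masks that are compile-time constants of a format type are normally emitted as immediates even at low optimisation levels.

```cpp
#include <cassert>
#include <cstdint>

// Runtime-described format: each mask comes from a getter. Without inlining
// (low -O levels), every call is a real function call.
struct RuntimeFormat {
    std::uint32_t r, g, b;
    std::uint32_t r_mask() const { return r; }
    std::uint32_t g_mask() const { return g; }
    std::uint32_t b_mask() const { return b; }
};

// Compile-time format: masks are constant expressions baked into the type.
template <std::uint32_t R, std::uint32_t G, std::uint32_t B>
struct StaticFormat {
    static constexpr std::uint32_t r_mask = R;
    static constexpr std::uint32_t g_mask = G;
    static constexpr std::uint32_t b_mask = B;
};

using ARGB8888 = StaticFormat<0x00FF0000u, 0x0000FF00u, 0x000000FFu>;

// Extracts the green bits of a pixel via a getter call.
std::uint32_t GreenRuntime(const RuntimeFormat& fmt, std::uint32_t px) {
    return px & fmt.g_mask();
}

// Same operation; the mask is a constant known at compile time.
template <typename Format>
std::uint32_t GreenStatic(std::uint32_t px) {
    return px & Format::g_mask;
}
```

The trade-off of the template approach is one instantiation per pixel format, which costs code size, so it only pays off for the few formats a platform actually uses.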
Yeah, that's probably it, although -O2 only addresses the rendering issue; using -O3 instead improves performance further in some other cases that I can't pinpoint yet.
I did a test implementation of EasyRPG with no zlib, using only miniz and spng, and the code size dropped by about 80 KB with LTO. (I also used miniz.h for the code that uses crc32; XYZ also relied on zlib, but I switched it to miniz as well.) https://github.com/gameblabla/Player/blob/5fae74ee429991cf469fc938aff1de1ed7b2e8b3/src/image_png.cpp
Not a huge improvement, but still enough to let me allocate a small image or charset. I can't think of much else to cut. I may try minimp3 later, but I don't expect huge savings there either, given that mpg123 is only about 100 KB in the final build (according to its GitHub repository, I would save about 50-60 KB).
The Player bundles many fonts. I see use cases where font packs could be disabled, e.g. to reduce the memory footprint on devices where the entire ELF is copied into memory (homebrew stuff). This is also useful for commercial releases: an English game only needs English glyphs.
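One way to make font packs optional is a compile-time switch per pack, with glyph lookup falling back when a pack was left out. A minimal sketch; the macro names, the enum, and the Unicode ranges assigned to each pack are hypothetical and do not reflect the Player's real font code:

```cpp
#include <cassert>

// Hypothetical per-pack build switches (e.g. passed as -D flags):
//   PLAYER_FONT_SHINONOME  Japanese glyphs
//   PLAYER_FONT_WQY        Chinese glyphs

enum class FontPack { Builtin, Shinonome, Wqy, None };

// Decides which compiled-in pack would provide the glyph for `cp`.
FontPack PickFontPack(char32_t cp) {
    if (cp < 0x80) {
        return FontPack::Builtin;  // basic Latin is always available
    }
#ifdef PLAYER_FONT_SHINONOME
    if (cp >= 0x3040 && cp <= 0x30FF) {
        return FontPack::Shinonome;  // Hiragana and Katakana
    }
#endif
#ifdef PLAYER_FONT_WQY
    if (cp >= 0x4E00 && cp <= 0x9FFF) {
        return FontPack::Wqy;  // CJK Unified Ideographs
    }
#endif
    return FontPack::None;  // caller draws a replacement glyph
}
```

Building with neither switch keeps only the basic Latin path, which matches the case of an English-only release.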
TODO: Make a list of all the fonts