c3lang / c3c

Compiler for the C3 language
GNU Lesser General Public License v3.0

C3 output binary file size reducing Odyssey #1255

Closed lem0nify closed 2 days ago

lem0nify commented 1 month ago

Reducing binary size with stdlib

Here is the simplest program in C3:

module hello;

import std::io;

fn int main()
{
    io::printn("Hello, world!");
    return 0;
}
$ c3c build
Program linked to executable 'build/hello'.
$ du -h --apparent-size build/hello
496K    build/hello

Let's try to make it smaller:

$ c3c build -Oz -g0
Program linked to executable 'build/hello'.
$ du -h --apparent-size build/hello
344K    build/hello

That's 152 KB less. Pretty cool, but still a huge executable for such a tiny program. Let's also strip it.

$ c3c build -Oz -g0 -z '-s'
Program linked to executable 'build/hello'.
$ du -h --apparent-size build/hello
314K    build/hello

Still very large. After all, we use only one procedure from the stdlib. Sure, it may have some dependencies, but do they really take 300KB? Just imagine how much logic 300KB of binary code can contain. It seems the entire stdlib is statically linked into our executable. Perhaps LTO will help us reduce its size further?

$ c3c build -Oz -g0 -z '-s -flto'
Program linked to executable 'build/hello'.
$ du -h --apparent-size build/hello
314K    build/hello

No, it does not. Let's see what else the compiler will allow us to do... Safety? Who needs safety?

$ c3c build -Oz -g0 -z '-s -flto' --safe=no
Program linked to executable 'build/hello'.
$ du -h --apparent-size build/hello
202K    build/hello

Well, another decent reduction. Let's also remove panic messages.

$ c3c build -Oz -g0 -z '-s -flto' --safe=no --panic-msg=no
Program linked to executable 'build/hello'.
$ du -h --apparent-size build/hello
190K    build/hello

And here I have a question. In the optimization level descriptions in c3c --help, we can read that code is unsafe starting from -O2 and panic messages are disabled starting from -O4:

  -O0                        - Safe, no optimizations, emit debug info.
  -O1                        - Safe, high optimization, emit debug info.
  -O2                        - Unsafe, high optimization, emit debug info.
  -O3                        - Unsafe, high optimization, single module, emit debug info.
  -O4                        - Unsafe, highest optimization, relaxed maths, single module, emit debug info, no panic messages.
  -O5                        - Unsafe, highest optimization, fast maths, single module, emit debug info, no panic messages.
  -Os                        - Unsafe, high optimization, small code, single module, no debug info, no panic messages.
  -Oz                        - Unsafe, high optimization, tiny code, single module, no debug info, no panic messages.

Then why do --safe=no and --panic-msg=no flags still affect output size if -Oz is already specified? 🤔

Building without stdlib

So we finally have a 190KB executable. That is 306KB less than in our first step, and it looks like the smallest size we can achieve while linking the standard library. Let's drop it too:

module hello;

extern fn void puts(char*);

fn int main()
{
    puts("Hello, world!");
    return 0;
}
$ c3c build -Oz -g0 -z '-s -flto' --safe=no --panic-msg=no
Program linked to executable 'build/hello'.
$ du -h --apparent-size build/hello
190K    build/hello

Wait, what? No difference? So it definitely links the entire stdlib regardless of which procedures we use from it. Okay, there is another command-line option:

$ c3c build -Oz -g0 -z '-s -flto' --safe=no --panic-msg=no --use-stdlib=no
Program linked to executable 'build/hello'.
$ du -h --apparent-size build/hello
15K build/hello

Finally, size comparable to C output executable sizes!

So the C3 standard library accounts for about 175KB (190K minus 15K). You might say that's very little, and you'd be right! But the language will evolve, and its standard library will likely grow. And if the entire stdlib keeps being linked into every executable, regardless of whether LTO is enabled, the gap between a binary with the stdlib and one without it will only widen. But what is a programming language without a standard library? Why should I write C3 without a standard library if I can keep writing C with one and get binaries of the same size or even smaller? Yes, of course, you might say that I can use any C library (including libc) in a C3 program, but I'd have to write bindings first (declaring all the functions and structures I want to use from it).

Where am I going with this?

We have plenty of "C killers" for complex high-level programming, such as C++, Rust, Go, etc. They are quite feature-heavy, yes, but if I need to develop a program for which size does not matter, most likely it will be a complex, massive system, so I would rather take Rust and fight the borrow checker than write it in a very low-level language, afraid of making a mistake that will be difficult to debug later.

But we do not have a single "C killer" that is well suited for a driver developer, a hacker writing a rootkit, a microcontroller developer, a mobile developer who cares about users of old smartphones with limited memory, or, for that matter, a Game Boy Advance developer. Or even just an indie developer who is ideologically determined that his simplest program should be tiny.

When I saw C3, my first thought was: "Oh, finally a really low-level, nice replacement for C, with nice modern features, but not overloaded with them!" But when I saw the size of the hello world binary, which is, let me remind you, 496KB on my PC, I was a bit shocked. For reference, a hello world in Rust, built in release mode without any additional size optimizations, is 411 KB on my PC.

Sorry for this huge post and my cry from the heart, but I implore you, C3 dev team, to implement LTO or something, so that we can use parts of the standard library in super-low-level programs without blowing them up to hundreds of kilobytes, and generally think about ways to reduce the size of the binary. Size, damn it, is not as unimportant as many people think!

Thank you for your attention! ❤️

lerno commented 1 month ago

Ok, so keeping the small output small is super important. What you're seeing here from the stdlib is a combination of things:

  1. Some functionality, like the int128 math functions, is supposed (by LLVM) to be available from some external library. Typically it's compiler-rt, but I wanted C3 unencumbered by those functions, so they're implemented in C3 instead. However, they're implemented as exports and consequently aren't stripped from the binary even when unused. This can be improved further, but should probably be done in conjunction with investigating when they're actually needed by LLVM output.

  2. There are symbolic stack traces: even if the debug information isn't there, there's still a bit of standard library code to handle them, so that hitting an assert has some reasonable behaviour. But this also increases the binary footprint.

  3. When you hit a direct unreachable or panic in the standard library, the message there is stored as a string. This actually stacks up quite a bit. Partly this could be deduplicated: it's already done per module, but not across modules. Obviously there are even more messages with safety turned on, since each assert stores a message for where it was hit.

  4. Here's the tricky one: dynamic methods. These are stored in a separate segment on MacOS, or in startup methods directly on Windows and Linux. This is used, for example, to enable adding custom printf formatting to any type. However, unless we do a single-module compilation, we don't know whether other object files will reference these, so we need to pessimistically keep them. By default, methods are stripped for unused types, but anything using the standard library to print will touch at least File, which has some dynamic methods. The allocators do the same.

  5. Currently the standard library is just provided as part of what's compiled. Stripped to only the used code, yes, but still provided. And this adds to the executable, as if we statically linked libc every time. So what can be done? Well, obviously the standard library could be made into a dynamic library; then it would be invisible, even though the code is there. Another thing would be to allow partially disabling parts of the library that are always available at startup, like the temp allocator. More research than I've done so far should go into where the memory goes.
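To make point 3 concrete, here is a hypothetical Python model (not c3c code) of how many bytes of panic/unreachable message strings end up in the binary under per-module deduplication versus cross-module deduplication. The module contents are invented for illustration, and each string is counted as a NUL-terminated literal.

```python
# Hypothetical model, not c3c code: bytes of panic/unreachable message
# strings kept in the binary with per-module deduplication (the current
# behaviour described in point 3) versus cross-module deduplication.


def stored_bytes(modules, cross_module):
    """Total bytes of message strings kept in the binary.

    modules: dict of module name -> list of messages, one per panic site,
             so the same text may appear several times.
    cross_module: share identical strings across all modules if True,
                  otherwise only within each module.
    Each string is counted with a trailing NUL byte, like a C string literal.
    """
    if cross_module:
        unique = set()
        for messages in modules.values():
            unique.update(messages)
        return sum(len(m) + 1 for m in unique)
    return sum(
        sum(len(m) + 1 for m in set(messages))
        for messages in modules.values()
    )


# Invented module contents, for illustration only.
modules = {
    "std::io": ["Index out of bounds", "Unreachable", "Index out of bounds"],
    "std::core::string": ["Index out of bounds", "Unreachable"],
    "std::collections": ["Index out of bounds"],
}

print("per module:  ", stored_bytes(modules, cross_module=False))  # 84
print("cross module:", stored_bytes(modules, cross_module=True))   # 32
```

In this toy example, no deduplication at all would store 104 bytes, per-module deduplication stores 84, and cross-module deduplication stores 32. The saving grows with the number of modules that repeat the same generic messages.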

Regarding memory size: I was able to compile examples for WASM4, which has 64 kb of memory, and still had quite a bit of memory left after loading the program. This was some time ago, though, and I don't know whether it has regressed since. That was also with libc linking disabled, which by itself disables large parts of the standard library. There is also a memory environment setting that can be used.

All in all, improving this is a work in progress. The more data I get, especially on different platforms, the better I can do.

(BTW, the fact that --panic-msg=no changed something indicates that the -Oz setting has a bug.)

lerno commented 1 month ago

I am trying now on 0.6.2 (dev), and I can't reproduce any difference between -Oz with --panic-msg=no and without it; same with --safe. Note, though, that if you use -Oz --panic-msg=yes you will be overriding the defaults of -Oz. With just -Oz on MacOS AArch64 I get a 123kb binary, with the standard library. Without the stdlib: 50 kb, and I can't seem to strip it further.

lerno commented 1 month ago

Regarding your worries about the standard library, only types and functions marked as used are actually compiled. You can test this by using --strip-unused=no together with compile-only, which generates the object files for the entire standard library.

data-man commented 1 month ago

resources/examples/nolibc $ c3c build
$ du -b hello_world
13624 hello_world

$ strip hello_world
$ du -b hello_world
13168 hello_world

lem0nify commented 1 month ago

Regarding your worries about the standard library, only types and functions marked as used are actually compiled.

Do you mean "used in the same module"?

module hello;

extern fn void puts(char*);

fn int main()
{
    puts("Hello, world!");
    return 0;
}

This code doesn't use the stdlib at all, but the stdlib is still linked unless I pass the --use-stdlib=no flag.

So what can be done? Well, obviously the standard library could be made into a dynamic library; then it would be invisible, even though the code is there. Another thing would be to allow partially disabling parts of the library that are always available at startup, like the temp allocator.

Recursively walk, during compilation, from the main function (or from all exported functions in the case of a library) through all the functions called there, and add only them and their dependencies to the binary.

As far as I understand, this is how Zig and possibly Odin do it. If this is not possible in the case of C3, please explain why. If we link the stdlib statically, do we actually have to compile it separately? And even with separate compilation, as far as I understand, the purpose of LTO is to remove unused code at the link stage. But LTO doesn't work with C3 at all, because something has to be done at the compile stage to help the linker find unused code when performing LTO.

Of course, in this case there should be a specific directive meaning that the code should be in the binary even if it is not used. But, as I understand it, it already exists: this is what you mean by "marked as used", yes? But I don't understand why something in the standard library should be marked with it.

lerno commented 1 month ago

Regarding your worries about the standard library, only types and functions marked as used are actually compiled.

Do you mean "used in the same module"?

No, as traced from the main function and other entrypoints (e.g. @init functions and functions marked @nostrip)

This code doesn't use the stdlib at all, but the stdlib is still linked unless I pass the --use-stdlib=no flag.

Yes, so one of the aforementioned entrypoints here registers a signal handler to give prettier stacktraces on signals.

What I just pushed is a way to disable those. They are also disabled by default on -O5 and -Oz. This stacktrace support was pulling in the string formatter and string output, which in turn depend on other things.

Long story short, just doing -Oz now OUGHT to give you a smaller binary, without having to dump the standard library, in the pure libc case.

Recursively walk, during compilation, from the main function (or from all exported functions in the case of a library) through all the functions called there, and add only them and their dependencies to the binary.

As far as I understand, this is how Zig and possibly Odin do it.

This is also how C3 does it, it's just that there are dependencies that are there by default to enable things like nice panics and stack traces. There was some code which would retain the panic function even if it wasn't used (due to "no panic messages") - this should be fixed now.
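The stripping strategy discussed here (trace from the entrypoints, keep only what is reached) is essentially a graph walk. Below is a minimal Python sketch, assuming a precomputed call graph; it is not c3c's implementation. All function names are hypothetical: `init_signal_handlers` stands in for an `@init`-style root that registers the backtrace signal handler lerno mentions. The sketch shows how one always-on root can pull in the formatter and its dependencies even when `main` only calls `puts`.

```python
# Minimal sketch of reachability-based stripping over a hypothetical call
# graph. Roots model entrypoints such as main, @init functions and
# @nostrip functions; everything not reached from a root can be dropped.
from collections import deque


def reachable(call_graph, roots):
    """Return the set of functions reachable from any root (breadth-first)."""
    seen = set()
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        if fn in seen:
            continue
        seen.add(fn)
        queue.extend(call_graph.get(fn, ()))
    return seen


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["puts"],
    # @init-style root that installs a signal handler for pretty backtraces.
    "init_signal_handlers": ["print_backtrace"],
    "print_backtrace": ["format_string", "write_stderr"],
    "format_string": ["int_to_str", "float_to_str"],
    "unused_helper": ["format_string"],
}

all_fns = set(call_graph) | {c for callees in call_graph.values() for c in callees}

with_init = reachable(call_graph, ["main", "init_signal_handlers"])
without_init = reachable(call_graph, ["main"])

print(sorted(all_fns - with_init))  # stripped even with the handler root
print(sorted(without_init))         # kept when backtraces are disabled
```

With the handler root present, everything except `unused_helper` is kept; without it, only `main` and `puts` survive. This mirrors why disabling backtraces by default on -Oz shrinks the binary even though `main` never touches the formatter.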

But I don't understand why something in the standard library should be marked with it.

I hope this is a little bit clearer now.

lem0nify commented 1 month ago

this should be fixed now

Good news! Thank you for taking my concerns into consideration! Do I understand correctly that once your latest fix is released, a hello world using std::io::printn will be roughly 15-20KB with just the -Oz flag, without --use-stdlib=no?

it's just that there are dependencies that are there by default to enable things like nice panics and stack traces

And do they really take 170+KB?

lerno commented 1 month ago

On MacOS, the binary size dropped by 70 kb. The remaining code that is compiled (besides the hello world) is some int128 support code. If I add --use-stdlib=no on top of this I only reduce the binary by an additional 1 kb. I'm curious as to what your results are on Linux.

lem0nify commented 1 month ago

I tried to build from the dev branch but hit an error at the CMake stage:

lem0nify@arch ~/src % git clone https://github.com/c3lang/c3c
Cloning into 'c3c'...
remote: Enumerating objects: 33636, done.
remote: Counting objects: 100% (479/479), done.
remote: Compressing objects: 100% (261/261), done.
remote: Total 33636 (delta 233), reused 373 (delta 189), pack-reused 33157
Receiving objects: 100% (33636/33636), 12.09 MiB | 9.99 MiB/s, done.
Resolving deltas: 100% (24737/24737), done.

lem0nify@arch ~/src % cd c3c

lem0nify@arch ~/src/c3c % git checkout dev
branch 'dev' set up to track 'origin/dev'.
Switched to a new branch 'dev'

lem0nify@arch ~/src/c3c % mkdir build

lem0nify@arch ~/src/c3c % cd build

lem0nify@arch ~/src/c3c/build % cmake ..
-- The C compiler identification is GNU 14.1.1
-- The CXX compiler identification is GNU 14.1.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
C3C version: 0.6.2
-- Found CURL: /usr/lib/libcurl.so (found version "8.8.0")
-- Performing Test HAVE_FFI_CALL
-- Performing Test HAVE_FFI_CALL - Success
-- Found FFI: /usr/lib/libffi.so
-- Looking for histedit.h
-- Looking for histedit.h - found
-- Found LibEdit: /usr/include (found version "2.11")
-- Performing Test Terminfo_LINKABLE
-- Performing Test Terminfo_LINKABLE - Success
-- Found Terminfo: /usr/lib/libtinfo.so
-- Found ZLIB: /usr/lib/libz.so (found version "1.3.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found LibXml2: /usr/lib/libxml2.so (found version "2.13.2")
-- Found LLVM 18.1.8
-- Using LLVMConfig.cmake in: /usr/lib/cmake/llvm
-- Libraries located in: /usr/lib
-- LLVM was built with RTTI
-- using find_library
-- linking to llvm libs LLD_COFF-NOTFOUND;LLD_COMMON-NOTFOUND;LLD_WASM-NOTFOUND;LLD_MINGW-NOTFOUND;LLD_ELF-NOTFOUND;LLD_MACHO-NOTFOUND
-- Found lld libs LLD_COFF-NOTFOUND;LLD_COMMON-NOTFOUND;LLD_WASM-NOTFOUND;LLD_MINGW-NOTFOUND;LLD_ELF-NOTFOUND;LLD_MACHO-NOTFOUND
-- using gcc/clang warning switches
-- The following OPTIONAL packages have been found:

 * FFI
 * LibEdit
 * Terminfo
 * ZLIB
 * zstd
 * LibXml2
 * CURL

-- The following REQUIRED packages have been found:

 * LLVM

-- Configuring done (1.2s)
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
LLD_COFF
    linked by target "c3c" in directory /home/lem0nify/src/c3c
    linked by target "c3c_wrappers" in directory /home/lem0nify/src/c3c
LLD_COMMON
    linked by target "c3c" in directory /home/lem0nify/src/c3c
    linked by target "c3c_wrappers" in directory /home/lem0nify/src/c3c
LLD_ELF
    linked by target "c3c" in directory /home/lem0nify/src/c3c
    linked by target "c3c_wrappers" in directory /home/lem0nify/src/c3c
LLD_MACHO
    linked by target "c3c" in directory /home/lem0nify/src/c3c
    linked by target "c3c_wrappers" in directory /home/lem0nify/src/c3c
LLD_MINGW
    linked by target "c3c" in directory /home/lem0nify/src/c3c
    linked by target "c3c_wrappers" in directory /home/lem0nify/src/c3c
LLD_WASM
    linked by target "c3c" in directory /home/lem0nify/src/c3c
    linked by target "c3c_wrappers" in directory /home/lem0nify/src/c3c

-- Generating done (0.0s)
CMake Generate step failed.  Build files cannot be regenerated correctly.

lem0nify@arch ~/src/c3c/build % pacman -Q llvm
llvm 18.1.8-4

lem0nify@arch ~/src/c3c/build % pacman -Q lld
lld 18.1.8-1

The master branch build fails with the same error on Arch Linux. I'm not familiar enough with LLVM and LLD to know what exactly the problem is.

lerno commented 1 month ago

@lem0nify You need the LLD library files as well, so install LLD on top of LLVM.

lem0nify commented 1 month ago

lem0nify@arch ~/code/c3/hello % cat src/main.c3
module hello;

import std::io;

fn int main()
{
    io::printn("Hello, world!");
    return 0;
}

lem0nify@arch ~/code/c3/hello % c3c build -Oz && du -h --apparent-size build/hello
Program linked to executable 'build/hello'.
490K    build/hello

lem0nify@arch ~/code/c3/hello % c3c build -Oz -g0 && du -h --apparent-size build/hello
Program linked to executable 'build/hello'.
344K    build/hello

lem0nify@arch ~/code/c3/hello % c3c build -Oz -g0 -z '-s' && du -h --apparent-size build/hello
Program linked to executable 'build/hello'.
314K    build/hello

lem0nify@arch ~/code/c3/hello % c3c build -Oz -g0 -z '-s' --safe=no && du -h --apparent-size build/hello
Program linked to executable 'build/hello'.
202K    build/hello

lem0nify@arch ~/code/c3/hello % c3c build -Oz -g0 -z '-s' --safe=no --panic-msg=no && du -h --apparent-size build/hello
Program linked to executable 'build/hello'.
31K build/hello

lem0nify@arch ~/code/c3/hello % c3c build -Oz -g0 -z '-s' --safe=no --show-backtrace=no && du -h --apparent-size build/hello
Program linked to executable 'build/hello'.
202K    build/hello

lem0nify@arch ~/code/c3/hello % c3c build -Oz -g0 -z '-s' --safe=no --panic-msg=no --show-backtrace=no && du -h --apparent-size build/hello
Program linked to executable 'build/hello'.
31K build/hello

lerno commented 1 month ago

Do you have any settings in project.json that might be affecting the behaviour? -Oz should update the defaults to --safe=no --panic-msg=no --show-backtrace=no.

However, if your target has any of those explicitly set then those settings override -Oz.
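The precedence lerno describes (-Oz only changes the defaults, and any explicitly set option wins) is a defaults-then-overrides merge. Here is a hypothetical Python sketch: the option names mirror the flags used in this thread, but `OPT_DEFAULTS` and `resolve` are invented for illustration and are not c3c's implementation.

```python
# Hypothetical sketch, not c3c code: -Oz supplies defaults, and options set
# explicitly (on the command line or in the target's settings) override them.

OPT_DEFAULTS = {
    # Per this thread, -Oz should imply these three settings.
    "Oz": {"safe": False, "panic_msg": False, "show_backtrace": False},
}


def resolve(opt_level, explicit):
    """Merge optimization-level defaults with explicitly set options.

    explicit: dict of options the user actually set; unset options are absent.
    """
    settings = dict(OPT_DEFAULTS[opt_level])  # copy so defaults stay untouched
    settings.update(explicit)                 # explicit settings win
    return settings


print(resolve("Oz", {}))
print(resolve("Oz", {"panic_msg": True}))  # like -Oz --panic-msg=yes
```

The key property is that an option left unset must stay absent from `explicit`; if something filled it in with a non-Oz value before the merge, -Oz would silently lose, which would look exactly like "-Oz alone does not shrink the binary".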

lem0nify commented 1 month ago

@lerno I have default project.json created by c3c init:

{
  // Language version of C3.
  "langrev": "1",
  // Warnings used for all targets.
  "warnings": [ "no-unused" ],
  // Directories where C3 library files may be found.
  "dependency-search-paths": [ "lib" ],
  // Libraries to use for all targets.
  "dependencies": [ ],
  // Authors, optionally with email.
  "authors": [ "John Doe <john.doe@example.com>" ],
  // Version using semantic versioning.
  "version": "0.1.0",
  // Sources compiled for all targets.
  "sources": [ "src/**" ],
  // C sources if the project also compiles C sources
  // relative to the project file.
  // "c-sources": [ "csource/**" ],
  // Output location, relative to project file.
  "output": "build",
  // Architecture and OS target.
  // You can use 'c3c --list-targets' to list all valid targets.
  // "target": "windows-x64",
  // Targets.
  "targets": {
    "hello": {
      // Executable or library.
      "type": "executable",
      // Additional libraries, sources
      // and overrides of global settings here.
    },
  },
  // Global settings.
  // CPU name, used for optimizations in the LLVM backend.
  "cpu": "generic",
  // Optimization: "O0", "O1", "O2", "O3", "O4", "O5", "Os", "Oz".
  "opt": "O0",
  // See resources/examples/project_all_settings.json and 'c3c --list-project-properties' to see more properties.
}

lerno commented 1 month ago

Does removing "opt" do anything?

lerno commented 1 month ago

There was a bug, which I've now fixed in master. See if it helps.

lerno commented 1 month ago

@lem0nify can you verify that -Oz at the command line is enough now?

lem0nify commented 1 month ago

Sorry for the delay. Yes, it seems now it works as expected:

lem0nify@arch ~/code/c3/hello % cat src/main.c3
module hello;

import std::io;

fn int main()
{
    io::printn("Hello, world!");
    return 0;
}

lem0nify@arch ~/code/c3/hello % c3c build -Oz && du -h --apparent-size build/hello
Program linked to executable 'build/hello'.
29K build/hello

lerno commented 1 month ago

Great! Do we need to try to make it smaller?

lem0nify commented 1 month ago

Great! Do we need to try to make it smaller?

Only if you don't have more important tasks. :)

lerno commented 1 month ago

I probably do. It would be nice to strip it further if possible, though.

lerno commented 2 days ago

I'll close this for now and then we can do another task later to continue the journey.