dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Symbol stripping in NativeAOT to reduce binary size #69847

Closed am11 closed 2 years ago

am11 commented 2 years ago

Repro:

# bash on linux-x64

$ dotnet7p5 new console -n nativeapp1
$ dotnet7p5 publish nativeapp1 --use-current-runtime -p:PublishAot=true -c Release -o artifacts

# check the app size (in bytes)
$ stat -c%s artifacts/nativeapp1
17962760

# extract symbols into a .dbg file, strip unneeded symbols from the binary, and link the .dbg to the binary
# see https://github.com/dotnet/runtime/blob/5d3288d/eng/native/functions.cmake#L374
$ objcopy --only-keep-debug artifacts/nativeapp1 artifacts/nativeapp1.dbg
$ objcopy --strip-unneeded artifacts/nativeapp1
$ objcopy --add-gnu-debuglink=artifacts/nativeapp1.dbg artifacts/nativeapp1

# check the size again
$ stat -c%s artifacts/nativeapp1
5895664

# size of dbg
$ stat -c%s artifacts/nativeapp1.dbg
12070608

# test if debug symbols are read by the debugger
$ gdb artifacts/nativeapp1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from artifacts/nativeapp1...
Reading symbols from /home/am11/projects/artifacts/nativeapp1.dbg...
(gdb) 

Extracting symbols (in a separate .dbg file) reduced the hello world binary size by 67%. We should consider doing this by default.
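
The steps above can be wrapped in a small helper. This is only a sketch: `split_debug_symbols`, `size_reduction_pct`, and the `DRY_RUN` variable are hypothetical names introduced for illustration and are not part of any .NET tooling. The `objcopy` flags are the ones used in the repro.

```sh
# Hypothetical helper: move debug info into a side file and link it back.
# Usage: split_debug_symbols <binary> [<debug-file>]
# With DRY_RUN=1 the objcopy commands are printed instead of executed.
split_debug_symbols() {
    bin=$1
    dbg=${2:-$bin.dbg}
    run=${DRY_RUN:+echo}
    $run objcopy --only-keep-debug "$bin" "$dbg" &&
    $run objcopy --strip-unneeded "$bin" &&
    $run objcopy --add-gnu-debuglink="$dbg" "$bin"
}

# Integer percentage by which a file shrank, given the sizes in bytes
# before and after stripping.
size_reduction_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}
```

With the sizes reported above, `size_reduction_pct 17962760 5895664` prints `67`, which matches the 67% figure.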

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to 'size-reduction': @eerhardt, @SamMonoRT, @marek-safar
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: am11
Assignees: -
Labels: `untriaged`, `size-reduction`, `area-NativeAOT-coreclr`
Milestone: -

MichalStrehovsky commented 2 years ago

For NativeAOT, we try to follow platform conventions whenever reasonable. AFAIK debug information embedded in the executable is the platform convention on Unix-like systems. Is that not the case?

FWIW, we document this on https://aka.ms/OptimizeNativeAOT (bottom of the doc).

am11 commented 2 years ago

> Is that not the case?

I am not sure what the Unix-wide convention is, or whether there is one, but most binaries on my Ubuntu 20.04 box are stripped:

# number of stripped binaries in /usr/bin
$ find /usr/bin -exec file -L {} \; | grep stripped | grep -v "not stripped" | wc -l
1697

# number of non-stripped binaries in /usr/bin
$ find /usr/bin -exec file -L {} \; | grep "not stripped" | wc -l
6

Also, all .NET binaries are stripped on Linux, including apphost / singlefilehost, so apps published with corehost are stripped as well.
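
The two `find` pipelines above can be folded into a single pass. This is a sketch only: `count_stripped` is a hypothetical helper name, and it assumes the usual `file` output wording ("stripped" or "not stripped" at the end of ELF descriptions). A path that itself contains the word "stripped" would be miscounted.

```sh
# Hypothetical helper: read `file` output on stdin and print
# "<stripped count> <not-stripped count>" in one pass.
count_stripped() {
    awk '
        /not stripped/ { n++; next }   # check the longer phrase first
        /stripped/     { s++ }
        END            { printf "%d %d\n", s, n }
    '
}

# Example invocation (not run here):
#   find /usr/bin -exec file -L {} + | count_stripped
```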

MichalStrehovsky commented 2 years ago

I mean the default settings for the compiler that produces the executable. Maybe what we want is an easier way to opt in?

(I'm not a Unix person myself, so I don't have an opinion besides "follow the platform convention").

am11 commented 2 years ago

gcc/clang defaults usually favor Unix legacy. For example, a.out is the default binary name, a convention from the pre-ELF / pre-System-V 1970s era; the default output type is an executable; PIC/PIE are not the default (but are highly recommended); and so on. We produce PIC by default, with no way to opt in or out, and therefore already don't follow the defaults of the compiler toolchain.

From the dotnet publish viewpoint, it would also make sense to align PublishAot's behavior with PublishSingleFile, which produces a stripped binary. One key difference would be that a .dbg file is produced next to the binary in the NativeAOT case (singlefilehost's native symbols are normally fetched from the server when SOS is installed). This way opt-in won't be necessary.

Opt-out is also unnecessary for this IMHO. Non-stripped binaries are generally not distributed. Folks who really want embedded symbols can use tools like eu-unstrip to reverse this effect.

Note: the user is not missing anything. All symbols are still there, but in a separate "fat symbol file" (which is rarely needed in a production environment).
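
As a sketch of the reverse direction mentioned above, `eu-unstrip` (from elfutils) takes the stripped file and the debug file and writes a combined output with `-o`. The wrapper name `embed_symbols` and the `DRY_RUN` variable are hypothetical, and whether `eu-unstrip` cleanly recombines files split with `objcopy` was not verified in this thread.

```sh
# Hypothetical helper: recombine a stripped binary with its debug file.
# Usage: embed_symbols <stripped-binary> <debug-file> <output>
# With DRY_RUN=1 the command is printed instead of executed.
embed_symbols() {
    stripped=$1
    dbg=$2
    out=$3
    ${DRY_RUN:+echo} eu-unstrip -o "$out" "$stripped" "$dbg"
}

# Example invocation (not run here):
#   embed_symbols artifacts/nativeapp1 artifacts/nativeapp1.dbg artifacts/nativeapp1.full
```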

MichalStrehovsky commented 2 years ago

So cargo build produces unstripped executables for Rust. One has to add an extra setting to Cargo.toml to have cargo strip the binary (added recently - https://github.com/rust-lang/cargo/pull/8246 - one had to pass extra ldflags before that).

I would still prefer to align with rustc/clang/gcc. Rustc doesn't have 50 years of legacy and they still chose unstripped to be the default.

Are there any examples of toolchains that strip by default? PublishSingleFile doesn't count because it doesn't actually generate an executable (it glues managed assemblies at the end of a preexisting executable - the symbols would be meaningless for the glued part).

am11 commented 2 years ago

Makes sense. I think making it optional and wiring it to an MSBuild property like <StripSymbols>true/false</StripSymbols> would be enough.

> Are there any examples of toolchains that strip by default?

I tested with nexe (node.js native); it also produces an unstripped binary.

jkotas commented 2 years ago

@am11 Would you be interested in contributing the build targets for this, to make it easy to opt in to symbol stripping?