cryptaliagy / httpget

a ridiculously simple and small http get client to use for health checks

HTTP (127KB) and HTTPS (766KB) #16

Open polarathene opened 1 month ago

polarathene commented 1 month ago

You can build httpget with nightly for some improvements in size reduction: https://github.com/neonmoe/minreq/issues/111

Reproduction

I've adapted the referenced example to build httpget. I've omitted some extra optimizations from the linked issue, which would save roughly another 20KB (or 50%+ on top of that when compressing with UPX).

# Reproduction environment via Docker:
docker run --rm -it --workdir /build fedora:41

# - `lld` is optional, it's used with `-C link-arg=-fuse-ld=lld`,
#   slight improvement over the internal LLD linker the rust toolchain bundles
# - `musl-gcc` is required for `--features tls` since we're building from a glibc host:
dnf install -y lld git gcc musl-gcc rustup-init

# Nightly rust with the musl target (for static build) and `rust-src` (for `-Z build-std`)
# rustc 1.84.0-nightly (e7c0d2750 2024-10-15)
rustup-init -y \
  --profile minimal \
  --component rust-src \
  --target x86_64-unknown-linux-musl \
  --default-toolchain nightly
. "$HOME/.cargo/env"

git clone --depth 1 https://github.com/cryptaliagy/httpget .

HTTP only (127.5KB)

$ cargo +nightly build --release --target x86_64-unknown-linux-musl
$ du --bytes target/x86_64-unknown-linux-musl/release/httpget
534760  target/x86_64-unknown-linux-musl/release/httpget

$ RUSTFLAGS='-C link-arg=-fuse-ld=lld -C relocation-model=static' \
  cargo +nightly build --release \
  --target x86_64-unknown-linux-musl \
  -Z build-std=std,panic_abort \
  -Z build-std-features=panic_immediate_abort

$ du --bytes target/x86_64-unknown-linux-musl/release/httpget
127488  target/x86_64-unknown-linux-musl/release/httpget

HTTPS (766KB)

$ cargo +nightly build --release --target x86_64-unknown-linux-musl --features tls

$ du --bytes target/x86_64-unknown-linux-musl/release/httpget
1509720 target/x86_64-unknown-linux-musl/release/httpget

$ RUSTFLAGS='-C link-arg=-fuse-ld=lld -C relocation-model=static' \
  cargo +nightly build --release \
  --target x86_64-unknown-linux-musl \
  --features tls \
  -Z build-std=std,panic_abort \
  -Z build-std-features=panic_immediate_abort

$ du --bytes target/x86_64-unknown-linux-musl/release/httpget
995024  target/x86_64-unknown-linux-musl/release/httpget

Size improvement cargo update + UPX

For the --features tls results, if you first update Cargo.lock with cargo update, the sizes become 1,268,056 bytes (default release build) vs 766,320 bytes (nightly flags as above), so both builds shrink by roughly 230-240KB.
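
To put the byte counts quoted in this comment side by side, here is a small sketch that computes the percentage saved by the nightly flags in each case. The helper name `pct_saved` is made up for illustration; the numbers are copied from the `du --bytes` output and the `cargo update` figures above.

```shell
#!/bin/sh
# Percentage of bytes saved going from $1 (before) to $2 (after), rounded down.
pct_saved() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Byte counts copied from the results quoted in this comment.
echo "HTTP, nightly flags:                $(pct_saved 534760 127488)%"
echo "HTTPS, nightly flags:               $(pct_saved 1509720 995024)%"
echo "HTTPS, nightly flags, cargo update: $(pct_saved 1268056 766320)%"
```

With integer rounding this reports 76%, 34% and 39% respectively.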

Use UPX to bring the HTTPS build down to just 432KB 😎 (for an HTTP-only build, this reduces down to about 70KB)

$ dnf install -y upx
$ upx --lzma target/x86_64-unknown-linux-musl/release/httpget
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2024
UPX 4.2.4       Markus Oberhumer, Laszlo Molnar & John Reiser    May 9th 2024

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
    763216 ->    432356   56.65%   linux/amd64   httpget

polarathene commented 1 month ago

Dynamic linked gnu target (for glibc + openssl) for 112KB (HTTPS) / 84.6KB (HTTP)

For base images which need glibc + openssl for other binaries, a dynamically linked httpget could make sense for HTTPS if you'd like to keep the added weight minimal.

This would require adding another feature to Cargo.toml to use minreq/native-tls instead of minreq/https-rustls.

# The additional build flags from `neonmoe/minreq/issues/111` are used here; they only provide a further 10KB reduction:
# NOTE: Unlike the original issue message, these results are from a slightly modified `httpget`,
# but it should be roughly equivalent.
RUSTFLAGS='-Z location-detail=none -Z fmt-debug=none -C link-arg=-fuse-ld=lld -C relocation-model=static' \
  cargo build --release --target x86_64-unknown-linux-gnu --features tls \
  -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort,optimize_for_size
# NOTE: ldd is not exactly accurate for knowing which externally linked dependencies are required,
# `libz` is actually from `libcrypto` (part of OpenSSL 3)
$ ldd target/x86_64-unknown-linux-gnu/release/httpget
        linux-vdso.so.1 (0x00007fff311dc000)
        libssl.so.3 => /lib64/libssl.so.3 (0x00007f3cfb0f5000)
        libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007f3cfac44000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f3cfaa52000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f3cfaa31000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3cfb1d2000)

$ dnf install -y patchelf
# Direct libraries (linked by the binary only)
$ patchelf --print-needed target/x86_64-unknown-linux-gnu/release/httpget
libssl.so.3
libcrypto.so.3
libc.so.6

# `libz` isn't available on Google Distroless,
# but since this isn't actually a directly linked library for `httpget`, this is not a concern:
$ patchelf --print-needed /lib64/libcrypto.so.3
libz.so.1
libc.so.6

If you do consider glibc dynamic linking, keep in mind that the build host sets the minimum required glibc version to whatever version it has. One way around that is cargo-zigbuild, which builds with Zig and lets you specify a more acceptable glibc version baseline.
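
As a rough sketch of what that could look like: cargo-zigbuild takes the glibc baseline as a suffix on the target triple, and `objdump -T` can be used afterwards to see which glibc symbol versions the binary ended up requiring. The helper names `zig_target` and `max_glibc` are invented for this example, `2.17` is only an example baseline, and I have not checked how this combines with the `-Z build-std` flags.

```shell
#!/bin/sh
# Compose the "<triple>.<glibc version>" target string that cargo-zigbuild accepts.
zig_target() {
  printf '%s.%s\n' "$1" "$2"
}

# Read `objdump -T` output on stdin and print the newest GLIBC_x.y symbol
# version it mentions, i.e. the minimum glibc the binary needs at runtime.
# Relies on `sort -V` (version sort, available in GNU coreutils).
max_glibc() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

# Usage (assumes zig and cargo-zigbuild are installed; output path may differ):
#   cargo zigbuild --release --target "$(zig_target x86_64-unknown-linux-gnu 2.17)"
#   objdump -T target/x86_64-unknown-linux-gnu/release/httpget | max_glibc
```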

UPX compression (HTTPS => 53KB, HTTP => 43KB)

With a compressed executable via UPX:

$ upx --lzma target/x86_64-unknown-linux-gnu/release/httpget

                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2024
UPX 4.2.4       Markus Oberhumer, Laszlo Molnar & John Reiser    May 9th 2024

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
    112024 ->     53088   47.39%   linux/amd64   httpget

# For reference, dynamically linked HTTP only build with UPX is 43KB:
$ upx --lzma target/x86_64-unknown-linux-gnu/release/httpget
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2024
UPX 4.2.4       Markus Oberhumer, Laszlo Molnar & John Reiser    May 9th 2024

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
     84600 ->     43256   51.13%   linux/amd64   httpget

NOTE: UPX will prevent inspection of linked libraries; the packed binary appears to be a static one:

$ patchelf --print-needed target/x86_64-unknown-linux-gnu/release/httpget
patchelf: no section headers. The input file is probably a statically linked, self-decompressing binary

$ ldd target/x86_64-unknown-linux-gnu/release/httpget
        not a dynamic executable
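
If you still need to inspect a packed binary, one option is to decompress a copy with `upx -d` and run `patchelf` on that instead. A minimal sketch, assuming `upx` and `patchelf` are installed and that UPX can unpack the file; the function name `needed_libs_of_packed` is made up for this example:

```shell
#!/bin/sh
# Print the direct dependencies of a UPX-packed binary by unpacking a copy first.
# The packed original is left untouched.
needed_libs_of_packed() {
  packed=$1
  [ -f "$packed" ] || { echo "no such file: $packed" >&2; return 1; }
  tmpdir=$(mktemp -d) || return 1
  # `upx -d` decompresses; `-o` writes the result to a new file instead of in place.
  upx -d -o "$tmpdir/unpacked" "$packed" >/dev/null &&
    patchelf --print-needed "$tmpdir/unpacked"
  status=$?
  rm -rf "$tmpdir"
  return $status
}

# Usage:
#   needed_libs_of_packed target/x86_64-unknown-linux-gnu/release/httpget
```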

Upstream reductions for minreq (70-100KB less possible for HTTPS before UPX)

NOTE: This isn't too relevant if the image is FROM scratch with no existing trust store or private CA certs.

If https://github.com/neonmoe/minreq/issues/111 gets resolved, a separate feature for using rustls-native-certs for static builds would also work well for 70KB less (no webpki bundled) when the image already has ca-certificates available, or when you'd like to have support for self-signed certs from your private CA :)

If the other improvements from that issue were also tackled, it'd be possible to get a static HTTPS build that is 376KB with UPX 😎

cryptaliagy commented 1 month ago

Thank you for the comprehensive breakdown! I'll investigate this on my end and see what the implications of those nightly features are.

I think I'm more partial to the idea of publishing a compressed and an uncompressed binary side-by-side so folks can choose their own threat model. I'm not super familiar with UPX or with binary compression in that way so I don't want to force that on folks.

I've already opened #17 to start updating the cargo file, I will see if I can do an update to the docs with the updated binary sizes and publish a new version to start with

polarathene commented 1 month ago

what the implications of those nightly features are.

A bit verbose, but this information might assist you with that:

RUSTFLAGS='-C link-arg=-fuse-ld=lld -C relocation-model=static'

-Z build-std=std,panic_abort \
-Z build-std-features=panic_immediate_abort

When using -Z build-std with panic = "abort", you need to add panic_abort explicitly. There are also core and alloc, but these aren't relevant here since we add std. AFAIK, at least std or core must be paired with panic_abort to avoid a build failure.

-Z build-std-features is complementary; in this case we're using panic_immediate_abort.

For size impacts of these tweaks, I did make some notes about their individual impact here.

AFAIK, these optimizations are all fine for httpget, but building on nightly sometimes breaks, in which case CI needs to pin a known-good nightly release. That's probably not a concern for httpget, since releases are not frequent and the dependency tree isn't that big.
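
For reference, pinning in CI could look roughly like the sketch below. The date in `PINNED_NIGHTLY` is only a placeholder (substitute a nightly you have verified), and the function names are made up; the build flags are the same ones used earlier in this issue.

```shell
#!/bin/sh
# Placeholder date: substitute a nightly you have verified builds httpget.
PINNED_NIGHTLY="${PINNED_NIGHTLY:-nightly-2024-10-16}"
TARGET=x86_64-unknown-linux-musl

# Install exactly that toolchain, with the pieces `-Z build-std` needs.
install_pinned() {
  rustup toolchain install "$PINNED_NIGHTLY" \
    --profile minimal \
    --component rust-src \
    --target "$TARGET"
}

# Build with the pinned toolchain rather than the floating `nightly` channel.
build_pinned() {
  RUSTFLAGS='-C link-arg=-fuse-ld=lld -C relocation-model=static' \
    cargo "+$PINNED_NIGHTLY" build --release \
    --target "$TARGET" \
    -Z build-std=std,panic_abort \
    -Z build-std-features=panic_immediate_abort
}

# Usage in a CI step:
#   install_pinned && build_pinned
```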


I think I'm more partial to the idea of publishing a compressed and an uncompressed binary side-by-side so folks can choose their own threat model. I'm not super familiar with UPX or with binary compression in that way so I don't want to force that on folks.

Oh that's perfectly ok, it was just added for reference, note that the issue title doesn't reference the UPX size reductions. I share some insights here as to caveats to keep in mind when deciding if UPX is appropriate.

In this case for static builds I think it's fine, but it's quite easy to use a separate stage in a Dockerfile that grabs the binary and runs UPX on it before copying that over to the final stage if someone wants the added compression.
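
A minimal sketch of that separate-stage approach, written as a shell snippet that generates the Dockerfile. It assumes a prebuilt static `httpget` binary sits in the build context; the file name `Dockerfile.upx` and the image tag are arbitrary choices for this example.

```shell
#!/bin/sh
# Write a two-stage Dockerfile: the first stage compresses the binary with UPX,
# the final stage ships only the compressed result.
cat > Dockerfile.upx <<'EOF'
FROM fedora:41 AS compress
RUN dnf install -y upx
COPY httpget /httpget
RUN upx --lzma /httpget

FROM scratch
COPY --from=compress /httpget /httpget
ENTRYPOINT ["/httpget"]
EOF

# Then, with Docker available:
#   docker build -f Dockerfile.upx -t httpget:upx .
```

Anyone who doesn't want UPX can simply copy the binary straight into the final stage instead.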

At the current size without UPX involved, I think most users don't need to push it down further with compression. The binary is already compressed as part of the image when pulled over the network, and in constrained environments UPX would cost extra CPU and memory, which are more valuable there than disk.
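
To get a feel for the network side of that, you can compare a binary's raw size against its gzip-compressed size. This is only a rough estimate, assuming the image layers are gzip-compressed for transfer; the function name `transfer_estimate` is made up for this example.

```shell
#!/bin/sh
# Print "<raw bytes> <gzip -9 bytes>" for a file, as a rough stand-in for
# how much of it travels over the network inside a compressed image layer.
transfer_estimate() {
  raw=$(wc -c < "$1")
  gz=$(gzip -9 -c "$1" | wc -c)
  # Arithmetic expansion strips any padding some `wc` implementations add.
  echo "$((raw)) $((gz))"
}

# Usage:
#   transfer_estimate target/x86_64-unknown-linux-musl/release/httpget
```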

If you do decide to publish a UPX-compressed binary, it should be clearly stated that UPX was used, especially since in other projects chasing disk size improvements can unintentionally hurt runtime costs.

cryptaliagy commented 4 weeks ago

Thank you so much for this detailed follow-up! I'll be trying to schedule the work sometime in the next couple of months between my work-work and class work