Is this worth the tradeoff compared to performance optimization?
I don't know how upx works exactly; does it only impact startup performance, or runtime performance as well?
The example you've shared reduces it *to* 23%, not *by* 23%, so relatively speaking that's actually a much bigger reduction, but in absolute terms it only saves about 28 MiB, going from 36 MiB to 8 MiB.
I think most Lemmy server operators will very much prefer performance over a few MiB on disk.
I see `opt-level` is currently also set to optimize for size; maybe this should be changed to optimize for performance instead? At least for release builds that seems to make more sense.
What's the reasoning behind going for size optimization instead of performance optimization currently?
I can't find the article anymore, but someone found that the Rust optimizations that reduce binary size also increase performance. This writeup with tests seems to confirm it as well, with all the `profile.release` settings we're using being the recommended ones.
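For context, the kind of `profile.release` block being discussed looks roughly like this (a sketch, not necessarily Lemmy's exact config):

```toml
# Sketch of a size-oriented release profile; the actual values in
# Lemmy's Cargo.toml may differ.
[profile.release]
opt-level = "z"  # optimize for size instead of speed
lto = "thin"     # link-time optimization across crates
```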
I'm fairly sure upx is just a binary compressor, which inflates the binary when it's run, so it shouldn't affect performance.
EDIT: I did find these two, which seem to say it's a case-by-case issue:
`codegen-units = 1` can slow down compilation; could you test how long a release build takes with and without that setting? I'm also sceptical about upx, because binary size isn't really a problem, but slower startup or runtime performance would be problematic.
It should only affect release builds, so CI releases, but not development builds or regular CI.
Seems to only be a 30s difference, so it's probably worth it.
| Type | Build time |
| --- | --- |
| Release (no codegen-units) | 3m 15s |
| Release (`codegen-units = 1`) | 3m 45s |
Running the upx-compressed binary didn't seem to take noticeably longer at all, so it's probably a good idea. And it shouldn't affect performance, because it has to inflate the binary to run it anyway, no different from unzipping a file.
Based on the links above, `opt-level` is the main setting we should be concerned with when it comes to performance, but unfortunately they say it's a case-by-case issue which setting is best.
The `codegen-units` bit should positively impact performance, if not binary size, based on what the cargo book says:

> More code generation units allows more of a crate to be processed in parallel possibly reducing compile time, but may produce slower code.

If the inverse of this statement is true, then fewer code generation units should result in faster code.
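For reference, the setting in question is a single line in the release profile (a sketch, assuming it goes in the workspace `Cargo.toml`):

```toml
[profile.release]
# Compile each crate as a single codegen unit: slower to build,
# but gives LLVM a whole-crate view for optimization.
codegen-units = 1
```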
Okay, I just did some more tests messing with the `opt-level`, which by default is `3` on release, but which we currently have set to optimize for size (`z`). Here are the results:
| Type | Build time | Binary Size | Binary Size after upx |
| --- | --- | --- | --- |
| Release (`opt-level = "z"`) | 3m 45s | 38M | 8.7M |
| Release (`opt-level = 3`) | 7m 05s | 51M | 11M |
Based on their recommendations I'll make this change to `opt-level = 3` here also, since that is supposed to be more performant. But the docs do add this caveat:
> It is recommended to experiment with different levels to find the right balance for your project. There may be surprising results, such as level 3 being slower than 2, or the "s" and "z" levels not being necessarily smaller. You may also want to reevaluate your settings over time as newer versions of rustc change optimization behavior.
Still probably worth it though, especially since upx will make this smaller.
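Concretely, the change is just swapping the value in the release profile (a sketch of the diff, not the exact commit):

```toml
[profile.release]
# was: opt-level = "z"  (optimize for size)
opt-level = 3  # optimize for speed; upx keeps the on-disk size down
```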
Does Lemmy not have performance benchmarks? E.g. federation posts/comments/likes, the getPosts / getComments endpoints, ... in ops/sec. That way you would have better insight into what the impact will be.
There are some in `lemmy/scripts/query_testing`, but based on #4983, which is an incredibly annoying blocker for the `0.19.6` release, we need to create unit tests for post_view that make sure any DB changes don't negatively affect performance. I'll try to work on that this week.
Here's the difference between thin and fat LTO:
| Type | Build time | Binary Size | Binary Size after upx |
| --- | --- | --- | --- |
| Release (`lto = "thin"`) | 7m 05s | 51M | 11M |
| Release (`lto = "fat"`) | 10m 57s | 46M | 11M |
From the docs:
true or "fat": Performs “fat” LTO which attempts to perform optimizations across all crates within the dependency graph. "thin": Performs “thin” LTO. This is similar to “fat”, but takes substantially less time to run while still achieving performance gains similar to “fat”.
Since we run release builds pretty rarely, I'd say it's worth it.
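In the profile that would look something like this (sketch):

```toml
[profile.release]
# "fat" LTO optimizes across all crates in the dependency graph;
# slower to link than "thin", but gave a ~5M smaller binary here.
lto = "fat"
```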
Taking two recommendations from https://github.com/johnthagen/min-sized-rust. There are a lot of other recommendations there, but they all affect the behavior, so they probably aren't worth messing with.

- Using `codegen-units = 1` (removes parallel code generation but makes the binary smaller)
- Running upx, which reduced size by 23%.
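Putting the pieces from this thread together, the release profile would end up looking roughly like this (a sketch of the end state, not necessarily the exact committed config):

```toml
[profile.release]
opt-level = 3       # changed from "z" during review to favor speed
lto = "fat"         # cross-crate LTO: longer release builds, smaller binary
codegen-units = 1   # single codegen unit: smaller, potentially faster code
```

upx would then run on the resulting binary as a separate step in the release builds.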