Is this worth the tradeoff compared to performance optimization?
I don't know how upx works exactly; does it only impact startup performance, or runtime performance as well?
The example you've shared reduces it *to* 23%, not *by* 23%, so relatively speaking that's actually a much bigger reduction, but in absolute terms it only saves about 28 MiB, going from 36 MiB to 8 MiB.
I think most Lemmy server operators will very much prefer performance over a few MiB on disk.
I see `opt-level` is currently also set to optimize for size; maybe this should be changed to optimize for performance instead? At least for release builds that seems to make more sense.
What's the reasoning behind going for size optimization instead of performance optimization currently?
I can't find the article anymore, but someone found that the Rust optimizations that reduce binary size also increase performance. This writeup with tests seems to confirm it as well, with all the `profile.release` settings we're using being the recommended ones.
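For context, the kind of `profile.release` block being discussed looks roughly like this (a sketch, not necessarily Lemmy's exact config):

```toml
# Sketch of a size-oriented release profile; the actual values in
# Lemmy's Cargo.toml may differ.
[profile.release]
opt-level = "z"  # optimize for size instead of speed
lto = "thin"     # link-time optimization across crates
```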
I'm fairly sure upx is just a binary compressor, which inflates the binary when it's run, so it shouldn't affect performance.
EDIT: I did find these two, which seem to say it's a case-by-case issue:
`codegen-units = 1` can slow down compilation; could you test how long a release build takes with and without that setting? I'm also sceptical about upx, because binary size isn't really a problem, but slower startup or runtime performance would be problematic.
It should only affect release builds, so CI releases, but not development builds or regular CI.
Seems to only be a 30s difference, so it's probably worth it.
| Type | Build time |
| --- | --- |
| Release (no codegen-units) | 3m 15s |
| Release (`codegen-units = 1`) | 3m 45s |
Running the upx-compressed binary didn't seem to take noticeably longer at all, so it's probably a good idea. And it shouldn't affect performance, because it has to inflate the binary to run it anyway, no different from unzipping a file.
Based on the links above, `opt-level` is the main setting we should be concerned with when it comes to performance, but unfortunately they say it's a case-by-case issue which setting is best.
The `codegen-units` bit should positively impact performance, if not binary size, based on what the cargo book says:

> More code generation units allows more of a crate to be processed in parallel possibly reducing compile time, but may produce slower code.

If the inverse of this statement is true, then fewer code generation units should result in faster code.
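For reference, the setting in question is a single line in the release profile (a sketch, assuming it goes in the workspace `Cargo.toml`):

```toml
[profile.release]
# Compile each crate as a single codegen unit: slower to build,
# but gives LLVM a whole-crate view for optimization.
codegen-units = 1
```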
Okay, I just did some more tests messing with the `opt-level`, which by default is `3` on release, but which we currently have set to optimize for size (`z`). Here are the results:
| Type | Build time | Binary Size | Binary Size after upx |
| --- | --- | --- | --- |
| Release (`opt-level = "z"`) | 3m 45s | 38M | 8.7M |
| Release (`opt-level = 3`) | 7m 05s | 51M | 11M |
Based on their recommendations I'll make this change to `opt-level = 3` here also, since that is supposed to be more performant. But the docs do add this caveat:
> It is recommended to experiment with different levels to find the right balance for your project. There may be surprising results, such as level 3 being slower than 2, or the "s" and "z" levels not being necessarily smaller. You may also want to reevaluate your settings over time as newer versions of rustc change optimization behavior.
Still probably worth it though, especially since upx will make this smaller.
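Concretely, the change is just swapping the value in the release profile (a sketch of the diff, not the exact commit):

```toml
[profile.release]
# was: opt-level = "z"  (optimize for size)
opt-level = 3  # optimize for speed; upx keeps the on-disk size down
```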
Does Lemmy not have performance benchmarks? E.g. federation posts/comments/likes, the getPosts / getComments endpoints, ... in ops/sec. That way you would have better insight into what the impact will be.
There are some in `lemmy/scripts/query_testing`, but based on #4983, which is an incredibly annoying blocker for the `0.19.6` release, we need to create unit tests for post_view that make sure any DB changes don't negatively affect performance. I'll try to work on that this week.
Here's the difference between thin and fat LTO:
| Type | Build time | Binary Size | Binary Size after upx |
| --- | --- | --- | --- |
| Release (`lto = "thin"`) | 7m 05s | 51M | 11M |
| Release (`lto = "fat"`) | 10m 57s | 46M | 11M |
From the docs:
true or "fat": Performs “fat” LTO which attempts to perform optimizations across all crates within the dependency graph. "thin": Performs “thin” LTO. This is similar to “fat”, but takes substantially less time to run while still achieving performance gains similar to “fat”.
Since we run release builds pretty rarely, I'd say it's worth it.
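In the profile that would look something like this (sketch):

```toml
[profile.release]
# "fat" LTO optimizes across all crates in the dependency graph;
# slower to link than "thin", but gave a ~5M smaller binary here.
lto = "fat"
```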
Taking two recommendations from https://github.com/johnthagen/min-sized-rust. There are a lot of other recommendations there, but they all affect the behavior, so they probably aren't worth messing with.

- Using `codegen-units = 1` (removes parallel code generation but makes the binary smaller)
- Running upx, which reduced size by 23%.
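Putting the pieces from this thread together, the release profile would end up looking roughly like this (a sketch of the end state, not necessarily the exact committed config):

```toml
[profile.release]
opt-level = 3       # changed from "z" during review to favor speed
lto = "fat"         # cross-crate LTO: longer release builds, smaller binary
codegen-units = 1   # single codegen unit: smaller, potentially faster code
```

upx would then run on the resulting binary as a separate step in the release builds.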