Effect on the binary size #304

Open zeenix opened 1 year ago

In GitLab by @MaxVerevkin on Jan 6, 2023, 20:47

From one of my projects:

$ cargo bloat --crates --release
 File  .text     Size Crate
 7.9%  18.8%   1.8MiB std
 6.4%  15.2%   1.4MiB zbus
 4.0%   9.4% 898.1KiB zvariant
 3.3%   7.9% 754.3KiB i3status_rs
 3.0%   7.0% 670.3KiB serde
 2.1%   5.0% 480.7KiB reqwest
 1.8%   4.3% 408.7KiB tokio

In this example, zbus + zvariant are responsible for almost 25% of .text section size.

I thought this issue can be used as a place to collect info and ideas on how to improve this.

Thanks. I'm guessing it's because of all the generics we have. They make the API much easier to use, but if that's the cause, perhaps we can feature-gate them (at least the low-level MessageBuilder API).

Also, did you try any of these and if they help in regards to zbus' % of the pie?

I briefly tried with dbuz and most of them don't have any impact on zbus' share, though the binary is much slimmer overall. To be honest, with those optimizations enabled, if zbus takes 7.4% of the total (16.7% of .text) at 462.8KiB, I think that's a pretty decent price for the ease of use brought in by the generics.

Having said that, I won't be against reducing the size or providing easy ways for users to reduce the size (e.g. the feature gating idea I proposed above).

In GitLab by @MaxVerevkin on Jan 7, 2023, 23:50

Interesting.

I tried

[profile.release]
lto = "thin"
opt-level = "z"
codegen-units = 1
debug = 1

and zvariant went from 898.1K to 32.9K! Zbus went from 1.4M to 681.5K, which is also not bad.

I wonder if the performance drop is noticeable. I will do some testing.

In GitLab by @ids1024 on Feb 22, 2023, 19:37

I wonder if there's any good way to specifically measure generic bloat. I think it should be possible to determine what mangled symbols refer to the same generic function, and see if any particular functions are duplicated a lot.

If it's possible to identify anything like that, it may be possible to use the trick of having a generic function that's just a thin wrapper around a non generic one. Of course that's only possible for certain uses of generics. Not sure how much code in zbus might be like that.
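The thin-wrapper trick mentioned above can be sketched roughly like this. It is only an illustration: `extension_len` is a made-up function and not part of zbus. The generic shim does nothing but convert its argument, so each monomorphized copy stays tiny, while the real work lives in a non-generic `inner` function that is compiled once.

```rust
use std::path::Path;

/// Generic entry point: only this small shim is duplicated per `P`.
/// (Hypothetical example function, not zbus API.)
pub fn extension_len<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic body: compiled once, no matter how many `P` types
    // callers use.
    fn inner(path: &Path) -> usize {
        path.extension().map_or(0, |ext| ext.len())
    }

    inner(path.as_ref())
}
```

Callers can still pass `&str`, `String` or `PathBuf`; whether this saves anything in practice depends on how much code sits in the body compared to the conversion.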

In GitLab by @federico on Feb 22, 2023, 20:51

I've been doing a little investigation in librsvg on the code size for its generics, with cargo bloat --release -n 0 --filter librsvg:

-n 0: show all the functions, not just the top N ones
--filter librsvg: show just my crate; I don't care about libstd/regex/etc. since they are not under my control.

And yeah, that shows "duplicated" functions for each monomorphization. I'm experimenting with moving some things to dynamic dispatch; librsvg can use a lot of that. For example:

0.0%  0.0%    825B librsvg librsvg::element::ElementInner<T>::set_style_attribute
0.0%  0.0%    825B librsvg librsvg::element::ElementInner<T>::set_style_attribute
0.0%  0.0%    825B librsvg librsvg::element::ElementInner<T>::set_style_attribute
0.0%  0.0%    825B librsvg librsvg::element::ElementInner<T>::set_style_attribute
0.0%  0.0%    825B librsvg librsvg::element::ElementInner<T>::set_style_attribute
0.0%  0.0%    825B librsvg librsvg::element::ElementInner<T>::set_style_attribute
[28 of these...]
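To spot which functions are duplicated the most, the rows of such a listing could be grouped by function name. The following is a rough sketch, assuming the column layout shown above (file %, .text %, size, crate, name); `parse_size` and `tally` are hypothetical helper names, and as far as I know cargo bloat does not offer this grouping itself.

```rust
use std::collections::BTreeMap;

/// Parse a size column such as `825B`, `898.1KiB` or `1.8MiB` into bytes.
fn parse_size(s: &str) -> Option<u64> {
    // Check the longer suffixes first, since `KiB` and `MiB` also end in `B`.
    let (num, mult) = if let Some(n) = s.strip_suffix("KiB") {
        (n, 1024.0)
    } else if let Some(n) = s.strip_suffix("MiB") {
        (n, 1024.0 * 1024.0)
    } else if let Some(n) = s.strip_suffix('B') {
        (n, 1.0)
    } else {
        return None;
    };
    num.parse::<f64>().ok().map(|v| (v * mult).round() as u64)
}

/// Group function rows by name; returns name -> (copies, total bytes).
/// Lines that do not look like data rows (e.g. the header) are skipped.
pub fn tally(report: &str) -> BTreeMap<String, (usize, u64)> {
    let mut out = BTreeMap::new();
    for line in report.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Assumed columns: File%, .text%, Size, Crate, Name...
        if cols.len() < 5 {
            continue;
        }
        let size = match parse_size(cols[2]) {
            Some(size) => size,
            None => continue,
        };
        // The name itself may contain spaces, so join the remainder.
        let name = cols[4..].join(" ");
        let entry = out.entry(name).or_insert((0usize, 0u64));
        entry.0 += 1;
        entry.1 += size;
    }
    out
}
```

Sorting the result by total bytes would then show which generic functions are worth splitting first. Keep in mind that the sizes are rounded in the report, so the totals are approximate.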

For other types of generics, there's the momo crate to help with the pattern of writing an inner function that is called from the public functions that remove the generics. I haven't tried it yet, as I'm not sure yet if librsvg could use it.

hmm.. at first look, that does look like something that can help but it doesn't cover fallible conversions we have though. Still worth investigating.
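For the fallible case, the same inner-function split can be written by hand. This is a minimal sketch with made-up types (`MemberName`, `Error` and `describe_call` are stand-ins for illustration, not the actual zbus definitions): the generic shim only performs the `TryInto` conversion and the error mapping, and everything else is in a non-generic `inner`.

```rust
use std::convert::{Infallible, TryFrom, TryInto};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct Error(String);

// Lets an already-validated `MemberName` be passed directly, since
// `T: TryInto<T>` has `Infallible` as its error type.
impl From<Infallible> for Error {
    fn from(never: Infallible) -> Self {
        match never {}
    }
}

/// Hypothetical validated name type.
#[derive(Debug, PartialEq)]
pub struct MemberName(String);

impl TryFrom<&str> for MemberName {
    type Error = Error;

    fn try_from(s: &str) -> Result<Self, Error> {
        if !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
            Ok(MemberName(s.to_owned()))
        } else {
            Err(Error(format!("invalid member name: {:?}", s)))
        }
    }
}

/// Generic shim: only the conversion is monomorphized per `M`.
pub fn describe_call<M>(member: M) -> Result<String, Error>
where
    M: TryInto<MemberName>,
    M::Error: Into<Error>,
{
    // Non-generic body, compiled once.
    fn inner(member: MemberName) -> String {
        format!("call {}", member.0)
    }

    let member: MemberName = match member.try_into() {
        Ok(m) => m,
        Err(e) => return Err(e.into()),
    };
    Ok(inner(member))
}
```

The public signature stays as flexible as before, so this kind of split should not require an API break.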

I spent some hours today splitting methods that use generics, but with each change cargo bloat claimed even more bloat (perhaps partly because cargo bloat isn't precise, and partly because the real change will only come once the macros use the non-generic API?). With the latest change (2658ee6cd888c2e8c337a4b39edaf5cc8406f4e7), I ended up with a very difficult lifetime issue (could be a Rust bug actually).

I wonder if this is really worth pursuing further, given that @MaxVerevkin found that zvariant's size can be reduced by about 96% (898.1K to 32.9K) with profile settings alone.

Also how much can we really trust the output of cargo-bloat? :thinking:

In GitLab by @sdroege on Feb 26, 2023, 13:14

Is that with a proper application that actually uses zbus in a non-trivial way?

@sdroege are you asking me or @MaxVerevkin? As I wrote, I checked against dbuz.

In GitLab by @MaxVerevkin on Feb 27, 2023, 11:44

My comment above was based on i3status-rust, which I suppose uses zbus in a non-trivial way.

@sdroege oh and on Sat I was testing against our simple server example, which could be as trivial as it gets.

I ended up with very difficult lifetime issue (could be a Rust bug actually).

Looks like this is a rustc bug. :disappointed:

In GitLab by @ids1024 on Feb 27, 2023, 16:33

Also how much can we really trust the output of cargo-bloat?

I think the main limitation would be inlining. Just looking at the symbol table of the binary it has no way to know which functions are inlined into which other functions or how much space they're taking up. Inlining can contribute to bloat, though it may also save space in other cases.

In GitLab by @mwcampbell on Mar 12, 2023, 21:59

I've started working on reducing zbus's compiled footprint. I'm doing my measurements with a minimal AccessKit example that I wrote in this branch:

https://github.com/AccessKit/accesskit/tree/zbus-binary-size-measurement

The headless example under platforms/unix in that branch runs the AccessKit AT-SPI implementation (using zbus), but no actual GUI. So that binary contains nothing but the usual overhead for a Rust binary, AccessKit, zbus, and zbus's dependencies.

My working branch of zbus is available in my GitHub fork:

https://github.com/mwcampbell/zbus/tree/size-opt

So far I've reduced the size of that AccessKit example by about 90 KB when compiling for x86-64 Linux, optimizing for size, and using panic = "abort". When using the default optimization level and panic behavior (unwind), my changes so far reduce the binary size by 190 KB.

@mwcampbell Nice! Thanks for working on this.

So far I've reduced the size of that AccessKit example by about 90 KB when compiling for x86-64 Linux,

That's pretty good. Just keep in mind that ideally we want to get the binary size reduced w/o breaking any API.

and using panic = "abort". When using the default optimization level and panic behavior (unwind), my changes so far reduce the binary size by 190 KB.

What do you think of the flags @MaxVerevkin tried? Are any of those not acceptable for AccessKit/Bevy projects?

In GitLab by @mwcampbell on Mar 13, 2023, 01:41

With the latest commit on my working branch, I believe I'm no longer breaking the API.

The optimization flags you referenced are very close to the ones I'm using when optimizing for size. The reduction in the size of zvariant (as measured by cargo bloat) is impressive. Still, it seems to me that the total size of zbus and its dependencies is about 1 MB. So I want to see if we can shrink it even further.

With the latest commit on my working branch, I believe I'm no longer breaking the API.

Nice.

Still, it seems to me that the total size of zbus and its dependencies is about 1 MB. So I want to see if we can shrink it even further.

Oh? What was the size before those flags?

In GitLab by @mwcampbell on Mar 13, 2023, 15:26

When compiling the AccessKit Unix headless example (on the zbus-binary-size-measurement branch) with the optimization flags set in that branch (codegen-units = 1, lto = true, opt-level = "z", and panic = "abort"), the stripped binary size is 1403304 bytes. When compiling with the same flags except with the opt-level flag removed (i.e. default optimization for speed), the stripped binary size is 2439592 bytes.

Since Bevy is a game engine, I'm guessing there are parts of it where optimizing for speed makes a difference, so compiling the whole binary with opt-level = "z" would be unacceptable. IIUC one can specify different optimization levels for different crates (though I don't know how that works once you add LTO), but only in the top-level workspace. The Bevy project leaders would probably not be happy with requiring every Bevy-based application to use a complex Cargo profile configuration (with different optimization levels for different crates) to get a good binary size.
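For reference, the per-crate setup described above would look roughly like the following in the top-level Cargo.toml. This is a sketch only; which crates to list is an assumption, and I have not measured this combination.

```toml
[profile.release]
codegen-units = 1
lto = true
panic = "abort"

# Assumption: only these two crates are optimized for size; everything
# else keeps the default release opt-level.
[profile.release.package.zbus]
opt-level = "z"

[profile.release.package.zvariant]
opt-level = "z"
```

One caveat from the Cargo documentation on profile overrides: generic code is optimized with the settings of the crate where it is instantiated, so heavily generic code from zvariant that gets instantiated in other crates may not be affected by these overrides.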

So I've been focusing in the wrong area so far. I need to focus on restructuring code in zvariant so that the heavy inlining that we get when optimizing for speed doesn't have such a large effect.

When compiling the AccessKit Unix headless example (on the zbus-binary-size-measurement branch) with the optimization flags set in that branch (codegen-units = 1, lto = true, opt-level = "z", and panic = "abort"), the stripped binary size is 1403304 bytes. When compiling with the same flags except with the opt-level flag removed (i.e. default optimization for speed), the stripped binary size is 2439592 bytes.

Interesting. Thanks for sharing the details.

Since Bevy is a game engine, I'm guessing there are parts of it where optimizing for speed makes a difference, so compiling the whole binary with opt-level = "z" would be unacceptable.

Hmm.. isn't Bevy a library? So the optimizations will have to be decided by the user, no? Bevy docs can recommend two options, one for cpu and another for memory optimization?

So I've been focusing in the wrong area so far. I need to focus on restructuring code in zvariant so that the heavy inlining that we get when optimizing for speed doesn't have such a large effect.

Right. What does the output of cargo bloat look like for your example btw? I also noticed that zvariant is one of the big culprits here. Generics are the power of a serde-based API, so I hope you can find a way to optimize w/o sacrificing generics there.

BTW, you keep ending up starting new threads. I know gitlab UI is confusing so I don't blame you. Just FYI for future reference. :)

In GitLab by @mwcampbell on Mar 13, 2023, 18:39

I couldn't come up with any quick wins this morning. I wonder if, as @federico is doing in librsvg, we need to introduce some dynamic dispatch to prevent monomorphization from exploding the code size. Maybe something similar to erased-serde.

I couldn't come up with any quick wins this morning.

:(

I wonder if, as @federico is doing in librsvg, we need to introduce some dynamic dispatch to prevent monomorphization from exploding the code size.

Well, that was the goal of my WIP branch. However, I think I went a bit in the wrong direction by splitting all the generics I could find, instead of only the ones with enough code in them to make a significant difference. Also, as I wrote, I ran into a Rust issue. :(

Maybe something similar to erased-serde.

I'm not sure I follow. That crate is doing the other way around: Wrapping generic API into non-generic one.

In GitLab by @mwcampbell on Mar 14, 2023, 16:24

Yeah, I think you're right about erased-serde.

Maybe it would be better not to use serde at all. I listened to a couple of your talks on zbus, and I know that you started out not using serde, then decided to use it. But I personally wouldn't mind if zbus went back to using a custom serialization framework, designed to use dynamic dispatch.

Maybe it would be better not to use serde at all. I listened to a couple of your talks on zbus, and I know that you started out not using serde, then decided to use it. But I personally wouldn't mind if zbus went back to using a custom serialization framework, designed to use dynamic dispatch.

That would be a huge API break. It won't even be zbus anymore. I don't think this is a huge enough problem to consider that (even a Bevy dev said that it's not that big a deal) but even if it was, we haven't really tried hard enough yet.

You didn't address this btw:

Since Bevy is a game engine, I'm guessing there are parts of it where optimizing for speed makes a difference, so compiling the whole binary with opt-level = "z" would be unacceptable.

Hmm.. isn't Bevy a library? So the optimizations will have to be decided by the user, no? Bevy docs can recommend two options, one for cpu and another for memory optimization?

Also, the fact that the Bevy folks don't have specific enough criteria says a lot, I think.

Anyway, let's try and do what Alice says and see if we can reduce the binary size somehow. If we can, great. If we can't, we'll consider it the price of a nice D-Bus API.

Today I tried to use git version of zbus in i3status-rs to get an idea of how many breaking changes there are (not a lot :) ) and the binary increased in size by ~2M (both stripped and not).

the binary increased in size by ~2M (both stripped and not)

I know. :( I am hoping #493 will help here.

So I sat down with @sdroege at the GNOME Rust hackfest yesterday and looked at what could possibly be improved, while looking at the output of cargo bloat in busd (as an example). Here are some thoughts from that:

828
829
331: An already known issue that's been worked on in #712. This will not help most people in the end but at least zbus-based libraries can then also provide this feature and make their binary size look smaller (it's mostly about the show anyways tbh).
Some invisible closures that cargo bloat talks about in various places. :shrug:

Remove the closure use.

It's not the use of closures, but you somewhere have closures that end up as a lot of code. You'll have to find which closures these are and then check why they're so big. If you'd just get rid of the closures and put the contained code elsewhere it wouldn't change anything at all.

Another thing we talked about is to make use of trait objects in places where it's possible and makes sense to get rid of multiple versions of the same code with different generic parameters.
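As a small illustration of the trait-object idea (not taken from zbus; `render_generic` and `render_dyn` are made-up names), compare a generic function with its dynamically dispatched counterpart. The generic one is compiled once per element type it is used with, while the `dyn` one is compiled once and calls `Display` through a vtable.

```rust
use std::fmt::Display;

/// Generic version: one copy of the body per distinct `T`.
pub fn render_generic<T: Display>(items: &[T]) -> String {
    items
        .iter()
        .map(|i| i.to_string())
        .collect::<Vec<_>>()
        .join(",")
}

/// Trait-object version: a single compiled body; each `to_string` call
/// is dispatched through the vtable of the concrete type.
pub fn render_dyn(items: &[&dyn Display]) -> String {
    items
        .iter()
        .map(|i| i.to_string())
        .collect::<Vec<_>>()
        .join(",")
}
```

The trade-off is an indirect call per item and fewer inlining opportunities, so this makes most sense where the body is large and the call is not performance critical.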
