Open smklein opened 6 years ago
Re-energizing this topic. Not necessarily for action prior to 3.0 release, but something to keep in our mind and begin to tackle as available.
My comment from #1891
Along those same lines but also a separate issue entirely, at some point I would like to have the discussion about aggressive cargo feature use. Ideally, all non-essential sections of clap could be disabled (but enabled by default) to allow small binary sizes when desired. I've had quite a few people / companies approach me about clap's binary size being their only issue. In fact, I believe that is the number one factor that led Google to work on argh, was binary size is their top concern (for Fuschia) and don't need a lot of the "extra" features that clap provides.
@CreepySkeleton's comment from that same issue:
On the other hand, people from this anti-dependency camp likely see clap as bloat-full and having too much allegedly unneeded deps irrespective of term_size. Clap already pulls 7 crates in --no-default-features mode, 22 in default-features mode, and 27 in all-features mode; one extra crate - very tiny and very useful btw - won't change their opinion. It's kind of all-or-nothing situation, 8 deps (including transitive) kind of count as "as small as possible". I really doubt any of them would become our user if we feature-gate one or two tiny crates.
That's not to mention that clap itself takes as much time to build as (almost) all of the deps combined, and the compilation time is a much more common complaint among users (in my view). Along with code bloat, which would only decrease without this feature.
Regarding modularization of clap.
I've been thinking about it too, for some time, and I've come to conclusion that, while it's doable - to some extent, it would require full-scale refactoring. And by full-scale I mean "rewrite the parser and the validator entirely" which is quite an endeavor.
Finally, @Dylan-DPC's comment from that issue:
Given how popular clap is, the number of dependencies is definitely a huge concern. I've been planing of tackling a lighter clap in some form or the other post the 3.0 release. That discussion is beyond the scope of this issue.
Just so my thoughts are known, I don't believe we need to remove all dependencies and features for the smallest possible size. What I do think is that we should know which dependencies have a low "weight to functionality" ratio.
"Weight" can assessed in two manners, code size added to clap, and compile time.
"Functionality" should be self evident.
I am definitely not in the camp of "make sure there are as few deps as possible." However, I am in the camp of I don't want to take on a dep that doesn't provide intrinsic value, or is a maintenance burden.
The harder crates to evaluate will be ones that don't add a lot to clap functionally, but at the same time don't add much if any compile time or code size either. I would propose for those style crates we look at either pulling the code internal, or dropping the crate.
I'm most concerned with the --no-default-features
case, that case should be as small as possible, with as few high ratio deps as possible. I'm not concerned with the default features, or even --all-features
cases, because at that point the consumer as chosen to go a heavier weight route, while there are lighter weight options.
This would be a great first issue, as it's not a lot of code changing, and doesn't require too much in depth knowledge of clap. At first, it will primarily be measuring the default crates, or seeing how widespread their use inside clap is.
Taking a quick look at our default crates I see:
Crate (+ transitive deps) | Functionality | Notes |
---|---|---|
bitflags |
Underpin of AppSettings |
|
unicode-width |
Heavily used in Help Messages | If we made auto generated help optional, this could become an optional feature |
textwrap |
Used for Help Messages | same as unicode-width |
indexmap (autocfg ) |
used in ArgMatcher /ArgMatches |
Perhaps we could use vec_map instead |
os_str_bytes |
used in parsing | clap fundamentally messes with OsStrs or bytes heavily...so this is probably well worth the dep |
vec_map |
Used throughout Arg and usage parser |
Used to be an optional feature. Has perf gains over BTreeMap...maybe those gains need to be evaluated again to see if they're worth it |
Doing a --no-default-feautres --features std
build on a simple:
App::new("foo").get_matches();
Gives us this output via cargo bloat
in release mode:
cargo bloat --crates --release
[..]
Analyzing target/release/fake
File .text Size Crate
9.3% 56.5% 285.5KiB clap
4.8% 29.3% 148.1KiB std
1.9% 11.7% 59.2KiB [Unknown]
0.1% 0.5% 2.6KiB indexmap
0.1% 0.3% 1.6KiB textwrap
0.0% 0.1% 307B fake
0.0% 0.0% 41B os_str_bytes
16.5% 100.0% 505.5KiB .text section size, the file size is 3.0MiB
Note: numbers above are a result of guesswork. They are not 100% correct and never will be.
Discounting clap and std, it looks like textwrap
and indexmap
need a close eye. If either has better alternatives, or is simple enough to not use/pull internal I'd say those may be on the chopping block.
Also, if we did something drastic like make auto help messages optional, that alone would cut a huge portion of the code along with size. Additionally and oddly enough, that may the easiest thing to modularize first.
"But wait!" I can hear you say. "Auto help generation is like THE thing clap is used for, right?" Sure. It's one of the things, but I'd argue not the most important at all. In many of the contexts where code size matters the most, help generation is one the things they don't care about because it's super easy to just write a help string manually for a small CLI.
Before we more any further, would you please test it in release mode 😄? Nobody cares about debug binaries, release the ones of concern here.
Literally just edited the comment with that info :laughing:
Did you enable lto
? I think we should assume with that.
main.rs
use clap::*;
fn main() {
App::new("app").get_matches();
}
With lto enabled (release build, no default features)
CreepySkeleton@builder:~/workspace/probe$ cargo bloat --crates --release
[...]
File .text Size Crate
19.4% 60.3% 277.7KiB clap
9.0% 27.9% 128.5KiB std
3.0% 9.2% 42.3KiB [Unknown]
0.3% 1.0% 4.7KiB probe
0.2% 0.6% 2.5KiB indexmap
0.0% 0.1% 394B textwrap
32.2% 100.0% 460.3KiB .text section size, the file size is 1.4MiB
What interesting here is that the size of the main.rs
crate increased from 307B
to 4.7KiB
. Basically, some code from std has been inlined and this is the biggest difference.
The overall difference is less than 10Kb (clap and its deps only) and it will thin as more features are actually used. All the further runs will be performed without LTO.
See also https://stackoverflow.com/a/52297790. TL;DR: LTO is not about reducing code size. z
opt level is about it.
Another interesting read: https://internals.rust-lang.org/t/rust-staticlibs-and-optimizing-for-size/5746. No TL;DR here, go read, it's worthy.
Kevin, just to make sure we're on the same page: I wasn't counting you as a no-deps guy, I was just thinking aloud "how would we sell clap to those guys" and I'm pretty sure the answer is - we wouldn't.
If their wariness comes from concern that the deps affect binary size, we can righteously tell them that all the deps put together (in --no-default-features
mode) take less than 5KiB, the next rustc release will likely change much more of that. The same can be said about compilation time: ~2sec difference, essentially a rounding error.
At the very least, if 5KiB do mater to you, you're working on embedded, and clap will likely never suit you. Take a look at getopts/argh or write it yourself.
I agree regarding help messages. It also drags error messages along.
What not clear to me is your logic about "either pulling the code internal, or dropping the crate". It sounds like "Let's copy-paste code from third party crates (we can't just drop them) in order to decrease our maintenance burden". But from that point on, we will be the ones who maintain this copy-pasted code. A contradiction. Could you please elaborate on that?
At the very least, if 5KiB do mater to you, you're working on embedded, and clap will likely never suit you. Take a look at getopts/argh or write it yourself.
I don't think we should be saying that. We should be using cargo feature flags to make sure all the fields can use clap.
To make it work on embedded, you'll need do be alloc free (usable wIthout, String, Vec and such kind of collection).
There's a post 3.0 plan for that already which is why added the no_std
feature
To clarify my statement about embedded:
no_std
, alloc
, and accurate fine tuning (see modularization) will be required.To summarize all of that, you either have an extra hundred of KiBs at your disposal (and this is why I think we shouldn't optimize every single KiB) or clap is not for you.
And a bit of personal opinion: as someone with a (very limited) experience with embedded, I can say that allocations are almost tabu in there. Given that clap relies on Vec
and stuff very heavily... I won't even try it if I need a CLI parser for embed. I'll go grab getopts.
Even getopts requires Vec
and String
. I personally don't see why I would need clap in embedded, that's a very niche usecase.
(embedded = bare metal on microcontroller)
A lot to reply to here :) I don't think we need to be concerned with embedded (bare metal) right now. Maybe one day, but there are many things we can do that will improve clap before we worry about embedded.
I'm more concerned with embedded like, i.e. resource constrained devices, like small arm based (Pi, SBC, phone, etc.). I spoke with some Google employees who were looking at clap for that exact use case (presumably Fuschia), and binary size was their concern. Binary size isn't everyone's concern. To be clear, I'm not trying to supplant argh
, but I do think we should be cognizant of that use case while building clap.
We're in a little bit of a catch-22 scenario. Many people assume argument parsing is such simple task that the deps should be ultra small and no transitive deps. That's understandable why one would think that. However, as all of us know, it turns out argument parsing is somewhat of a hidden beast. There are edge cases EVERYWHERE, and a near endless amount of features to implement.
It turns out argument parsing is complex.
Having said that, the issue we're discussing conflates a few concerns that we should probably be more explicit about:
Both binary size and compile times are affected by:
While adoption and perception can be heavily influenced by binary size, compile times, and the dep graph as it's own construct.
So long as we're doing our due diligence when it comes to crates we take on as dependencies to ensure they're worth the price we pay in terms of binary size and compile times, we're good. Alternatively, if they negatively affect those areas, we should do our best to ensure they're optional.
Meanwhile, we should be thinking about the dep graph as it's own construct because right, wrong, or indifferent people do care about how many crates they pull in. Unfortunately, everyone's reason for caring can be different, for some it's binary size, for some it's compile times, for others it's the perception of risk (what happens if a crates has a security flaw, or goes away over night?).
We're addressing the first two concerns already, but the third (perception of risk) is primarily addressed by either using ultra popular, well supported crates, or crates related to/owned by the parent (which is usually signalled by the name, i.e. clap_*
for us).
"how would we sell clap to those guys" and I'm pretty sure the answer is - we wouldn't.
You're correct in that part of the answer is, "we don't." But I think we mean different things by this. I mean we "We don't try to sell them on clap, we just do the best we can to care about that scenario."
We can do things like making clap modular enough, or structured in a way to turn off features the end user doesn't care about. This has an added benefit of potentially being able to make most of our deps optional.
And where they can't be option, oh well. At least we're trying. clap isn't going to fit every situation and that's fine. But we should at least do what we can to be an option for even those resource constrained situations.
If their wariness comes from concern that the deps affect binary size, we can righteously tell them that all the deps put together (in --no-default-features mode) take less than 5KiB, the next rustc release will likely change much more of that. The same can be said about compilation time: ~2sec difference, essentially a rounding error.
I totally agree with you. However, the problem is many times we don't get the chance to explain ourselves. They see the dep graph front-and-center and many times simply make assumptions from there. That's the part about the "no dep" crowd I'm not a fan of. I can't count the number of reddit comments, or blog posts I've seen that say, "Can you believe [program X] pulled in 170 crates?! That's absurd!!!" with no backing evidence as to why that's absurd. Were those 170 crates not required? How much compile time did they add, etc.? If the compile times were the EXACT same, but [program X] just copy/pasted the code from those crates, you probably wouldn't hear a peep. In fact, you'd probably hear how amazing it was that [program X] has no dependencies!
...I'm not a fan of that. But it's the world we live in.
What not clear to me is your logic about "either pulling the code internal, or dropping the crate". It sounds like "Let's copy-paste code from third party crates (we can't just drop them) in order to decrease our maintenance burden"
I should have been more clear :stuck_out_tongue_winking_eye: Decreasing our maint burden and decreasing our dep graph are separate concerns that get intertwined. Also taking the code internal is only in the case where we're using a dep for a small well defined subset of what the crate offers, and we aren't concerned with maint burden part, just the raw number of deps part. This should only be a last resort for us.
This is one of those things that's counter intuitive; copying and pasting a subset of a crate into clap directly, (or even into a new sub crate like clap_foo
may actually not make any difference in compile times or binary size. But it will still make many people "feel" better about using clap.
Is it dumb. Yep. But again, this should be a last resort for us and not something I'm actually looking at doing seriously. The only crate that comes to mind for this type of thing is term_size
which is technically already a clap owned crate, but again the name doesn't signal that so when people compile clap they just assume it's some rando crate we pulled in as a dep. Am I suggesting re-naming term_size
to clap_foo
? Nope. Not unless we were in drastic territory and have exhausted all other options. We're nowhere near that! :smile:
There are some good arguments regarding inclusion of deps in this reddit thread today, https://www.reddit.com/r/rust/comments/gi7v2v/is_it_wrong_of_me_to_think_that_rust_crates_have/
@kbknapp invited me to participate in this issue, but I have to say, from reading the comments, it seems like there's an undercurrent of counter-productive "us vs them." If I'm coming from (your perspective) the "anti-dependency" camp and you already believe that you can't sell clap to me
Kevin, just to make sure we're on the same page: I wasn't counting you as a no-deps guy, I was just thinking aloud "how would we sell clap to those guys" and I'm pretty sure the answer is - we wouldn't.
then is it really worth my time to participate here if I'm treated as a lost cause in the first place?
reddit and low effort comments are a pit, and people love to make sweeping generalizations that makes the world look more extreme than it is. In reality, I think most people exercise good judgment and try to straddle this line as best as they can. Sometimes I'm on the "anti-dependency" side of the fence and take steps to reduce my dependency tree. But other time, I'm on the "yeah dependencies are great" side and try to resist the urge to remove dependencies.
I think @kbknapp has the right attitude here. The struggle is the important part. Doing due diligence on dependencies before bringing them in, and really weighing their costs and benefits, is what's important. And to me, that's what I see myself doing when I notice that a new dependency is about to come into my tree. What I see is a prior state in which I could opt out, but now I can't, and I don't know why. The code that was deleted looked pretty small, its commit history suggests it was very lightly edited and its performance was more than good enough for me. Moreover, when I first dropped the vec-map
dependency from my use of Clap, I noticed some really crazy compile time improvements. I suspect there was a bug somewhere or some weird case, but still, the flexibility to drop dependencies was really nice.
I'm not familiar with Clap internals, and as someone who has written more than one argument parsing library, I am definitely someone who appreciates their complexity. I never take a good argument parser for granted. They are super hard to build because they need to balance so many competing concerns. IMO, Clap does a great job of this already.
So with that said, I'll give some opinions on Clap 3's required dependencies:
bitflags
- I would personally just hand write any bit flags logic that I needed. In my view, this is a crate that optimizes for writing code, which I generally think isn't the right thing to be doing in widely used fundamental crates. So there's a bit of a philosophical opinion involved here.textwrap
- I would make this optional but enabled by default. I write all of my help strings and documentation wrapped to 79 columns anyway, which is probably good enough.unicode-width
- I would make this optional but enabled by default. If my help text is all ASCII, then an assumption that each byte is one column is reasonable.indexmap
- I don't have enough context to know what to do here, but it looks like it was recently added. I assume this is justified somehow through a major performance win, but otherwise, I would drop this.os_str_bytes
- This is definitely better than using unsafe code. If I knew more about why Clap needed this, then perhaps there is a way to get rid of this too. But yeah, if Clap needs to be mucking around with OsStrs, then the lack of an OsStr
string API probably makes this really hard to the point where you pretty much have to do what os_str_bytes
is doing. (I still wish std would just expose its internal WTF-8 representation, but there are good arguments against doing that.)vec_map
- I don't know the performance story here, but keeping it optional looked easy? But maybe there are some issues I don't know about.Thank you for the detailed response! Before I can finish reading the comment and add any thoughts, let me address something real quick as a moderation note:
from reading the comments, it seems like there's an undercurrent of counter-productive "us vs them." If I'm coming from (your perspective) the "anti-dependency" camp and you already believe that you can't sell clap to me
I want to be very clear with everyone reading these messages (especially including the clap team) that this is not what I want to portray or the tone I want to encourage. Internet communications via text are hard, especially when we start considering native languages and such. (Edit: to be even more clear. I don't think @BurntSushi is portraying this tone. He's reading and noticing this tone from previous comments we made. And I should say thank you for pointing this out! :smile: )
I just want to be clear that the clap team is on the same page of, there is no "us vs them" and all opinions are welcome! We have differences of priorities and how to achieve those priorities, but all opinions are valid within this discussion. Ultimately the clap team will have to make a decision on which priority and implementation strategy to go with, but through discussions and issues like these my goal is to take all opinions in, consider them carefully, and make a decision that is best for clap and all consumers (current and future).
...now back to reading :smile:
Also, one other note: IIRC, cargo bloat
has an option to list binary size by function. That might also be useful to look at for decreasing binary size. Sometimes you can get big wins by tweaking the inlining strategy used, for example. Or using trait objects is sometimes possible, although I've found it difficult to do in practice in many cases because of object safety rules.
I just want to be clear that the clap team is on the same page of, there is no "us vs them" and all opinions are welcome!
I strongly agree with this.
Unfortunately, I only started using Rust this year (maybe Dec last year), so I missed all the deps related discussions and am still trying to catch up on understanding the pros and cons related to rust ecosystem. That is why I was asking you the recent question on deps in termcolor @BurntSushi.
In Javascript before, while I do try to reduce our usage of deps, I still try to prefer commonly used packages and using them as deps instead of writing all of them myself which was the convention all across the ecosystem.
If we take the case of vec_map
, my question is why do we want to make it optional? Shouldn't it be either including it completely or removing it completely? I think this is basic question we want to answer first.
If we take the case of vec_map, my question is why do we want to make it optional? Shouldn't it be either including it completely or removing it completely? I think this is basic question we want to answer first.
My understanding is there are several concerns that can also overlap with each other, and not everyone has the same concerns. As mentioned above, some of those include:
Adding a dep can have negative impacts on points 1 and 2, or both. Some people care about point 1, some people about point 2, and some about both. So if we consider either point a deps "cost" we have to weigh that cost against it's benefits (usually performance, or lower maintenance burden for us directly).
This is what I was trying to say earlier, we have to weigh any cost (including the subjective ones) against the objective benefit.
The reason for making it optional and not a binary include or not include is because people have various requirements. Some may not care about the binary size, or increased compile time if the feature provided is worth the cost, but since some do care we need a way to allow our consumers the same choices we're making at the library level.
For vec_map
which is a self contained crate and no transitive deps, I would want to look at exactly what perf gains we get, compared with binary size/compile times. Add to that, since we pulled in IndexMap
, perhaps we can use that instead and remove one or the other entirely? I haven't looked lately, so that may not be feasible.
To address the comments per dep from @BurntSushi :
bitflags
- I hadn't really considered just pulling in the logic of bitflags to be honest. ...but now that you say it I'm sure it's actually simple subset since we don't particularly care about the macros, etc. just the functionality. Would probably worth a look at least.
textwrap
- the hard part with this one is this dep helps us wrap at word boundaries intelligently. But perhaps making it optional for those that will manually wrap their help strings anyways.unicode-width
- I also like the idea of providing a, "I only use ASCII args/help strings (values excluded), so turn off all extra deps related to handling Unicode logic" flagindexmap
- Might be able to combine with vec_map, or relook at our exact use case. I need to familiarize myself with the code again.os_str_bytes
- This is probably staying for the time being. Clap is pretty much written with idea we use OsStr(ing) internally everywhere, and only assume/use UTF-8 at the consumer demarcation point. I've toyed with the idea of bstr
since all we really care about is bytes, but not done enough research to see what overlap, if any, bstr
and os_str_bytes
have or which would be the best fit.vec_map
- Addressed in other comments. I just want to make sure we're getting a good cost:benefit ratio otherwise we may drop it entirely.I've toyed with the idea of bstr since all we really care about is bytes, but not done enough research to see what overlap, if any, bstr and os_str_bytes have or which would be the best fit.
You probably want to stick with os_str_bytes
. os_str_bytes
re-implements WTF-8 and gives you 100% correctness on both Unix and Windows. The only help bstr
gives you with OS strings are utility functions for converting between &OsStr
and &[u8]
, which is zero cost on Unix but does lossy decoding on Windows, so it sacrifices some correctness. (bstr
does not re-implement WTF-8.)
I don't know how hard it would be to avoid doing the equivalent of os_str_bytes
while maintaining correctness. It could be impossible. I could imagine some possibilities, like lossily decoding each argument for parsing only, which would work since the "significant" characters in the arg parser I believe are all ASCII. But then making sure you hand back the original OsString
to the caller. But this is complex and could potentially be a performance regression. Unless std decides to change its tune and expose an OS string's encoding, os_str_bytes
is probably the right answer.
I think using indexmap
in place of vec_map
would make the performance worse. And we need indexmap
for ordered map IIRC.
There was a PR in the v2 commit history which removed bitflags
but then you reverted it back @kbknapp. Not sure what happened but it was around 2 yrs ago.
I think using
indexmap
in place ofvec_map
would make the performance worse. And we needindexmap
for ordered map IIRC.
That's entirely possible. I haven't done any investigating on the matter yet.
There was a PR in the v2 commit history which removed
bitflags
but then you reverted it back @kbknapp. Not sure what happened but it was around 2 yrs ago.
That was switching bitflags
for a HashSet
which had a significant negative performance impact which was the reasons for the revert.
Also, one other note: IIRC, cargo bloat has an option to list binary size by function. That might also be useful to look at for decreasing binary size. Sometimes you can get big wins by tweaking the inlining strategy used, for example. Or using trait objects is sometimes possible, although I've found it difficult to do in practice in many cases because of object safety rules.
I actually looked at the list long time ago and found that
- textwrap - I would make this optional but enabled by default. I write all of my help strings and documentation wrapped to 79 columns anyway, which is probably good enough.
- unicode-width - I would make this optional but enabled by default. If my help text is all ASCII, then an assumption that each byte is one column is reasonable.
Agreed.
- bitflags - I would personally just hand write any bit flags logic that I needed. In my view, this is a crate that optimizes for writing code, which I generally think isn't the right thing to be doing in widely used fundamental crates. So there's a bit of a philosophical opinion involved here.
Counterpoints:
If you're suggesting to use a set of raw u64
constants, like
const REQUIRED: u64 = 1;
const MULTIPLE_OCC: u64 = 1 << 1;
const EMPTY_VALS: u64 = 1 << 2 | TAKES_VAL;
const GLOBAL: u64 = 1 << 3;
const HIDDEN: u64 = 1 << 4;
const TAKES_VAL: u64 = 1 << 5;
const USE_DELIM: u64 = 1 << 6;
Then there's a very significant downside to it: you can accidentally OR it with something that is not a bitflag, especially if names are similar and the types happen to match.
bitflags
prevents this by creating a separate type for it that is incompatible with everything else. Of course, you could create a separate type yourself, by hand, but then again, you'll be left to implementing OR, And, Not...
by hand - and you better make sure you aren't making more errors when you'll be copy-pasting the impls!
I don't really have very serious objections to importing the macro into clap codbase, but I'd rather avoid it nonetheless ("reinventing the wheel" argument).
As an argument for importing the macro, I'd say that bitflags
looks mostly unmaintained at this point (four months without updates and a couple of unanswered PRs).
- indexmap - I don't have enough context to know what to do here, but it looks like it was recently added. I assume this is justified somehow through a major performance win, but otherwise, I would drop this.
This is justified by the need of a specific data structure, and IndexMap
satisfies it. We need:
O(1)
lookup. Probably even O(log(n))
would be fine, number of arguments (n
) rarely exceeds a hundred or so. Both HashMap
and BTreeMap
could be used here. HashMap
nor BtreeMap
fit.Yes, we could use a HashMap
coupled with a Vec
to trace the insertions, but why reimplement the wheel?
- os_str_bytes - This is definitely better than using unsafe code. If I knew more about why Clap needed this, then perhaps there is a way to get rid of this too. But yeah, if Clap needs to be mucking around with OsStrs, then the lack of an OsStr string API probably makes this really hard to the point where you pretty much have to do what os_str_bytes is doing. (I still wish std would just expose its internal WTF-8 representation, but there are good arguments against doing that.)
Well, either std decides to expose the wtf-8
representation - which is very unlikely - or we do what os_str_bytes
does simply because we need these methods. And I'd rather use one, well-audited implementation (os_str_bytes
appears to be one) instead of spawning our own unicorn impl with lots of unsafe.
This is core functionality. It breaks, we break. It shines, we shine, too.
bstr
is great, but not quite what we need.
- vec_map - I don't know the performance story here, but keeping it optional looked easy? But maybe there are some issues I don't know about.
- vec_map - Addressed in other comments. I just want to make sure we're getting a good cost:benefit ratio otherwise we may drop it entirely.
90% sure we can simply replace it with BTreeMap
or even plain Vec
. I'll be looking into it.
@kbknapp
"But wait!" I can hear you say. "Auto help generation is like THE thing clap is used for, right?" Sure. It's one of the things, but I'd argue not the most important at all. In many of the contexts where code size matters the most, help generation is one the things they don't care about because it's super easy to just write a help string manually for a small CLI.
Would also exposing static help generation via a build script and/or macro be feasible?
"But wait!" I can hear you say. "Auto help generation is like THE thing clap is used for, right?" Sure. It's one of the things, but I'd argue not the most important at all. In many of the contexts where code size matters the most, help generation is one the things they don't care about because it's super easy to just write a help string manually for a small CLI.
I'd disagree with that perception. For writing "strongly-typed scripts" that are meant to be tiny, I'd be willing to sacrifice functionality if the gains are proportionate to what's lost, and there do even exist argument parsers designed to be tiny that do --help
generation in about 1/20th the size of clap. (pico-args has a comparison chart and, of them, gumdrop and argh will do --help
generation.)
...the deal-breaker is that, last I checked, none of the tiny ones that do --help
generation support taking OsString
arguments, and that's non-negotiable. (I actually have some mojibake'd filenames on my drives right now.)
That said, I do consider --help
generation non-negotiable, because I'm big on DRY in my code and, if the parser didn't have --help
generation, I'd have to write a procedural macro or some other form of compile-time metaprogramming wheel-reinvention to ensure the --help
and the parser can't drift out of sync.
Just randomly stumbled across this ticket and I do find one argument rather weird in this discussion which is "performance": How is that even a topic worth making size tradeoffs for?
In the end, a command line parser (typically) runs once per application invocation while bloat is evident even when the application is not used at all.
For this kind of crate size should always have a way higher priority than runtime performance.
@therealprof No, performance matters, a lot. This is a problem docopt
had (and probably still has), where in certain configurations, its performance could be quite bad because of how its matching algorithm worked.
The main reason why performance matters for CLI parsers is when a lot of arguments are passed to the command, or when the command is repeatedly executed. This can happen pretty easily with things like xargs
.
The main reason why performance matters for CLI parsers is when a lot of arguments are passed to the command, or when the command is repeatedly executed. This can happen pretty easily with things like
xargs
.
Sure, I wasn't implying that abysmal performance was acceptable. But you don't need superfancy order-preserving hashmaps to get perfectly fine CLI parsing speeds for 99+% of all applications, a regular Vec
would do just fine. And for the rest we can always have a separate opt-in performance option which uses highly optimized external crates to cater for crazy fast 2 MB CLI arguments parsing needs.
I see, I didn't realize you were talking at that level of detail. I think at that point, you need a benchmark. I do share your skepticism on that specific point, but would definitely want to subject it to testing.
- indexmap - I don't have enough context to know what to do here, but it looks like it was recently added. I assume this is justified somehow through a major performance win, but otherwise, I would drop this.
This is justified by the need of a specific data structure, and
IndexMap
satisfies it. We need:
- A key-value-like representation with
O(1)
lookup. Probably evenO(log(n))
would be fine, number of arguments (n
) rarely exceeds a hundred or so. BothHashMap
andBTreeMap
could be used here.- Iteration order must match the insertion order. Neither
HashMap
norBtreeMap
fit.Yes, we could use a
HashMap
coupled with aVec
to trace the insertions, but why reimplement the wheel?
A potential reason to "reimplement the wheel" is that indexmap
depends on hashbrown
which is duplicative of the HashMap
in std
.
A potential reason to "reimplement the wheel" is that indexmap depends on hashbrown which is duplicative of the HashMap in std.
This is a good point, thank you. I will get rid of it, but it won't shrink the size very much (30 Kb
at best).
Incidentally, I've discovered the true reason clap looks so bloaty. I'll follow up with an elaborative commentary a bit later, but for now, spoiler: how many of clap's features you think you really use?
To me 30kb (despite at best), look like a significant decrease given that clap usually adds 300kb to the binary. Thus I would certainly be looking forward to this. Also I wonder how that would affect compile times - maybe code size does not linearly correspond to compile times, so 'these' 30kb might take longer to produce than 'other' 30kb elsewhere.
In any case I will be looking forward to the more elaborate version, it will be interesting to see where all the perceived 'bloat' is coming from exactly.
@CreepySkeleton soft ping for that comment about the binary size :-).
Hi all!
textwrap
- the hard part with this one is this dep helps us wrap at word boundaries intelligently. But perhaps making it optional for those that will manually wrap their help strings anyways.unicode-width
- I also like the idea of providing a, "I only use ASCII args/help strings (values excluded), so turn off all extra deps related to handling Unicode logic" flag
I'll be happy to make Unicode handling an optional feature in textwrap
if there is a desire for that?
I've just merged a PR to textwrap
which makes it handle text in narrow columns better. That is, it will now attempt to minimize the "raggedness" of the wrapped lines and give you better output out of the box. Pleaes see the PR for an example. The feature comes with dependency on a new crate of mine... If that's a show-stopper, then please let me know!
I just wanted to say that I've released a version of textwrap where all dependencies are optional :tada: This means that both unicode-width and smawk can be disabled at compile time: https://github.com/mgeisler/textwrap/releases/tag/0.13.2
Some updates on this
vec_map
, most of the uses of it were treating it like a regular Vec
, see https://github.com/clap-rs/clap/pull/2774IndexMap
by baking in DeriveDisplayOrder
while still providing ways for getting sorted ordertext-wrap/unicode-width
and the new unicase
dependency are on by default and https://github.com/clap-rs/clap/discussions/2839 explores turning them off by default. It sounds like we shouldn't but thought I'd still share that developmentrequires
rules and default_value
rules were implemented as objects that can be passed into clap? At that point, its a pretty simple dead code elimination optimization done by the linker. We wouldn't need granular feature flags through the guts of clap to get rid of code.To re-cap, strategies for reducing code size
indexmap
Error
still has data public, so we can't completely shrink it yetHi all,
I got curious about the size of textwrap due to the discussion here. So I put together a quick-and-dirty wrapping algorithm and created a trivial program which wraps text given on the command line: https://github.com/mgeisler/textwrap/tree/master/examples/binary-sizes
The simply wrapping function only works on the happy-path: all ASCII, all words are short enough to fit on a line, there is a single ' '
between words, etc. I compared the size of the simple function with textwrap and its various Cargo features:
Configuration | Binary Size | Delta |
---|---|---|
quick-and-dirty implementation | 252 KB | — KB |
textwrap without default features | 268 KB | 16 KB |
textwrap with smawk | 284 KB | 32 KB |
textwrap with unicode-width | 276 KB | 24 KB |
textwrap with unicode-linebreak | 362 KB | 110 KB |
I hope these numbers help bring some data to the table. The features are described here in case you're curious.
I would love to hear from you if you have ideas for shrinking textwrap further :-)
Thanks for doing this!
If these are raw binary sizes then I'm assuming they include debug information which explains why they are so large. Running size
on the binaries will show the size of the different sections, including .text
and the various debug sections.
To me, it doesn't look like we saw enough gains to justify having two implementations, one for ascii and one for unicode.
That said, I think this is further showing the importance of a modularization effort. The cost for wrapping is being paid by everyone but is only needed for automatic, runtime help generation.
If these are raw binary sizes then I'm assuming they include debug information which explains why they are so large. Running
size
on the binaries will show the size of the different sections, including.text
and the various debug sections.
Nope, it's the size of the binary compiled with --release
and after stripping. Furthermore, I use a Cargo profile which I believe is optimized for letting the compiler inline as much as possible:
[profile.release]
lto = true
codegen-units = 1
This is from https://github.com/johnthagen/min-sized-rust. I did not use opt-level = "z"
since I wanted to measure the size of a typical program optimized for speed instead of size (when testing with opt-level = "z"
I actually didn't see any improvements in the delta).
That said, I think this is further showing the importance of a modularization effort. The cost for wrapping is being paid by everyone but is only needed for automatic, runtime help generation.
Yeah, though for me, having a --help
that looks good on any terminal is an important part of a program. Unless you target an embedded platform with strict size limitations, I think a well-functioning command line program should be able to handle --help
nicely.
As an example, cargo --help
looks weird on my current terminal. This is the output (slightly abbreviated):
% cargo --help
Rust's package manager
USAGE:
cargo [+toolchain] [OPTIONS] [SUBCOMMAND]
OPTIONS:
-V, --version Print version info and exit
--list List installed commands
--explain <CODE> Run `rustc --explain CODE`
-v, --verbose Use verbose output (-vv very verbose/build.rs ou
tput)
-q, --quiet Do not print cargo log messages
--config <KEY=VALUE> Override a configuration value (unstable)
-Z <FLAG> Unstable (nightly-only) flags to Cargo, see 'car
go -Z help' for
details
-h, --help Print help information
[...]
My best guess is that it's outputting a string which is hard-wrapped in the source code.
However, I might care more about well-wrapped text than most since I've been playing with textwrap for ~5 years now :smile:
btw we recently dropped the dependency on memchr
in 83f1b165ba9303237cb8ba6878e75aaf9352ce37
Rust Version
rustc 1.31.0-nightly (e7f5d4805 2018-10-18)
Affected Version of clap
2.32.0
Bug or Feature Request Summary
I compared the binary size of a "hello world" application, with and without clap, and came up with the following data:
Using a Cargo.toml:
And a main.rs (of either this, or a version which only calls
println!("Hello world")
:I ran the following:
$ cargo rustc --release -- -C link-args=-static && strip target/release/hello
The "Non-clap" version of hello world resulted in the following output from bloaty:
Where the clap version resulted in the following:
(Diff for visibility)
The amount of .text being generated here doubles the size of the binary.
Clap is really useful, and I'd love if I could use it on more constrained environments, where binary size might be expensive.
Chatting with @kbknapp , it seems that there are some changes to clap-rs underway to allow more flexible usage of the crate, potentially disabling some features at compile-time to shrink the binary, but I figured I'd file this bug to track.
Are there currently other ways of toggling clap features, to compare binary sizes for this evaluation?