blyxxyz / lexopt

Minimalist pedantic command line parser
MIT License
294 stars 9 forks source link

Include the binary's name when calling `from_args()` #5

Closed Minoru closed 2 years ago

Minoru commented 3 years ago

from_env() takes the executable's name from the argv, but from_args() expects an argv with the name removed. This discrepancy causes a bit of trouble:

I think it'd be simpler for everyone involved if from_args() took the same input as from_env() does, i.e. an argv with executable's name.

blyxxyz commented 3 years ago

It works this way for a few reasons:

These are not very strong reasons. You might be right.

But if we change it now, all code that uses from_args() breaks but still compiles. That's a worst case scenario. If we break backward compatibility people should at least get a very clear warning.

Looking back, I wish I hadn't included Parser::bin_name in the original release. It's hard to make good use of it, you e.g. can't give it to the caller when returning an error message, even though it's intended for showing in error messages. It needs a redesign to be useful.

I think that the best way forward is to deprecate bin_name, and in the next breaking release either remove it or overhaul it.

If it's overhauled, from_args(args) can perhaps be changed to something like from_args(bin_name: Option<impl Into<OsString>>, args). Then code that's broken would get a fair warning, and given an iterator you could call either from_args(iter.next(), iter) or from_args(None, iter), neither of which are very awkward.

Minoru commented 3 years ago

Okay, this makes sense. I'm one of those few who do pass the name in their tests, and check that it's extracted as it should :) Although it wouldn't be hard for me to extract it manually, of course.

The suggested overhaul looks like a fine solution to me too.

Thanks for the thorough explanation!

blyxxyz commented 2 years ago

I'd like to do this today for the next release:

If the name can't be found then "app" is used. If it's invalid unicode it's converted lossily.

This'll mean a little churn for newsboat/newsboat#1845 (lines 234-239 can become args.program_name = parser.bin_name().to_owned()). (cc @theIDinside)

How does it sound?

theIDinside commented 2 years ago

Sounds fine to me! Changes like that I can adjust to quickly so no problem for me at least.

Minoru commented 2 years ago

Change bin_name's return type to &str If the name can't be found then "app" is used. If it's invalid unicode it's converted lossily.

I understand that these choices are driven by bin_name() being used in error messages, but it still feels wrong. lexopt does very little for the user, so I don't see why it should do so much with the first argument. Hard-coding a default name seems wrong too; in Newsboat we use an empty string, but I could imagine people using the app's name instead.

I'd make it Option<OsString>, and let the user decide how to handle the corner cases.

blyxxyz commented 2 years ago

I do feel conflicted about it. I mainly want to keep it like this because processing an Option<&OsStr> is a pain. You end up with code like parser.bin_name().map(|s| s.to_string_lossy().into_owned()).unwrap_or_else(|| "myapp".into()), which is a lot harder to follow than typical argument processing (value.parse()?, value.into_string()?, value.into()).

I also feel that preserving argv[0]'s encoding is way more niche than doing it for other arguments. I can't immediately think of any programs (in any language) that use it in a way that's incompatible with this. (It's not even a good way to find the current executable.)

But maybe keeping it an Option would be better. I think I won't deprecate from_args() after all, so having it missing wouldn't be as weird, and parser.bin_name().unwrap_or("myapp") is manageable.

Minoru commented 2 years ago

It's not even a good way to find the current executable.

Oh! I didn't know. This would have been my only argument in favour of using OsStr/OsString rather than str/String. What's a better way?

parser.bin_name().unwrap_or("myapp") is manageable.

This looks good!

blyxxyz commented 2 years ago

What's a better way?

In Rust, std::env::current_exe. How it works depends on the platform, on Linux it inspects /proc/self/exe.

To get the path from argv[0] you may have to resolve it with $PATH or the current directory, and you have to trust the parent process: Unix lets you pass other things into exec, using the command name is basically a convention.

Minoru commented 2 years ago

Thanks! I knew that exec*() lets me "fool" the child process, but I didn't know I can get the name from /proc. May I suggest that the doc for bin_name() should point to std::env::current_exe? I might not be the only one who didn't know of its existence!

theIDinside commented 2 years ago

@blyxxyz Btw, will you notify us when 0.2.0 is live so that I can make the changes as speedily as possible? would greatly appreciate it!

blyxxyz commented 2 years ago

I've pushed the changes I want to make to master. I still need to give it another lookover, update the table in the README, check the CI, etcetera, but I hope to make the release later this evening. (I'll leave another comment when I do.)

blyxxyz commented 2 years ago

Version 0.2.0 is live.

Thank you for using the crate and discussing all these things! It's been very educational. I think the library is in a better state now.