emikulic / darkhttpd

When you need a web server in a hurry.
https://unix4lyfe.org/darkhttpd/
ISC License
1.05k stars 87 forks source link

Better argument parsing #40

Closed Mango0x45 closed 8 months ago

Mango0x45 commented 8 months ago

Minor issue, but at the moment if I wanted to say… specify a port, I need to do as follows:

$ darkhttpd /src --port 6969

This is pretty unnatural for 2 reasons:

  1. Options typically come before the positional arguments.
  2. Long options typically allow an ‘=’, however --port=6969 is invalid.

This is pretty easy to fix by argument parsing via getopt_long(), although that function is not defined by POSIX (yet exists on Glibc, MUSL, Darwin, OpenBSD, FreeBSD, NetBSD, and DragonFlyBSD to my knowledge).

If you’re open to it @emikulic, I wouldn’t mind attempting to implement more ‘standard’ argument parsing that follows the behavior of getopt_long() parsing (but cross-platform).

emikulic commented 8 months ago

Sorry to be a stick-in-the-mud but I'd prefer not to carry around an implementation of getopt_long.

I personally don't mind forcing the arg order to be serve /this/path --some-optional-stuff but if you have strong feelings about it, what do you think about changing the existing argument parsing code to, basically, consume --options and consider anything without a hyphen to be the wwwroot (and check that exactly one wwwroot was given)

My preferences are:

  1. Status quo.
  2. Adjust the existing code to allow the path to be anywhere. (assuming it's a small change)
  3. Switch to using getopt_long from the system's libc and hope that it's there. (probably won't make the existing code much shorter?)
  4. Import a reasonably licensed implementation of getopt_long and carry it forever.
  5. Custom reimplementation of getopt_long and having to test it.

What do you think?

Mango0x45 commented 8 months ago

I think the 4th option is best. It’s the most portable, and pretty easy to do. As one option we have optparse which is public domain, and well tested. I wouldn’t mind trying to implemented on a separate branch as a proof-of-concept.

hhartzer commented 8 months ago

I currently have no strong opinions either way, but wanted to share this confusing behavior I just ran into.

It may not be worth addressing as someone might have a path starting with --, but it is certainly atypical.

Normally this might be addressed with passing -- and then the path, like this: darkhttpd --ipv6 -- --ipv6.

I do appreciate how simple darkhttpd is, though, and it'd be nice to keep it that way however possible.

$ ./darkhttpd --ipv6
darkhttpd/1.15.from.git, copyright (c) 2003-2024 Emil Mikulic.
listening on: http://0.0.0.0:8080/
^C
$ ./darkhttpd . --ipv6
darkhttpd/1.15.from.git, copyright (c) 2003-2024 Emil Mikulic.
listening on: http://[::]:8080/
^C
emikulic commented 8 months ago

optparse is 400 LoC. darkhttpd is already 3000 LoC.

I would really prefer something smaller. You don't like option 2 above?

Mango0x45 commented 8 months ago

optparse is 400 LoC. darkhttpd is already 3000 LoC.

I would really prefer something smaller. You don't like option 2 above?

Option 2 is fine and definitely an improvement, but I don’t find it optimal. It also doesn’t solve the issue of --flag=val which is behaviour that basically everyone (myself included) comes to expect from CLI tools.

optparse is 400 LoC, but I don’t think that’s a great way to measure it for the following reasons:

  1. The library is already well tested and used in various projects; the likelihood of you needing to do any ‘maintenance’ on it is very very low.
  2. It’s incredibly easy to integrate — you just #include the header and that’s it.
  3. If you remove functionality we don’t need (we’d only need optparse_long() and remove the large doc-comments, the implementation falls from ~400 LoC to ~300.
g-rden commented 8 months ago

--flag=val can be done too with option 2.

Here is an example, which only parses the port like that, but it can be used for the other options too. That would add under 30 lines of code.
https://github.com/g-rden/darkhttpd/commit/7927a6433f03f06774d09449c09ea72fff447889

This implementation is not elegant, but the parsing is definetly doable.

hhartzer commented 8 months ago

While we are on the topic I thought I'd share a few more thoughts.

Especially if refactoring argument parsing, it may be worth adding tests for parse_commandline(), probably in the style of devel/test_password_equal.c.

The current code exits status 1 for invalid arguments. 2 is generally the convention here and might be more appropriate. I also wonder if a switch statement might be more appropiate than the if/else block.

g-rden commented 8 months ago

I think something like this works. https://github.com/g-rden/darkhttpd/commit/1378bd819c570a1952d69049dc28ff7a1bb796f8

I have not tested it a lot and this is the first idea that came to mind, so it might be bad.

edit: i found a bug...
edit edit: i was just confused. i think it works for every variation of options and wwwroot file.
repo: https://github.com/g-rden/darkhttpd/tree/parsing-ideas

emikulic commented 8 months ago

I'm not sure the extra complexity is warranted to support the equals-sign variant.

Or, on the other end of the complexity/effort scale, how about something table-driven like https://github.com/emikulic/darkstat/blob/a219027ec8ea6bb75647bef126dd44f56ac9c86e/darkstat.c#L190 ?

g-rden commented 8 months ago

No matter the impementation, it would break ./darkhttpd --some_dir. It's unlikely that anyone uses such a directory name, but it is possible. Users could still use ./darkhttpd -- --some_dir if they wanted, but it's a change users would have to figure out. Also the benefits, besides conformity, are minimal. Only one option could benefit from it. --forward-all does not need wwwroot, but the user must provide it. The user might ask why wwwroot must be set just to redirect. Unless --chroot is set wwwroot is not used for --forward-all anyway.

All in all I am fine with leaving it like it is.

Mango0x45 commented 8 months ago

I'm not sure the extra complexity is warranted to support the equals-sign variant.

Or, on the other end of the complexity/effort scale, how about something table-driven like https://github.com/emikulic/darkstat/blob/a219027ec8ea6bb75647bef126dd44f56ac9c86e/darkstat.c#L190 ?

It looks like such a table-approach doesn’t allow for chaining short options (e.g. ‘-rf’)? That would arguably be weirder to a user like me than any existing quirk with the arguments.

EDIT: Just realized that darkhttpd doesn’t even use short-arguments at all.

hhartzer commented 8 months ago

I personally feel like short arguments are overrated, especially for utilities that are not used constantly.

For a command like cp or chmod, it makes sense to have. But for a webserver, I think shorthand arguments are unnecessary. If I find myself using them a lot, I usually just make an alias.

Mango0x45 commented 8 months ago

I personally feel like short arguments are overrated, especially for utilities that are not used constantly.

What do you call ‘constantly’? I’ve been using DarkHTTPD almost every day for a while now.

hhartzer commented 8 months ago

I mean utilities that are used repeatedly for different actions. ls, cp, etc. I like having short flags for those. If you launch darkhttpd daily, then short flags may be helpful for you. I have a few aliases to launch common temporary webservers, along with a darkhttpd-pwd alias.

To be clear, I'm not against adding them to darkhttpd as long as the long options are still present. I'm not certain it's worth the complexity, unless it ends up being a trivial addition, if say getopt_long were used.

One pattern that I don't like is seeing shorthand flags being used in scripts. I think if it's written down in a script, it should be longhand so that it's more obvious what's going on. I just think that a lot of tools, of some degree of esotericism, have only short flags which makes them less intuitive when you do go to use them, and the terse -h output is not as helpful as it could be if long options were used instead.

Mango0x45 commented 8 months ago

One pattern that I don't like is seeing shorthand flags being used in scripts. I think if it's written down in a script, it should be longhand so that it's more obvious what's going on. I just think that a lot of tools, of some degree of esotericism, have only short flags which makes them less intuitive when you do go to use them, and the terse -h output is not as helpful as it could be if long options were used instead.

I won’t say too much because this is pretty off-topic for this issue, but as someone who writes hundreds of lines of shell script a week I couldn’t disagree more. When your scripts are no more than a few lines long, long-options are fine. That being said the moment you start writing hundreds of lines of shell script in a single file you start finding that long options cause a lot of visual clutter for little return. They only serve to better document the script yet you could also just read the manual page if you wanted to know what a short-option did.

emikulic commented 8 months ago

I re-read the README recently and I really like the ./darkhttpd /path --optional-stuff order.

I think the status quo is fine here. I'm sorry if people disagree.