Embedded command documentation needs to be moved outside the C source files

krader1961 commented 6 years ago

See issue #503 for an example where the information in man ksh differs from some_command --help. In that issue it is with respect to the hist command but there are plenty of others. It is bad enough that having two sources for command documentation inevitably leads to such discrepancies. Embedding the command help text in the C source also makes it hard for a human to read and edit.

Contrast this state of affairs with how the fish shell does it. It supports both builtin_command --help and man builtin_command. And due to how the help text is managed it is guaranteed they will always be in agreement. This project should be able to do the same thing.

P.S. If and when this is resolved, it would be a good idea to switch the markup language from the ancient troff style to Markdown.

krader1961 commented 6 years ago

For what it's worth the fish shell has decided to switch from Doxygen to Sphinx. Something I strongly recommend for this project. The ancient Troff markup language is all but unknown to anyone younger than myself (50+ years old). Not to mention that Troff markup is awful by modern sensibilities and there are few tools to translate it to other formats.

krader1961 commented 6 years ago

Another example is the whence builtin. Typing whence --help says it supports a -q flag which works as expected. But the man page makes no mention of that flag.

I'm tempted to mark this as a blocking bug for the next ksh release. It should be impossible for the man page and "internal" documentation (e.g., "a_builtin --help") to disagree. People modifying the code shouldn't have to keep two distinct sources of truth in sync. There should be only one source of truth regarding the documented flags and behavior of builtin commands.
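Until there is a single source, a crude check can at least surface disagreements like the whence -q one. The following is a minimal sketch, not part of ksh: it assumes both texts have already been rendered to plain text (no troff escapes), and the function names and regex heuristic are made up for illustration.

```python
import re

# Heuristic: a flag is one or two dashes followed by a letter, and must not
# be glued to a preceding word character (so "built-in" is not a flag).
FLAG_RE = re.compile(r"(?<![\w-])-{1,2}[A-Za-z][\w-]*")


def documented_flags(text):
    """Return the set of option-looking tokens found in plain text."""
    return set(FLAG_RE.findall(text))


def flag_discrepancies(help_text, man_text):
    """Return (flags only in --help, flags only in the man page), sorted."""
    in_help = documented_flags(help_text)
    in_man = documented_flags(man_text)
    return sorted(in_help - in_man), sorted(in_man - in_help)
```

A check like this only detects drift after the fact; it does not remove the need for one source of truth.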

dannyweldon commented 6 years ago

The fact that the ast getopts builtin embeds documentation means that the getopts version should be the most correct version, as it has to be consistent with the options themselves.

The getopts builtin has an option to dump the raw nroff code:

thecmd --nroff

which means one possibility is to generate sections of the ksh man page from the builtins themselves using the above. It currently dumps a full man page, but it could be modified to dump just the sections needed, similar to:

thecmd --help

Or the output of thecmd --nroff could be parsed to extract just the relevant information. This would enforce the DRY principle, as opposed to currently, where there are two sources of truth.
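The parsing approach could look roughly like this. It is a minimal sketch that assumes the --nroff output uses conventional `.SH` section headers (not verified here against the actual AST output); `extract_section` is a hypothetical helper name.

```python
def extract_section(nroff_text, name):
    """Return the body of the `.SH name` section, or "" if it is absent."""
    body = []
    in_section = False
    for line in nroff_text.splitlines():
        if line.startswith(".SH"):
            # Section titles may be quoted, e.g.  .SH "SEE ALSO"
            title = line[3:].strip().strip('"')
            in_section = (title == name)
            continue
        if in_section:
            body.append(line)
    return "\n".join(body)
```

The extracted text for each builtin could then be spliced into the corresponding part of the ksh man page by the build.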

The getopts builtin also has an api version specification which could be increased to allow a newer format that allows the use of embedded markdown syntax.

krader1961 commented 6 years ago

@siteshwar and I discussed this issue in the ksh93/users Gitter room today. What follows are the comments I made.

Doxygen is a really awful tool for writing user documentation. The fish shell project has had multiple issues related to writing documentation in Doxygen. Among other problems is the lexicon_filter.in script, which no one understands.

Using Doxygen makes things hard that should be trivial and it is really hard to know if you've got the markup right. Too, they've had problems with the HTML and man pages looking different.

Moving the command documentation out of the source code has two goals: 1) a single source of truth, and 2) switching from the ancient and frankly awful nroff markup to something more sensible that is easy to turn into man pages, HTML, and whatever else makes sense.

And a lesser goal is making it easier to edit the command docs by removing all the C syntax surrounding the text.

Embedding the docs so cmd --help works is a great idea. It just shouldn't have been done by embedding it in the source code. It should be embedded at build time by merging plain text files that contain the documentation.
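A build step along these lines could do the merging. This is a minimal sketch under stated assumptions: the `help_<cmd>` variable naming and the idea of one plain text file per command are hypothetical, and only backslashes and double quotes are escaped, so command names must already be valid C identifiers.

```python
def c_string_literal(text):
    """Render text as adjacent C string literals, one per input line."""
    lines = text.splitlines()
    if not lines:
        return '    ""'
    out = []
    for line in lines:
        # Escape backslashes first, then double quotes.
        escaped = line.replace("\\", "\\\\").replace('"', '\\"')
        out.append('    "' + escaped + '\\n"')
    return "\n".join(out)


def render_help_header(docs):
    """Build the text of a C header defining one help string per command.

    `docs` maps a command name to the contents of its plain text doc file.
    """
    parts = ["/* Generated at build time; do not edit. */"]
    for cmd in sorted(docs):
        parts.append(
            "static const char help_" + cmd + "[] =\n"
            + c_string_literal(docs[cmd]) + ";"
        )
    return "\n\n".join(parts) + "\n"
```

The build would read each doc file, pass the contents to `render_help_header`, and write the result to a header that the builtin sources include. The same text files can then feed the man page and HTML generators.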

DavidMorano commented 6 years ago

@krader1961

Embedding the docs so cmd --help works is a great idea. It just shouldn't have been done by embedding it at source edit time. It should be embedded at build time by merging plain text files that contain the documentation.

I do not know if this has been suggested yet or not, but can we get the on-line documentation out of the regular default run-time code altogether? That is: not included at source-edit time and NOT included at build time either, but rather stored somewhere in the distribution directory tree (for example under /usr/man/ or /usr/share/man or /usr/share/ksh/man/ or /usr/lib/ksh/help/, et cetera). Getting as small a run-time memory footprint as possible for KSH was important years ago for very tight embedded memory budgets. I think this is still a good goal even now. For myself, I feel that the documentation should never have been included in the regular run-time object from day one, but somehow it got in there in its current edit-time format (hard to read and hard to maintain). Storing the help text separately can also be thought of as using a plugin of a sort. That is, it is loaded at run-time, but only when called for. I make extensive use of run-time plugins (object files) nowadays, as many other projects in the world also do. Storing the help-man text separately can be thought of as an extension of this plugin idea.

I and a gazillion others have included a feature such as $ cmd --help for tons of years (decades) now, and we always stored the help-man text in separate files, stored separately within the run-time distribution tree. I have even done this for my own KSH built-ins. In fact, where I have done this feature at all for programs, I have done it storing the help documentation separately for all of the 40 years I have been programming this particular feature!

Is it possible that we can do the same for KSH going forward (that is: store the help-man text in a separate file, accessed at run-time when called for)?

Thanks for any consideration.

krader1961 commented 6 years ago

It is possible to store the documentation external to the binary. The fish shell shows how to do it. It supports help a_cmd to open the local HTML version of doc in your browser. It also supports a_cmd --help to display the man page without piping through the pager. And, lastly, it supports man a_cmd via a man function that manipulates MANPATH so the fish man pages have higher precedence. If the install prefix is /usr/local the man pages end up in /usr/local/share/fish/man/man1/ and the HTML pages in /usr/local/share/doc/fish/. Because these are relative to the install root for the fish binary it also works if you install it under your home directory or some other atypical location.
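The prefix-relative lookup can be sketched as follows. The fish paths match the layout described above; applying the same layout to ksh is an assumption, and `doc_dirs` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def doc_dirs(binary, name="ksh"):
    """Derive the doc directories from the path of the installed binary."""
    # <prefix>/bin/<binary>  ->  <prefix>
    prefix = PurePosixPath(binary).parent.parent
    return {
        "man": str(prefix / "share" / name / "man" / "man1"),
        "html": str(prefix / "share" / "doc" / name),
    }
```

Because everything is computed from the location of the binary, nothing needs to be configured when the shell is installed under a home directory.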

This should be done in steps:

1. Separate the embedded command documentation into independent files that get merged when compiling the source.
2. Switch from Troff to a more modern markup language like reStructuredText as used by Sphinx or Markdown.
3. Stop embedding the command documentation in the ksh binary. Note that this can be done without losing functionality like a_cmd --help and makes it easier to implement other functionality like the ability to display the HTML version in a web browser.

dannyweldon commented 6 years ago

Okay, I think it would be good to provide this ability to have the documentation external to the binary, as long as the existing getopts method still works. It is useful to be able to embed docs in shell scripts, an idea which I believe originated with Perl. I agree that the current syntax is difficult, so a new simpler syntax would be nice, hopefully with an api version increase. :)

My main concern though, is will it still be able to work with relative paths because ksh has a feature to allow for easy creation of packages with their own bin, lib, and fun directories using a .paths file in the bin directory, which you just add to $PATH?

DavidMorano commented 6 years ago

@krader1961

My main concern though, is will it still be able to work with relative paths because ksh has a feature to allow for easy creation of packages with their own bin, lib, and fun directories using a .paths file in the bin directory, which you just add to $PATH?

I am sure that something can be worked out. KSH already does some "magic path" searching for other things. For myself, as I have implemented this sort of thing in the past, I search both a set of (somewhat standard) relative paths and a set of relatively standard absolute system paths for the program-specific storage directory in question (in this case, one that would contain the doc files). For example, the first doc files found under ${PATH}/../doc/ksh/ would do. But a whole set of both relative and absolute paths could also be searched. For those still without enough imagination, reference both my and Kurtis Rader's previous posts on this topic for search-path suggestions. Not to intentionally complicate things too much, but I actually add a level of indirection to this whole process as well. I first search a set of relative and absolute paths for a possible configuration file that itself contains a list of possible relative and absolute search paths, and failing that I search a default list of relative and absolute search paths. I won't explain it here, but I also allow for arbitrarily different search paths for each individual program or KSH builtin, though that is likely not needed for most purposes. For those who may say, "but this is a lot of code?" the answer is that it only has to be written once (pretty much for all time). For a reasonably good and experienced programmer (with maybe a little OOD and knowledge of containers and such), this is not a big problem (rather trivial in the scheme of things). But only a minimal solution is needed for starters.
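A minimal version of that search, including the configuration-file indirection, might look like the sketch below. The directory names, the `docpath` config file, and the `<cmd>.txt` naming are all hypothetical; relative entries are resolved against the directory holding the binary.

```python
import os

# Hypothetical defaults: relative entries are resolved against bin_dir,
# absolute entries are used as-is.
DEFAULT_DOC_DIRS = [
    "../share/ksh/help",
    "../doc/ksh",
    "/usr/share/ksh/help",
    "/usr/lib/ksh/help",
]


def candidate_dirs(bin_dir, config_names=("../etc/ksh/docpath",)):
    """Return the doc directories to try, preferring a config file's list."""
    for name in config_names:
        cfg = os.path.normpath(os.path.join(bin_dir, name))
        if os.path.isfile(cfg):
            with open(cfg) as fh:
                entries = [
                    ln.strip() for ln in fh
                    if ln.strip() and not ln.lstrip().startswith("#")
                ]
            # os.path.join keeps absolute entries unchanged.
            return [os.path.normpath(os.path.join(bin_dir, e)) for e in entries]
    return [os.path.normpath(os.path.join(bin_dir, d)) for d in DEFAULT_DOC_DIRS]


def find_help_file(cmd, bin_dir):
    """Return the first existing help file for cmd, or None."""
    for directory in candidate_dirs(bin_dir):
        path = os.path.join(directory, cmd + ".txt")
        if os.path.isfile(path):
            return path
    return None
```

The lookup only runs when help is actually requested, so its cost is irrelevant to normal shell start-up.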

If someone is thinking that searching for doc or help files might be a time consuming and long process, the correct answer is "who the hell cares." It is not something that matters from a performance perspective. Sorry if I was a bit too abrupt just now. :-)

siteshwar commented 6 years ago

I hit this issue again in #945 and #948 and I think it's the right time to start taking steps to switch to a newer documentation system. I have discussed this issue before with @krader1961 and we agreed on switching to Sphinx. If anyone has objection to using it, please state why and what other tools we should consider.

krader1961 commented 6 years ago

@dannyweldon mentioned the ability to embed documentation inside scripts/functions that can be used by builtin getopts. This is similar to DocOpt but far uglier due to the support for Troff markup. I loathe DocOpt-style schemes. The AST implementation is particularly bad. It makes it harder to write clear documentation and harder to understand which options are valid. Trivial typos that are easy for a human to overlook can change the results. It does have the advantage of supporting more than plain text output compared to DocOpt.

AFAICT the ability to use DocOpt-like strings with the builtin getopts command is undocumented. There are only two unit tests that verify it works. I am strongly opposed to documenting this feature. It makes far more sense to ensure the ksh implementation is compatible with bash and zsh. If we're going to embark on a radical change to how commands are documented we shouldn't be bound to supporting an undocumented feature of the getopts builtin.

Note that getopts --help outputs:

Usage: getopts [ options ] opstring name [args...]
OPTIONS
  -a name         Use name instead of the command name in usage messages.

Heck if I can tell how that results from the text in var sh_optgetopts. And doing getopts --man dumps a wall of text to my terminal with no styling (e.g., bold, italics, etc.) of the text and weird artifacts like --??man (which seemingly should just be --man). It doesn't even have the courtesy to pipe the output through my pager. I have to run getopts --man 2>&1 | less (note the stderr redirection).

For comparison, the fish shell has an open issue to implement a DocOpt-like capability which is more than six years old. The lead on that project promised, two years ago, to implement it within six months. That never happened and it didn't look like it ever would. So I implemented argparse in just a couple of weeks. Now every fish function/script can implement option parsing in a manner 100% compatible with the bog standard getopt_long() function that 99% of all UNIX programs written in C/C++ use.

krader1961 commented 6 years ago

When I say "builtin getopts command is undocumented" I mean in the output of "man ksh" which is all that most people will read. Few people are going to think to run getopts --man to learn what it really supports.