support markdown in help strings

stevengj commented 3 months ago

Draft PR to support Markdown text in help strings.

carlobaldassi commented 2 months ago

Thanks, looks good. I think it would make sense to also allow markdown in the description and the epilog. Also needs tests+docs.

stevengj commented 2 months ago

Yes, I agree this is not ready to merge — I just whipped it together in response to a question on discourse. I'm not sure I have the time to polish it myself, but I wanted to post it as a starting point for anyone who wants to work on this.

maxkapur commented 2 months ago

Discourse questioner here! I have checked out stevengj's branch and am working on adding support for markdown description/epilog (which is where I would probably use it most IRL): https://github.com/maxkapur/ArgParse.jl/tree/sgj/markdown

But it's not ready for review yet. Actually, the show_message() method from parsing.jl relies on split(::AbstractString) which doesn't accept Markdown input, so this may prove more difficult than it seems:

Stack trace

``` > julia --project examples/argparse_example9.jl --help usage: argparse_example9.jl [--opt1 OPT1] [-o OPT2] [-h] [src] [dest] ERROR: LoadError: MethodError: no method matching split(::Markdown.MD, ::String) Closest candidates are: split(::T, ::Any; limit, keepempty) where T<:AbstractString @ Base strings/util.jl:626 Stacktrace: [1] show_message(io::IO, message::Union{AbstractString, Markdown.MD}, preformatted::Bool, width::Int64) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:388 [2] show_help(io::IO, settings::ArgParseSettings; exit_when_done::Any) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:371 [3] show_help(settings::ArgParseSettings; kw...) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:308 [4] parse1_flag!(state::ArgParse.ParserState, settings::ArgParseSettings, f::ArgParse.ArgParseField, has_arg::Bool, opt_name::AbstractString) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:684 [5] parse_long_opt!(state::ArgParse.ParserState, settings::ArgParseSettings) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:848 [6] parse_args_unhandled(args_list::Vector, settings::ArgParseSettings, truncated_shopts::Bool) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:629 [7] parse_args_unhandled(args_list::Vector, settings::ArgParseSettings) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:596 [8] parse_args(args_list::Vector, settings::ArgParseSettings; as_symbols::Bool) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:470 [9] parse_args(args_list::Vector, settings::ArgParseSettings) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:466 [10] parse_args(settings::ArgParseSettings; kw...) @ ArgParse /var/home/max/gits/stevengj/ArgParse.jl/src/parsing.jl:422 [11] main(args::Vector{String}) @ Main /var/home/max/gits/stevengj/ArgParse.jl/examples/argparse_example9.jl:20 [12] top-level scope @ /var/home/max/gits/stevengj/ArgParse.jl/examples/argparse_example9.jl:27 in expression starting at /var/home/max/gits/stevengj/ArgParse.jl/examples/argparse_example9.jl:27 ```

stevengj commented 2 months ago

Actually, the show_message() method from parsing.jl relies on split(::AbstractString) which doesn't accept Markdown input

I think the simplest approach is just to convert Markdown into an ordinary string (with ANSI formatting escape codes) before subsequent processing, as in my example PR above. Maybe make it simpler with:

stringformat(s::Markdown.MD) = lstrip(repr("text/plain", s, context=:color=>true))
stringformat(s::AbstractString) = s

and just call this on any string-like argument that might be markdown. You can do something similar for the new StyledStrings stdlib.

maxkapur commented 2 months ago

Thanks for the tip! I took the approach you suggested.

I had make an alternative implementation for the following function, however, which is used to print the description and epilog blocks. It does some manual splitting and formatting that plays very poorly with Markdown. For example, if you just remove the type restriction on message and call string_format(message) at the top, it will turn md"## Subheading" into "Subheading ==========":

function show_message(io::IO, message::AbstractString, preformatted::Bool, width::Int)
    if !isempty(message)
        if preformatted
            print(io, message)
        else
            for l in split(message, "\n\n")
                message_wrapped = TextWrap.wrap(l, break_long_words = false, break_on_hyphens = false, width = width)
                println_unnbsp(io, message_wrapped)
            end
        end
        println(io)
    end
end

My alternative is simply:

function show_message(io::IO, message::Markdown.MD, preformatted::Bool, width::Int)
    if !isempty(message)
        println(io, string_format(message))
        println(io)
    end
end

This looks right for typical inputs (see my additions examples/argparse_example9.jl and test/argparse_test15.jl), but it completely ignores the preformatted and width parameters. In commit #03d8b171fb2, I added some warnings if you pass nondefault values to these parameters.

I added a (somewhat unsightly) test:

@test stringhelp(s) == "usage: argparse_test15.jl [--opt1 OPT1] [--opt2 OPT2]\n\nTest 15 for \e[36margparse.jl\e[39m: Markdown strings. This description text uses\n  various Markdown features such as \e[1mstrong text\e[22m, text with \e[4memphasis\e[24m, \e[36mcode\n  snippets\e[39m, and math: \e[35my = mx + b\e[39m.\n\n\e[1m  Subheading\e[22m\n\e[1m  ==========\e[22m\n\n  The following list should be numbered automatically:\n\n    1. Here\n\n    2. Is\n\n    3. A\n\n    4. List\n\noptional arguments:\n  --opt1 OPT1  a flag with String helptext (type: Int64)\n  --opt2 OPT2  a flag with ~~String~~ \e[1mmarkdown\e[22m helptext\n\nSee our website (https://example.com) for more information.\n\n"

And updated the docs and docstrings where relevant.

I held off on StyledStrings for now but I may be able to assist with a future PR.

Let me know if this works and if I can submit a PR from my branch: https://github.com/maxkapur/ArgParse.jl/tree/sgj/markdown

stevengj commented 2 months ago

You should add an argument to string_format to indicate the line-wrapping length. With an ordinary String it can call wrap, whereas with Markdown.MD it can use the built-in line-wrapping by setting the :displaysize parameter of the IOContext.

maxkapur commented 2 months ago

OK, here's what that looks like (pushed to my branch):

# src/common.jl
# ...

# Wrapper function to lower markdown strings to REPL representation
function string_format(s::Markdown.MD; width::Int)
    context = IOContext(stdout, :color => true, :displaysize => (displaysize()[1], width))
    lstrip(repr("text/plain", s; context=context))
end
string_format(s::AbstractString; width::Int) = TextWrap.wrap(s; width=width)

Unfortunately, we have to construct our own IOContext instance here, because

repr("text/plain", s; context=(:color => true, :displaysize => (displaysize()[1], width)))

is a MethodError. You can call repr(x; context) with a tuple of pairs for context (since Julia v1.7), but the repr(mime, x; context) version accepts only a single pair.

I assume that the io argument can always be stdout (this being help text), but can you think of any exception?

I'm also not in love with how everything after the first line of the description is getting lpadded by 2:

This doesn't happen for plain string input. I'm still working out why this happens; I can't find any instances of lpad or " anywhere relevant.

Edit: Figured out the lpad thing, will return with update.

stevengj commented 2 months ago

I assume that the io argument can always be stdout (this being help text), but can you think of any exception?

I think it will ignore the io stream anyway in repr — it just copies the key/value pairs from context to an internally created IOContext object wrapping an IOBuffer.

Equivalently, you can do:

buf = IOBuffer()
show(IOContext(buf, :color => true, :displaysize => (100, width)), "text/plain", s)
s = lstrip(String(take!(buf)))

(I wouldn't use displaysize()[1] — it doesn't seem like the help output should depend on this.)

maxkapur commented 2 months ago

You would not believe how tricky the lpad issue is! Basically, repr("text/plain", markdown_string) automatically indents the whole thing by two spaces. In your example, you removed these with lstrip(), but that only works for the first line:

julia> s = md"Very long markdown string that **must** have a line break"
       context = IOContext(stdout, :displaysize => (100, 20), :color => true)
       lstrip(repr("text/plain", s; context=context)) |> println
Very long
  markdown string
  that must have a
  line break

So, maybe what we really need is Base.unindent(), but this fails on subheadings:

julia> s = md"""
       Very long markdown string

       # Subheading

       More content
       """
       context = IOContext(stdout, :displaysize => (100, 20), :color => true)
       Base.unindent(repr("text/plain", s; context=context), 2) |> println
Very long
markdown string

  Subheading
  ≡≡≡≡≡≡≡≡≡≡

More content

Replacing println with show reveals what's going on: The ANSI color code before the subheading is inserted before the leading two spaces, so Base.unindent() thinks the line is already unindented:

julia> s = md"""
       Very long markdown string

       # Subheading

       More content
       """
       context = IOContext(stdout, :displaysize => (100, 20), :color => true)
       Base.unindent(repr("text/plain", s; context=context), 2) |> show
"Very long\nmarkdown string\n\n\e[1m  Subheading\e[22m\n\e[1m  ≡≡≡≡≡≡≡≡≡≡\e[22m\n\nMore content"

So, the problem is particular to having :color => true (which is the whole point).

Maybe one could fix this with a suitably tortured regex, but I don't really want to advocate for that solution. In my branch, I've decided to just let the indent stay put for the description and epilog, which greatly simplifies string_format() and removes the need to keep track of width vs. width + 2:

function string_format(s::Markdown.MD; width::Int)
    buf = IOBuffer()
    context = IOContext(buf, :color => true, :displaysize => (100, width))
    repr("text/plain", s; context=context)

    # NOTE: repr() above automatically indents the string by 2 spaces. We can
    # remove the indentation using Base.unindent() as follows, but the results
    # are inconsistent with :color => true enabled when s contains subheadings,
    # because repr() inserts the ANSI escape code enabling the bold font
    # *before* the leading two spaces.

    # ansi_escaped = repr("text/plain", s; context=context)
    # unindented = Base.unindent(ansi_escaped, 2)
end

For the help text (short descriptions of each option), I've used the unindent() solution so that at least the columns remained aligned; presumably, users are less likely to use subheadings in help texts than in the description and epilog.

What do you think about this solution?

carlobaldassi / ArgParse.jl

support markdown in help strings #134