commonmark / cmark

CommonMark parsing and rendering library and program in C

Support nested lists in groff man #386

Closed nicqrocks closed 3 years ago

nicqrocks commented 3 years ago

Lists are now nested properly when converting to groff man format.

Previously, nothing happened during conversion when a new list began inside another, so all lists were flattened. This commit handles block quotes and lists the same way: indent when a new list is seen. From what I can tell, this is how nested lists are supposed to be handled in the groff man format.

The tests were updated, and pass.

jgm commented 3 years ago

If I'm not mistaken, this indents even the first level of a nested list. It seems better to leave the top level unindented and only indent subsequent levels. (At least, that's how pandoc does it.)

nicqrocks commented 3 years ago

this indents even the first level of a nested list.

Yep. This commit will indent each list, including the first level. I did that because it makes the logic simpler to reason about (every new list is indented) and gives a clear visual difference between the list and the surrounding text. It would also solve the problem described in issue #258, which is about indenting paragraphs that are part of a list item.

% cat t.md
Foo bar

1. item 1

2. item 2

   item 2 paragraph 2.

   item 2 paragraph 3.

Regular text after list.

% echo '.TH Example 1' > t.man
% cmark -t man < t.md >> t.man
% man ./t.man | cat
Example(1)                  General Commands Manual                 Example(1)

       Foo bar

              1.  item 1

              2.  item 2

              item 2 paragraph 2.

              item 2 paragraph 3.

       Regular text after list.

                                                                    Example(1)

seems better to leave the top level unindented and only indent subsequent levels. (At least, that's how pandoc does it.)

I'm not familiar with pandoc, so I changed my above example and rendered the document with this modified version of cmark and with pandoc respectively.

Example(1)                  General Commands Manual                 Example(1)

       Et blanditiis hic fuga. Sint fuga consequatur quidem omnis ut nihil ea.
       Dicta nostrum  nisi  veritatis  quas  veritatis  earum  ad  qui.  Omnis
       quisquam  omnis doloremque quia nihil optio veniam explicabo. Sunt quia
       modi voluptatum.

              1.  item 1

              2.  item 2

                  · nested item 1

                  · nested item 2

                    · double nested item 1

                      · yet another nested item

              3.  item 3

              item 3 paragraph 2.

              item 3 paragraph 3.

       Regular text after list.

                                                                    Example(1)
Example(1)                  General Commands Manual                 Example(1)

       Et  blanditiis  hic  fuga.  Sint fuga consequatur quidem omnis ut nihil
       ea.  Dicta nostrum nisi veritatis quas veritatis earum ad  qui.   Omnis
       quisquam omnis doloremque quia nihil optio veniam explicabo.  Sunt quia
       modi voluptatum.

       1. item 1

       2. item 2

           · nested item 1

           · nested item 2

             · double nested item 1

               · yet another nested item

       3. item 3

           item 3 paragraph 2.

           item 3 paragraph 3.

       Regular text after list.

                                                                    Example(1)

I like how pandoc handles the paragraphs of item 3, and may add that if I have some spare time, but I don't really know if I agree with not indenting the first level list. I suppose whether or not the list is indented is more a matter of opinion, but I do believe the general style is to indent a list from the rest of the text. I am not a grammar teacher though, and most of my opinions come from reading whitepapers.

If more people are adamant about not indenting the first level list, then it could probably be changed fairly easily.

jgm commented 3 years ago

I hadn't recalled that there was still a problem with indented paragraphs under list items. That makes me think a more general solution would be better: what we want to ensure is that all block-level content in a list item is indented. This would include paragraphs, but it would automatically handle nested lists as well.

Here's an example of how pandoc handles this:

% pandoc -t man
1.  one

2.  two

    subparagraph

3.  nested list

    - one
    - two
^D
.IP "1." 3
one
.IP "2." 3
two
.RS 4
.PP
subparagraph
.RE
.IP "3." 3
nested list
.RS 4
.IP \[bu] 2
one
.IP \[bu] 2
two
.RE

This isn't quite ideal, actually; we should try to have the indented content line up with the content after the bullet or list number:

.IP "1." 3
one
.IP "2." 3
two
.RS 3
.PP
subparagraph
.RE
.IP "3." 3
nested list
.RS 3
.IP \[bu] 2
one
.IP \[bu] 2
two
.RE
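
The alignment rule above (use the same width for .RS as for the preceding .IP) can be sketched in C. This is only an illustration: format_item is a hypothetical helper, not part of cmark or pandoc, and the "marker length plus one" width is an assumption read off the example, valid only for plain ASCII markers such as "2." (an escape like \[bu] renders as one column and would need its width passed in explicitly).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats one list item whose first paragraph is
 * followed by one subparagraph. Returns what snprintf returns. */
static int format_item(char *out, size_t cap, const char *marker,
                       const char *first_par, const char *sub_par) {
  assert(marker != NULL && marker[0] != '\0');
  /* Assumption: the item text starts one column after the marker. */
  int width = (int)strlen(marker) + 1;
  return snprintf(out, cap,
                  ".IP \"%s\" %d\n"
                  "%s\n"
                  ".RS %d\n" /* same width as .IP, so the text lines up */
                  ".PP\n"
                  "%s\n"
                  ".RE\n",
                  marker, width, first_par, width, sub_par);
}
```

Called as format_item(buf, sizeof buf, "2.", "two", "subparagraph"), this should produce the second item of the example above, with .IP and .RS both using 3.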

nicqrocks commented 3 years ago

I made a couple more changes, which seem to work out fairly nicely. Subparagraphs are now properly indented, and unlike pandoc, I decided not to specify the indentation amount and instead rely on the defaults (which seem to line up nicely on my machine).

Unfortunately, I could not find a nice way to specify indentation across multiple subparagraphs, so I opted to instead indent each of the paragraphs individually by wrapping them in .RS and .RE. For example:

% cat > t.md
foo bar baz.

- list item

    subparagraph

    another subparagraph

    yet another subparagraph

- another list item

regular text again.

% cmark -t man < t.md
.PP
foo bar baz.
.RS
.IP \[bu] 2
list item
.RS
.PP
subparagraph
.RE
.RS
.PP
another subparagraph
.RE
.RS
.PP
yet another subparagraph
.RE
.IP \[bu] 2
another list item
.RE
.PP
regular text again.

Personally, I think that could be much better, but I have not quite found a nice way of doing that yet. If you have some ideas, I'm all for it.

Until then, I think this should work out well. man seems to be able to handle it without trouble, even when converting to PostScript.

jgm commented 3 years ago

I could not find a nice way to specify indentation across multiple subparagraphs, so I opted to instead indent each of the paragraphs individually by wrapping them in .RS and .RE

I may be misunderstanding what you mean, but this works fine:

.RS
.PP
subparagraph
.PP
another subparagraph
.PP
yet another subparagraph
.RE

nicqrocks commented 3 years ago

Ah my mistake, I should have been more clear.

You are correct that wrapping all of the subparagraphs between .RS and .RE is the best way of accomplishing indentation. I was having trouble devising a clean way of having the program do that. The change I implemented works, but does not convert it very nicely: it ends up with each subparagraph wrapped in its own .RS and .RE, instead of all of them in one pair. Perhaps having the program check whether a paragraph's previous node was a list would be a good point to place a .RS, but I'm having trouble figuring out how to specify that.

I'll look into it more and try to get used to how this parser works. If you have any ideas, I'm open to suggestions.

jgm commented 3 years ago

One possibility would be this. When you enter a list ITEM, set a counter to keep track of the number of blocks emitted in that item. (This will have to be an array indexed to the list level.) Then, for each block you enter, if list level > 0, increment that counter. When the counter gets to 2, emit the indentation code. When exiting a list item, if the counter is >= 2, emit the de-indentation code. Something like that!
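
A minimal sketch of the counter idea above, in C. Everything here is a simplified stand-in: the event type, ev_type values, and render_items are hypothetical, not cmark's renderer API and not the code that eventually landed. It assumes bullet items only and uses .RS without a width.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16

/* Simplified stand-ins for renderer events; these are not cmark types. */
typedef enum { ENTER_LIST, EXIT_LIST, ENTER_ITEM, EXIT_ITEM, PARAGRAPH } ev_type;

typedef struct {
  ev_type type;
  const char *text; /* only used for PARAGRAPH */
} event;

/* Append s to out without writing past cap bytes. */
static void append(char *out, size_t cap, const char *s) {
  size_t len = strlen(out);
  if (len < cap)
    snprintf(out + len, cap - len, "%s", s);
}

static void render_items(const event *evs, size_t n, char *out, size_t cap) {
  int blocks[MAX_DEPTH + 1] = {0}; /* blocks seen so far, per item depth */
  int depth = 0;                   /* 0 means "not inside any list item" */
  out[0] = '\0';
  for (size_t i = 0; i < n; i++) {
    switch (evs[i].type) {
    case ENTER_ITEM:
      assert(depth < MAX_DEPTH);
      blocks[++depth] = 0; /* fresh counter for this item */
      append(out, cap, ".IP \\[bu] 2\n");
      break;
    case EXIT_ITEM:
      if (blocks[depth] >= 2) /* indentation was opened, so close it */
        append(out, cap, ".RE\n");
      depth--;
      break;
    case ENTER_LIST: /* a nested list counts as a block of its item */
    case PARAGRAPH:
      if (depth > 0 && ++blocks[depth] == 2)
        append(out, cap, ".RS\n"); /* second block: start indenting */
      if (evs[i].type == ENTER_LIST)
        break;
      /* The first paragraph of an item follows .IP directly. */
      if (depth == 0 || blocks[depth] > 1)
        append(out, cap, ".PP\n");
      append(out, cap, evs[i].text);
      append(out, cap, "\n");
      break;
    case EXIT_LIST:
      break;
    }
  }
}
```

Because .RS is emitted once, at the second block, and .RE once, on leaving the item, all subparagraphs of an item end up inside a single .RS/.RE pair, and a nested list that follows the item's first paragraph is indented by the same mechanism.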

jgm commented 3 years ago

I think that to do this properly we'd need to add a field in_item to cmark_renderer in render.h. That is a small internal change, not an API change, because cmark_renderer isn't part of the public API.

jgm commented 3 years ago

I've got a working prototype now.

jgm commented 3 years ago

I've fixed this issue in a more general way with e1fd211a990059a119f8de83a583d88f91441aad so I'm closing this PR. Thanks for the initial impetus and discussion! It was a bit trickier to implement than I'd anticipated.