github / cmark-gfm

GitHub's fork of cmark, a CommonMark parsing and rendering library and program in C
Other
875 stars 171 forks source link

Don't treat numbers as lists. #334

Open gravypod opened 1 year ago

gravypod commented 1 year ago

Sometimes a document will contain text that looks like this:

Software version:

20230101.

Currently cmark-gfm will output the following:

<ol start=20230101>
<li></li>
</ol>

This isn't what the author had intended. The author intended for the markdown to render as:

<p>20230101.</p>

If the blank item would be added to an existing list it is preserved. For example:

1. Hello
2.
3. World

Renders as:

<ol>
<li>Hello</li>
<li></li>
<li>World</li>
</ol>
wooorm commented 1 year ago

I don’t work for GH but my guess is that this likely won’t be accepted: a) this project isn’t maintained except for security vulnerabilities, b) this is a fork of CommonMark, which adds some extra features, but it doesn’t otherwise break with the CM spec, which your PR does.

This isn't what the author had intended. The author intended for the markdown to render as:

This is how markdown works. It’s documented in the spec. If the author doesn’t want this, use an escape:

a
123\. b

a 123. b

Alternatively, “20230101” looks a lot like a date, perhaps you can use 2023-01-01 instead.

gravypod commented 1 year ago

I don’t work for GH but my guess is that this likely won’t be accepted: a) this project isn’t maintained except for security vulnerabilities,

That's fine, we have our own internal fork and we just like to upstream things which are of general interest to the community.

b) this is a fork of CommonMark, which adds some extra features, but it doesn’t otherwise break with the CM spec, which your PR does.

Does this break the cmark spec? My reading of https://spec.commonmark.org/0.30/#ordered-list doesn't lead me to believe there is anything in the spec which discusses bare numbers being treated as a list.

This repo has a set of tests which validate all of the examples in the spec render as expected and those tests are passing at this change.

If there is some part of the spec which covers these cases and says they should be lists please let me know:

*
-
1.
1)

Alternatively, “20230101” looks a lot like a date, perhaps you can use 2023-01-01 instead.

My kingdom for a unified release versioning scheme.

wooorm commented 1 year ago

Does this break the cmark spec?

Somewhat pedantically, every change would break CommonMark: the grammar of markdown is, just like HTML, expressed as .* in BNF: every character is valid and produces something. Changing the meaning of things, is SemVer breaking.

Whether it’s practically breaking? Yes: as someone who makes popular markdown parsers, I have seen these cases expressed in documents by users, and I have written code to handle this and tests to confirm it.

If there is some part of the spec which covers these cases and says they should be lists please let me know:

*
-
1.
1)

Yes, all those cases are lists. See earlier, 5.2 List items for more info. Particularly see point 3 “Item starting with a blank line” of lists (after https://spec.commonmark.org/0.30/#example-277), especially example 284 https://spec.commonmark.org/0.30/#example-284.

gravypod commented 1 year ago

@wooorm this code supports 277 + the requirement that you can start lists with a blank line. See the code adding support for this.

I can see https://spec.commonmark.org/0.30/#example-284 applying more here but this code only applies to numbers. I see this section talks both about "lists" preferring to unordered and ordered list.

I can definitely see how we could be running afoul of that here.

I'm going to leave the PR open just in case others are interested in this modification. Users seem to like this on our internal corpus.

kevinbackhouse commented 1 year ago

@gravypod: Have you tried submitting this change to cmark? It's in a part of the code that's mostly the same in both codebases, so I think it would probably work. cmark-gfm is a fork of cmark that adds the GFM extensions so if they accept the change then I think it would make sense for us to accept it too.