commonmark / cmark

CommonMark parsing and rendering library and program in C

[cmark --smart] paren-quote-markup combination #311

Open giucal opened 4 years ago

giucal commented 4 years ago

A plausible occurrence in a document is ("text"), which cmark --smart correctly turns into

(“text”)

However, if we emphasize the text,

("*text*")

we get

(”<em>text</em>”)

Note the incorrect right quote after the opening paren.

The same goes for the combination of a paren, a quote and other markup, such as strong emphasis and reference links:

("**text**") --> (”<strong>text</strong>”)
("[text]")   --> (”<a href=...>text</a>”)

jgm commented 4 years ago

Adding some diagnostics:

    % ./build/src/cmark --smart
    ("*text*")
    char = ", can_open = 0, can_close = 1
    char = *, can_open = 1, can_close = 0
    char = *, can_open = 0, can_close = 1
    char = ", can_open = 0, can_close = 1
    <p>(”<em>text</em>”)</p>

So the problem is that the opening " character is marked as can_close but not can_open. Further investigation shows:

    char = ", left_flanking = 1, right_flanking = 1

Now let's look at the logic in src/inlines.c, line 444:

    } else if (c == '\'' || c == '"') {
      *can_open = left_flanking && !right_flanking &&
                   before_char != ']' && before_char != ')';
      *can_close = right_flanking;

So, for a quote character to be marked can_open, it has to be left-flanking, not right-flanking, and not preceded by ] or ). In this case the " is both left- and right-flanking, so it isn't marked can_open. It is both because it sits between two punctuation characters, ( and *.

We may need to tweak the logic here and add more test cases.

jgm commented 4 years ago

This change fixes the issue:

diff --git a/src/inlines.c b/src/inlines.c
index e6b491f..fb7d2e4 100644
--- a/src/inlines.c
+++ b/src/inlines.c
@@ -439,8 +439,9 @@ static int scan_delims(subject *subj, unsigned char c, bool *can_open,
     *can_close = right_flanking &&
                  (!left_flanking || cmark_utf8proc_is_punctuation(after_char));
   } else if (c == '\'' || c == '"') {
-    *can_open = left_flanking && !right_flanking &&
-                before_char != ']' && before_char != ')';
+    *can_open = left_flanking &&
+         (!right_flanking || before_char == '(' || before_char == '[') &&
+         before_char != ']' && before_char != ')';
     *can_close = right_flanking;
   } else {
     *can_open = left_flanking;

I'm not closing this yet, because we need to