danbooru / dtext_rb

compiled dtext parser extension
1 stars 6 forks source link

[Bug] Markdown Style Links Use Reversed Syntax #5

Closed Undisguised-Art closed 8 months ago

Undisguised-Art commented 8 months ago
Anyone who uses Markdown regularly will notice that Danbooru parses Markdown style links in reverse order of the expected syntax. Syntax Example - Raw Example - As formatted by GitHub's parser
Current:
[<link>](<custom‑text>)
[https://danbooru.donmai.us](Danbooru) https://danbooru.donmai.us
Correct:
[<custom‑text>](<link>)
[Danbooru](https://danbooru.donmai.us) Danbooru

Relevant lines of code: [dtext.cpp.rl L181]:

markdown_link = '[' delimited_absolute_url >mark_a1 %mark_a2 :>> '](' nonnewline+ >mark_b1 %mark_b2 :>> ')';

[dtext.cpp.rl L349-L355]:

  bracketed_textile_link | named_bbcode_link => {
    append_named_url(sm, { sm->b1, sm->b2 }, { sm->a1, sm->a2 });
  };

  markdown_link | html_link => {
    append_named_url(sm, { sm->a1, sm->a2 }, { sm->b1, sm->b2 });
  };

Additional References: Markdown Guide | basic-syntax/#links Wikipedia

notWolfxd commented 8 months ago

Pretty straightforward change. Just need to update

- markdown_link = '[' delimited_absolute_url >mark_a1 %mark_a2 :>> '](' nonnewline+ >mark_b1 %mark_b2 :>> ')';
+ markdown_link = '[' nonnewline+ >mark_b1 %mark_b2 :>> '](' delimited_absolute_url >mark_a1 %mark_a2 :>> ')';
Undisguised-Art commented 8 months ago

By fixing this issue, I think it will also resolve the concern mentioned on line 180:

# XXX: internal markdown links aren't allowed to avoid parsing closing tags as links: `[b]foo[/b](bar)`.

Therefore, relative links could potentially be permitted for the markdown style.

Undisguised-Art commented 8 months ago

It's not so simple to fix. Allowing any text to be specified in the square brackets [] results in a match starting that can overwrite the position markers set by other patterns. (At least that is what I assume is happening in my tests.)

Ragel is new to me so this may have issues that I cannot foresee, but here is one way that might work.

Disallow square brackets for markdown_link titles. Disallow parentheses for markdown_link URLs.

- markdown_link = '[' delimited_absolute_url >mark_a1 %mark_a2 :>> '](' nonnewline+ >mark_b1 %mark_b2 :>> ')';
+ markdown_link = '[' ws* (nonnewline - '[' - ']')+ >mark_a1 %mark_a2 :>> ws* '](' ((delimited_absolute_url | delimited_relative_url)  - '(' - ')') >mark_b1 %mark_b2 :>> ')';
-  bracketed_textile_link | named_bbcode_link => {
+  bracketed_textile_link | named_bbcode_link | markdown_link => {
     append_named_url(sm, { sm->b1, sm->b2 }, { sm->a1, sm->a2 });
   };

-  markdown_link | html_link => {
+  html_link => {
     append_named_url(sm, { sm->a1, sm->a2 }, { sm->b1, sm->b2 });
   };
evazion commented 8 months ago

We can't get rid of the old backwards syntax because there are existing comments that depend on it. Both versions have to be supported.

It's not so simple to fix. Allowing any text to be specified in the square brackets [] results in a match starting that can overwrite the position markers set by other patterns. (At least that is what I assume is happening in my tests.)

The easiest solution is just to define new position markers. This happens because when multiple rules match the same text, each rule is pursued in parallel and they end up overwriting each others' position markers if they use the same markers.

Undisguised-Art commented 8 months ago

@evazion Glad to see the update, and thanks for confirming my suspicion about multiple rules leading to overwriting.