executablebooks / MyST-Parser

An extended commonmark compliant parser, with bridges to docutils/sphinx
https://myst-parser.readthedocs.io
MIT License
738 stars 196 forks source link

Unable to link with multiple query params #760

Open JonZeolla opened 1 year ago

JonZeolla commented 1 year ago

Describe the bug

context I am trying to make a link containing query params such as https://example.com?first=a&second=b

expectation I expected that the rendered link would contain all of the query params.

bug But instead the & becomes encoded as & and breaks the link. I thought I'd be able to escape the & or that it would work natively, but neither seem to be true.

Reproduce the bug

[example][EXAMPLE]

[EXAMPLE]: https://example.com?first=a&second=b "Example"

and

[example][EXAMPLE]

[EXAMPLE]: https://example.com?first=a\&second=b "Example"

Renders:

<a class="reference external" href="https://example.com?first=a&amp;second=b" title="Example">example</a>

List your environment

Using MyST-Parser: 1.0.0

welcome[bot] commented 1 year ago

Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:

sylvestre commented 1 year ago

Reported also here: https://bugzilla.mozilla.org/show_bug.cgi?id=1828390

chrisjsewell commented 1 year ago

Heya, yep this is consistent with the CommonMark spec:

and also I believe with the HTML spec: https://stackoverflow.com/a/3705601/5033292

are you sure it is actually breaking the link, it doesn't appear to break for me in Firefox?

JonZeolla commented 1 year ago

It will load the page but the query param isn't properly sent to the server, so links like this turn into this and don't process the start time.

chrisjsewell commented 1 year ago

What you said isn't quite correct; it turns it into <a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ&amp;t=2m12s">this</a> (not with the backslash escapes you put in your 2nd link above), which renders as this and any browser I try this link on correctly processes the start time (and gives me a musical surprise lol)

JonZeolla commented 1 year ago

@chrisjsewell yeah, the backslash escapes were for GitHub rendering to make a link that matches when you click it (if you look at the resulting href on GitHub it is accurate)

Although testing on my phone just now it works as I would want it to, but that’s with the YouTube app handling it, not a browser

kan-fu commented 9 months ago

I am having the exact same issue. For https://example.com?first=a&amp;second=b, the api service I am using complains that the second parameter is amp;second, which it expects to be second. Is there any way to let myst not encode the url?

Update: I ended up manually using sed to find and remove the amp;.

upekkha commented 7 months ago

Same here. Markdown-it-py 3.0.0 generates the correct url with & encoded as &amp;

$ markdown-it <(echo "[example](https://example.com?first=a&second=b)")
<p><a href="https://example.com?first=a&amp;second=b">example</a></p>

whereas myst-parser 2.0.0 yields a broken &amp;amp;

$ myst-docutils-demo <(echo "[example](https://example.com?first=a&second=b)")
<p><a class="reference external" href="https://example.com?first=a&amp;amp;second=b">example</a></p>
shimizukawa commented 7 months ago

A potential solution would be to adjust the MyST-Parser code to prevent the initial escaping of the URL, allowing docutils or the final HTML writer to handle the necessary escaping. This could involve simply removing or modifying the call to escapeHtml in the MyST-Parser's conversion process.

diff --git a/myst_parser/mdit_to_docutils/base.py b/myst_parser/mdit_to_docutils/base.py
index 4a02459..cee67c9 100644
--- a/myst_parser/mdit_to_docutils/base.py
+++ b/myst_parser/mdit_to_docutils/base.py
@@ -996,7 +996,7 @@ class DocutilsRenderer(RendererProtocol):
             if "classes" in conversion:
                 ref_node["classes"].extend(conversion["classes"])

-        ref_node["refuri"] = escapeHtml(uri)
+        ref_node["refuri"] = uri
         if implicit_text is not None:
             with self.current_node_context(ref_node, append=True):
                 self.current_node.append(nodes.Text(implicit_text))

Additional Information:

The issue seems to stem from both the MyST-Parser and docutils/html writers, where the URL is being escaped more than once. https://github.com/docutils/docutils/blob/b768e2626088711dec257b0847b563d02700a712/docutils/docutils/writers/_html_base.py

$ echo "http://a.com?foo=1&bar=2" | rst2html.py | grep foo
<p><a class="reference external" href="http://a.com?foo=1&amp;bar=2">http://a.com?foo=1&amp;bar=2</a></p>