executablebooks / markdown-it-py

Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed. Now in Python!
https://markdown-it-py.readthedocs.io
MIT License

gfm-like is too greedy when it comes to links #316

Open nschloe opened 7 months ago

nschloe commented 7 months ago

Describe the bug

MWE:

from markdown_it import MarkdownIt
from markdown_it.tree import SyntaxTreeNode

md = MarkdownIt("gfm-like")
tokens = md.parse("74.78.Fk")

print(SyntaxTreeNode(tokens).pretty())

Output:

<root>
  <paragraph>
    <inline>
      <link href='http://74.78.Fk'>
        <text>

Problem: "74.78.Fk" is not a link.

This problem doesn't occur with md = MarkdownIt()

Reproduce the bug

See above.

List your environment

markdown-it --version
markdown-it-py [version 3.0.0]

This is on Python 3.11.

Tip for devs: Add a --show-env parameter to markdown-it which shows all relevant packages and versions in one go.

chrisjsewell commented 7 months ago

Heya, yeah, for this and #317 I would note that the link parsing uses https://github.com/tsutsu3/linkify-it-py, so these are perhaps bugs that should be reported there, but it may also just be a difference from the parsing logic of GFM.

In https://github.com/markdown-it-rust/markdown-it-plugins.rs, which is then utilised in https://github.com/chrisjsewell/markdown-it-pyrs, I actually implemented parsing logic "faithful" to GFM. So, if someone wants to port that here 😅

tsutsu3 commented 6 months ago

linkify-it-py behaves the same as linkify-it, as per the specification, so it is not a bug that this text is linked.

.fk is the country code top-level domain (ccTLD) for the Falkland Islands. Therefore, it is automatically linked. If you do not want it to be linked, escape it as 74.78\.Fk.

See markdown-it demo.