executablebooks / markdown-it-py

Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed. Now in Python!
https://markdown-it-py.readthedocs.io
MIT License

New linkify rule does not linkify URLs entirely when they have a preceding `text` token #300

Closed: miteshashar closed this issue 6 months ago

miteshashar commented 11 months ago

Describe the bug

Context: the new linkify rule introduced between v2.2.0 and v3.0.0:

https://github.com/executablebooks/markdown-it-py/compare/v2.2.0...v3.0.0#diff-06572a96a58dc510037d5efa622f9bec8519bc1beab13c9f251e97e657a9d4edR21-R23

When I provide this input:

http://example.org/foo._bar_-_baz This works

This doesnt http://example.org/foo._bar_-_baz

While this `does` http://example.org/foo._bar_-_baz, this doesnt http://example.org/foo._bar_-_baz and this **does** http://example.org/foo._bar_-_baz

This applies to _series of URLs too_ http://example.org/foo._bar_-_baz http://example.org/foo._bar_-_baz, these dont http://example.org/foo._bar_-_baz http://example.org/foo._bar_-_baz and these **do** http://example.org/foo._bar_-_baz http://example.org/foo._bar_-_baz

Expectation: I expect every URL to be linkified correctly and in full.

Bug: Instead, the URLs that the input marks as not working are not linkified in full.


Problem: This affects anyone who uses URLs in their content, because they expect consistent output.

Reproduce the bug

The input above is sufficient to reproduce every case in which the bug occurs.

List your environment

markdown-it-py==3.0.0 mdit-py-plugins==0.4.0

Python version: 3.11.4
OS: macOS

Markdown parser config:


tsutsu3 commented 6 months ago

The problem appears to be in markdown-it-py, not in linkify-it-py: linkify-it-py returns the same matches as linkify-it.

Below is a comparison of linkify-it with linkify-it-py, and of markdown-it with markdown-it-py.

Result (Python):

# md render
<p><a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a> This works</p>

<p>This doesnt <a href="http://example.org/foo">http://example.org/foo</a>.<em>bar</em>-_baz</p>

<p>While this <code>does</code> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a>, this doesnt <a href="http://example.org/foo">http://example.org/foo</a>.<em>bar</em>-_baz and this <strong>does</strong> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a></p>

<p>This applies to <em>series of URLs too</em> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a>, these dont <a href="http://example.org/foo">http://example.org/foo</a>.<em>bar</em>-_baz <a href="http://example.org/foo">http://example.org/foo</a>.<em>bar</em>-_baz and these <strong>do</strong> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a></p>

# linkify match
[linkify_it.main.Match({'schema': 'http:', 'index': 0, 'last_index': 33, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'})]
[linkify_it.main.Match({'schema': 'http:', 'index': 12, 'last_index': 45, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'})]
[linkify_it.main.Match({'schema': 'http:', 'index': 18, 'last_index': 51, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'}), linkify_it.main.Match({'schema': 'http:', 'index': 65, 'last_index': 98, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'}), linkify_it.main.Match({'schema': 'http:', 'index': 117, 'last_index': 150, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'})]
[linkify_it.main.Match({'schema': 'http:', 'index': 37, 'last_index': 70, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'}), linkify_it.main.Match({'schema': 'http:', 'index': 71, 'last_index': 104, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'}), linkify_it.main.Match({'schema': 'http:', 'index': 117, 'last_index': 150, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'}), linkify_it.main.Match({'schema': 'http:', 'index': 151, 'last_index': 184, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'}), linkify_it.main.Match({'schema': 'http:', 'index': 202, 'last_index': 235, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'}), linkify_it.main.Match({'schema': 'http:', 'index': 236, 'last_index': 269, 'raw': 'http://example.org/foo._bar_-_baz', 'text': 'http://example.org/foo._bar_-_baz', 'url': 'http://example.org/foo._bar_-_baz'})]

Result (JavaScript):

# md render
<p><a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a> This works</p>

<p>This doesnt <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a></p>

<p>While this <code>does</code> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a>, this doesnt <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a> and this <strong>does</strong> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a></p>

<p>This applies to <em>series of URLs too</em> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a>, these dont <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a> and these <strong>do</strong> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a> <a href="http://example.org/foo._bar_-_baz">http://example.org/foo._bar_-_baz</a></p>

# linkify match
[
  Match {
    schema: 'http:',
    index: 0,
    lastIndex: 33,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  }
]
[
  Match {
    schema: 'http:',
    index: 12,
    lastIndex: 45,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  }
]
[
  Match {
    schema: 'http:',
    index: 18,
    lastIndex: 51,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  },
  Match {
    schema: 'http:',
    index: 65,
    lastIndex: 98,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  },
  Match {
    schema: 'http:',
    index: 117,
    lastIndex: 150,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  }
]
[
  Match {
    schema: 'http:',
    index: 37,
    lastIndex: 70,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  },
  Match {
    schema: 'http:',
    index: 71,
    lastIndex: 104,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  },
  Match {
    schema: 'http:',
    index: 117,
    lastIndex: 150,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  },
  Match {
    schema: 'http:',
    index: 151,
    lastIndex: 184,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  },
  Match {
    schema: 'http:',
    index: 202,
    lastIndex: 235,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  },
  Match {
    schema: 'http:',
    index: 236,
    lastIndex: 269,
    raw: 'http://example.org/foo._bar_-_baz',
    text: 'http://example.org/foo._bar_-_baz',
    url: 'http://example.org/foo._bar_-_baz'
  }
]
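The offsets that both libraries report can be checked against the raw input with a small stdlib sketch. The helper `url_spans` is hypothetical (it is not part of linkify-it-py); it simply finds every occurrence of the full URL in the first three input lines. The spans it returns agree with the `index`/`last_index` pairs listed above, which suggests each linkify match covers the whole 33-character URL.

```python
# The URL used throughout the issue's input (33 characters long).
URL = "http://example.org/foo._bar_-_baz"

# The first three lines of the issue's input, rebuilt around URL.
LINES = [
    URL + " This works",
    "This doesnt " + URL,
    "While this `does` " + URL + ", this doesnt " + URL + " and this **does** " + URL,
]


def url_spans(text: str, url: str) -> list[tuple[int, int]]:
    """Return (index, last_index) for every occurrence of url in text."""
    spans = []
    start = text.find(url)
    while start != -1:
        spans.append((start, start + len(url)))
        start = text.find(url, start + len(url))
    return spans


for line in LINES:
    print(url_spans(line, URL))
# [(0, 33)]
# [(12, 45)]
# [(18, 51), (65, 98), (117, 150)]
```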
tsutsu3 commented 6 months ago

For reference, see the markdown-it demo.
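In the Python output, a URL is cut short only when ordinary text precedes it in the same run of text ("This doesnt http://", ", this doesnt http://"). It is linkified in full at the start of a line or right after a code span or bold text. That pattern would be consistent with the scheme check in the inline linkify rule being anchored to the start of the pending text. The sketch below assumes, without confirming it against the source, that the Python rule tests the pending text with `re.match`, whereas markdown-it's JavaScript rule uses `String.prototype.match`, which searches the whole string. The regex is assumed to mirror markdown-it's scheme regex, and the `PENDING` strings are hypothetical examples of the text buffered just before each `://`.

```python
import re

# Assumption: mirrors the scheme regex of the inline linkify rule.
# It captures a scheme such as "http" at the END of the pending text.
SCHEME_RE = re.compile(r"(?:^|[^a-z0-9.+-])([a-z][a-z0-9.+-]*)$", re.IGNORECASE)

# Hypothetical pending text buffered right before "://" in the issue's input.
PENDING = {
    "URL starts the line": "http",
    "after plain text": "This doesnt http",
    "after a comma": ", this doesnt http",
    "after a code span": " http",
}

results = {}
for label, pending in PENDING.items():
    anchored = SCHEME_RE.match(pending) is not None  # re.match: must match at index 0
    searched = SCHEME_RE.search(pending) is not None  # re.search: scans the whole string
    results[label] = (anchored, searched)
    print(f"{label}: match={anchored}, search={searched}")
```

Under this assumption, only the two plain-text cases fail with `re.match`, while all four succeed with `re.search`, which lines up with the URLs that break in the Python render. When the inline rule declines, `_bar_` is presumably parsed as emphasis, leaving only `http://example.org/foo` to be linkified.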