facelessuser / pymdown-extensions

Extensions for Python Markdown
https://facelessuser.github.io/pymdown-extensions/

MagicLink: Should not hijack explicit links like `[@decorator][root.sub.decorator]` #2502

Open pekkaklarck opened 2 days ago

pekkaklarck commented 2 days ago

Description

I'm using mkdocstrings for API doc generation and referred to a decorator using syntax like `[@decorator][root.sub.decorator]`. That didn't work because MagicLink treated `@decorator` as a mention, and the link target ended up being https://github.com/decorator.

I would expect MagicLink to look for mentions, issue references, etc. only in normal text and possibly in link targets, not in link titles.

Minimal Reproduction

See the description.

Version(s) & System Info

facelessuser commented 2 days ago

I view this less as a bug in our extension and more as a weakness of the Python Markdown parser in general.

Markdown parsers are implemented in various ways, and some lend themselves better to solving issues like this. In Python Markdown, this is unfortunately the way the parser works: when an inline processor matches, it has no knowledge of what the parent element will be. I would advise escaping the @ symbol via \@.

I'm sure it would be possible to completely rewrite this extension to wait for the treeprocessor step (where we have parental lineage) and then search all the nodes for mention syntax (and others), skipping any that we can then deduce have a link as a parent. But waiting until that step may introduce other issues where other plugins get the syntax first, preventing us from transforming the appropriate text, not to mention it would be a lot of work to rewrite everything.
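To make the tree-stage idea concrete, here is a minimal sketch using only the standard library. It is not MagicLink's code: the function names, the simplified mention regex, and the hard-coded `https://github.com/` base URL are all assumptions for illustration. It walks an `xml.etree.ElementTree` tree (the element type Python Markdown treeprocessors receive), tracks whether an ancestor is a link, and only converts mentions found outside links.

```python
import re
import xml.etree.ElementTree as ET

# Simplified mention pattern and provider URL; both are assumptions for this sketch.
MENTION = re.compile(r'(?<![\w@])@([A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?)')
BASE = 'https://github.com/'
SKIP = {'a', 'code', 'pre'}  # never create links inside these elements


def _split(text):
    """Return (leading_text, anchors); each anchor carries the text after it as its tail."""
    anchors, pos, first_text = [], 0, None
    for m in MENTION.finditer(text):
        chunk = text[pos:m.start()]
        if anchors:
            anchors[-1].tail = chunk
        else:
            first_text = chunk
        a = ET.Element('a', {'href': BASE + m.group(1)})
        a.text = m.group(0)
        anchors.append(a)
        pos = m.end()
    if not anchors:
        return text, []
    anchors[-1].tail = text[pos:]
    return first_text, anchors


def linkify_mentions(el, inside_skip=False):
    """Turn @mentions into links, except where an ancestor is in SKIP."""
    inside = inside_skip or el.tag in SKIP
    children = list(el)  # snapshot, so anchors inserted below are not revisited
    for child in children:
        linkify_mentions(child, inside)
    if inside:
        return  # this element's text and its children's tails are inside a link
    # Text that follows each original child belongs to this element.
    for child in children:
        if child.tail:
            child.tail, anchors = _split(child.tail)
            idx = list(el).index(child) + 1
            for offset, a in enumerate(anchors):
                el.insert(idx + offset, a)
    # Text before the first child.
    if el.text:
        el.text, anchors = _split(el.text)
        for offset, a in enumerate(anchors):
            el.insert(offset, a)


root = ET.fromstring(
    '<p>ping @alice, see <a href="#x">@decorator</a> and @bob</p>'
)
linkify_mentions(root)
print(ET.tostring(root, encoding='unicode'))
```

In this sketch `@alice` and `@bob` become links while `@decorator` keeps its explicit `#x` target. The same logic could sit in the `run(self, root)` method of a `markdown.treeprocessors.Treeprocessor` subclass, but, as noted above, by that stage other extensions may already have consumed or altered the text.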

I will at least take another look and consider if there is a simple, easy way to catch this that I had not considered previously, but if a simple solution cannot be found, I will close this as a "won't fix".

facelessuser commented 2 days ago

As an example, this issue exists in Python Markdown without any other extensions. Consider this example, with two different link styles nested in each other.

[<https://google.com>](#id2)

It gets transformed into this.

<p><a href="#id2"><a href="https://google.com">https://google.com</a></a></p>

This simply illustrates the weakness in Python Markdown that we are being asked to work around, which we likely will not fix due to the complexity of doing so within the parser.
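For anyone who wants to find where this happens in their own rendered docs (and therefore where an escape is needed), a small standard-library checker can flag anchors that open inside another anchor. This is only a sketch; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class NestedLinkFinder(HTMLParser):
    """Collect the href of every <a> that opens inside another <a>."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open <a> elements
        self.nested = []  # hrefs of anchors opened while another was open

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            if self.depth:
                self.nested.append(dict(attrs).get('href'))
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'a' and self.depth:
            self.depth -= 1


def find_nested_links(html):
    """Return the hrefs of all links nested inside another link."""
    finder = NestedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.nested


output = '<p><a href="#id2"><a href="https://google.com">https://google.com</a></a></p>'
print(find_nested_links(output))
```

Run against the output shown above, it reports the inner `https://google.com` link as nested.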

facelessuser commented 2 days ago

As a note, this is a duplicate of #2223 and #1074.

Looking things over, one option is to make MagicLink handle all links and link-related types (references included) in one shot so it can sort out all the confusion. I would view this as very fragile, as there are likely other ways in which people can add extensions to create links.

The only other approach is to create the links as we do now, stashing their original value and reverting them if they are ever found to be nested under a parent link. I'd have to consider how much work this would be, but it is a possibility. The reverted content, though, would already have been taken out of the parser pipeline and would not get processed by any other inline parsers.
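A rough sketch of the stash-and-revert idea, again using only the standard library. It assumes each auto-generated link records its source text in a marker attribute (here a hypothetical `data-original`; this is not something MagicLink does today). A later tree pass then unwraps any marked link that turns out to sit under another link.

```python
import xml.etree.ElementTree as ET

STASH_ATTR = 'data-original'  # hypothetical attribute holding the pre-link text


def revert_nested(parent, inside_link=False):
    """Replace auto-links nested under another link with their stashed text."""
    inside = inside_link or parent.tag == 'a'
    prev = None  # last sibling that was kept
    for child in list(parent):
        if inside and child.tag == 'a' and STASH_ATTR in child.attrib:
            # Put the original text (plus whatever followed the link) back as plain text.
            restored = child.get(STASH_ATTR) + (child.tail or '')
            if prev is None:
                parent.text = (parent.text or '') + restored
            else:
                prev.tail = (prev.tail or '') + restored
            parent.remove(child)
        else:
            revert_nested(child, inside)
            prev = child


root = ET.fromstring(
    '<p><a href="#id2">'
    '<a href="https://github.com/decorator" data-original="@decorator">@decorator</a>'
    '</a> and '
    '<a href="https://github.com/bob" data-original="@bob">@bob</a></p>'
)
revert_nested(root)
print(ET.tostring(root, encoding='unicode'))
```

Here the nested `@decorator` link collapses back to plain text inside the explicit `#id2` link, while the top-level `@bob` link is left alone. The restored text is inserted as-is, which matches the limitation mentioned above: no other inline parser gets another chance at it.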

I will consider this not a bug, but an enhancement. This behavior has been known and understood for a long time and is currently expected due to how Python Markdown works. Escaping @ is the correct solution at the current time.

I will mark this as an enhancement to improve behavior, assuming we can do it reliably and without a ton of complexity.

pekkaklarck commented 2 days ago

I agree this isn't too severe and not worth fixing if that's too complicated. Good to know that escaping with \ works.