Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License

Proposal: Move `hr` extension (or part of it) after list handling #1282

Closed facelessuser closed 1 year ago

facelessuser commented 2 years ago

While prototyping #1175, I ran into a particular issue. In trying to use a YAML-ish frontmatter header for general-purpose block extensions, I noticed that `hr` blocks are handled before lists.

It seems the motivation was to ensure that `- - -` would be processed as an `hr` tag and not a list, but this excludes `---` as well.

Before I lay out some potential proposals, I should state a few behavioral things about lists and hr.

  1. Even with `hr` disabled, lists do not allow a `---` list by default, as a list requires a space after the first `-`. This means that you can't create a list with a lone `-` or `---` unless you add a trailing space after the first `-`.
  2. Even if other parsers allow a `- - -` list, this is difficult to use in the real world, and I suspect it is simply an oversight, as I can't see a real-world use for it. But if it is required just to be consistent with how the rules are laid out, I guess I can see why it behaves the way it does.
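
The two observations above can be checked with a pair of simplified regular expressions. This is a minimal sketch using only the standard library. `LIST_ITEM_RE` and `HR_RE` are hypothetical stand-ins modelled on the rules as described here (a marker followed by at least one space; three or more hyphens, optionally separated by spaces), not the patterns Python-Markdown actually ships.

```python
import re

# Assumption: simplified stand-ins, not Python-Markdown's real patterns.
# A list item: up to 3 spaces of indent, one marker, then at least one space.
LIST_ITEM_RE = re.compile(r'^[ ]{0,3}[*+-][ ]+(.*)')
# A hyphen rule: three or more hyphens, each optionally followed by up to 2 spaces.
HR_RE = re.compile(r'^[ ]{0,3}(?:-[ ]{0,2}){3,}[ ]*$')


def looks_like_list_item(line: str) -> bool:
    """True if the line would qualify as an unordered list item."""
    return LIST_ITEM_RE.match(line) is not None


def looks_like_hr(line: str) -> bool:
    """True if the line would qualify as a hyphen-based horizontal rule."""
    return HR_RE.match(line) is not None


# Print (sample, is_list_item, is_hr) for a few edge cases.
for sample in ['---', '- - -', '-', '- item', '- --']:
    print(repr(sample), looks_like_list_item(sample), looks_like_hr(sample))
```

With these stand-ins, `---` matches only the rule pattern, while `- - -` matches both, which is why the processing order decides the outcome for that input.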

Proposals:

  1. Split HR into 2 rules: HR1, which specifically catches `- - -` cases before lists, and HR2, which catches `---` cases after list processing. This would keep behavior identical to what it is now.
  2. Put an exception case in `UListProcessor.test` that at least rejects the block if it starts with `^[ ]{0,%d}([+*-]{3,})[ ]*(?:\n|$)`. Then just move HR completely.

    I'd like to not limit all cases where a single unordered list indicator can have only a single space after it as that is how we make SuperFences work:

    ```
    - 
    code
    ```
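
Both proposals can be sketched with the standard library alone. This is a rough illustration, not a patch: `classify_hr`, `ulist_test`, `HR_ANY_RE`, and `ULIST_RE` are hypothetical simplifications, and only `GUARD_RE` is taken from the expression quoted in proposal 2, with `%d` filled in as 3 on the assumption of a tab length of 4.

```python
import re
from typing import Optional

TAB_LENGTH = 4  # assumption: default tab length, so indent is at most 3 spaces

# Hypothetical, simplified rule pattern: 3+ of one marker, optional spaces between.
HR_ANY_RE = re.compile(
    r'^[ ]{0,3}(?:(?:-[ ]{0,2}){3,}|(?:\*[ ]{0,2}){3,}|(?:_[ ]{0,2}){3,})[ ]*$'
)
# Hypothetical, simplified unordered-list test: marker, then at least one space.
ULIST_RE = re.compile(r'^[ ]{0,%d}[*+-][ ]+(.*)' % (TAB_LENGTH - 1))
# Expression quoted in proposal 2, with %d filled in.
GUARD_RE = re.compile(r'^[ ]{0,%d}([+*-]{3,})[ ]*(?:\n|$)' % (TAB_LENGTH - 1))


def classify_hr(line: str) -> Optional[str]:
    """Proposal 1: 'HR1' for spaced rules (before lists), 'HR2' for solid ones (after)."""
    if not HR_ANY_RE.match(line):
        return None
    return 'HR1' if ' ' in line.strip() else 'HR2'


def ulist_test(block: str) -> bool:
    """Proposal 2: refuse blocks that open with a solid run of markers."""
    if GUARD_RE.match(block):
        return False
    return ULIST_RE.match(block) is not None
```

Under these assumptions, `classify_hr('- - -')` returns `'HR1'` and `classify_hr('---')` returns `'HR2'`, while `ulist_test` rejects `---` but still accepts a block whose first line is a single list marker followed by a fence.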

Of course, a suitable priority would have to be determined to minimize any potential breakages, maybe attaching the new hr immediately after UListProcessor? Maybe 29 or 29.9?
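
To see where 29 or 29.9 would land, here is a small standard-library model of a priority-ordered registry. The names and numbers in `DEFAULT_PRIORITIES` are assumed defaults for the built-in block processors and should be verified against the installed version; `processing_order` is a hypothetical helper, not part of the library.

```python
# Assumption: default block processor priorities; verify against the
# installed Python-Markdown version before relying on them.
DEFAULT_PRIORITIES = {
    'empty': 100, 'indent': 90, 'code': 80, 'hashheader': 70,
    'setextheader': 60, 'hr': 50, 'olist': 40, 'ulist': 30,
    'quote': 20, 'reference': 15, 'paragraph': 10,
}


def processing_order(priorities: dict) -> list:
    """Return processor names, highest priority first."""
    return sorted(priorities, key=priorities.get, reverse=True)


# Model the proposed move: the hr rule re-registered just below ulist.
moved = {**DEFAULT_PRIORITIES, 'hr': 29.9}

print(processing_order(DEFAULT_PRIORITIES))
print(processing_order(moved))
```

In this model either 29 or 29.9 places `hr` directly after `ulist` and ahead of `quote`; the fractional value only matters if something else were later registered between 29 and 30.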

I realize the other potential option is that I can just override the HR rule myself when I register said general-purpose block. I also realize this issue could be moot if the aforementioned general-purpose block doesn't use YAML-ish fences for its block frontmatter.

Figured I would open up a separate discussion here to discuss this particular issue.

waylan commented 2 years ago

> 2. Even if other parsers allow a `- - -` list, this is difficult to use in the real world, and I suspect it is simply an oversight, as I can't see a real-world use for it. But if it is required just to be consistent with how the rules are laid out, I guess I can see why it behaves the way it does.

Way back when, I had similar thoughts. However, the rules clearly state that spaces are allowed. And I seem to recall an old discussion on the Markdown-Discuss mailing list where people were showing off their various artistic patterns for horizontal rules (3, 2, 3, etc.). Unfortunately, I can't find it now. Personally, I agree, who is going to type that out when three hyphens are much easier to type, but apparently, at least in the early days, it was common. Therefore, we can't be changing the syntax.

> I realize the other potential option is that I can just override the HR rule myself when I register said general-purpose block. I also realize this issue could be moot if the aforementioned general-purpose block doesn't use YAML-ish fences for its block frontmatter.

Either of these would be my preferred solution (of course, that discussion can be held in #1175). Every time we mess with the core parser, we find that we break existing documents in weird, unexpected ways. By including the altered behavior in an extension, we avoid that for the general case. I think it is reasonable to expect that if someone really wants to use an extension, then they will need to adjust their edge-case syntax to not conflict with the extension. However, when no extensions are being used, the existing behavior with all of its quirks should remain intact.

That said, sometimes there is no way to solve a problem from an extension alone. In that case, I am open to discussing options for altering the core so that an extension can work. However, it doesn't seem that that is the case here. If I'm missing something in that regard, please do point it out.

facelessuser commented 2 years ago

> Way back when, I had similar thoughts. However, the rules clearly state that spaces are allowed. And I seem to recall an old discussion on the Markdown-Discuss mailing list where people were showing off their various artistic patterns for horizontal rules (3, 2, 3, etc.). Unfortunately, I can't find it now. Personally, I agree, who is going to type that out when three hyphens are much easier to type, but apparently, at least in the early days, it was common. Therefore, we can't be changing the syntax.

That's fair. The indentation rules of three lists like this aren't handled in a sane way IIRC, but that's fine. Most parsers turn these into lists, so keeping them is fine.

> Either of these would be my preferred solution (of course, that discussion can be held in https://github.com/Python-Markdown/markdown/issues/1175). Every time we mess with the core parser, we find that we break existing documents in weird, unexpected ways.

Yeah, I figured as much. The syntax isn't settled yet anyway, but I think I at least have a good understanding of the preferred approach if we need this.