Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License
3.79k stars 863 forks source link

SmartyPants: Apostrophes at the start of leading contractions #1305

Closed feasgal closed 1 year ago

feasgal commented 1 year ago

As mentioned in the original project here, "SmartyPants will turn the apostrophe into an opening single-quote, when in fact it should be a closing one."

This is specifically presenting an issue for me when smarty collides with abbr. For instance, this code:

Each section of the editor controls an access group's ability to view, and actions within, the listed pages.
Select the checkboxes to allow the appropriate actions.

*[access group]: Access groups allow admin users to<br>assign custom permissions to all<br>users assigned to that group.

is rendering like this:

image

because when access group gets rendered by abbr, the ' is no longer recognized as being mid-word and instead renders as a leading single quote for the possessive s.

I've implemented a stopgap via javascript, but there are a lot of different contraction situations and it would be great to have a real fix.

mitya57 commented 1 year ago

Yes, unfortunately smarty does not play well with any markup. I would suggest to include everything, including 's, into abbr, that should help.

facelessuser commented 1 year ago

Oh, I had not originally noticed that the contraction was split across multiple HTML elements when this was brought up in the MkDocs Material issues. Yes, in order for smarty to deduce that a quote is in a contraction, it must be contained within the same element. It's just the way the Python Markdown parser works.

waylan commented 1 year ago

I was curious what the reference implementation did so I checked with the filter set to "SmartyPants". The following input:

an <abbr title="...">access group</abbr>'s ability

Results in the correct output:

an <abbr title="...">access group</abbr>’s ability

So this is clearly a bug. However, this is a limitation of building SmartyPants into the Markdown parser directly. The text content of each element is processed separately and we are not always able to take into consideration the larger context.

It occurs to me that perhaps we could special case a quote at the start of a "tail" of an inline element. Although, I don't recall off-hand if all of that info is available or not. But, more importantly, I wonder if this would introduce errors in other edge cases.

waylan commented 1 year ago

So I got curios and checked what the reference implementation does with this:

an <abbr title="...">access group </abbr>'s ability

Note that I added a space as the last character of the content of the abbr tag. One would expect that this would cause the quote to not get curled, or to get curled to the right. However, it is curled to the left just like the previous example. This suggests that the reference implementation is not very smart about this scenario and assumes that the first character of the tail of an inline element does not have whitespace to the left of it. That seems like it is perhaps an assumption we could make as well.

ferdnyc commented 1 year ago

@waylan

This suggests that the reference implementation is not very smart about this scenario and assumes that the first character of the tail of an inline element does not have whitespace to the left of it. That seems like it is perhaps an assumption we could make as well.

I think actually it's (at least trying) to be particularly smart about 's, specifically. Because, while you get the correct apostrophe from all of these:

That's the wrong apostrophe
<b>That</b>'s the wrong apostrophe
That<b>'s</b> the wrong apostrophe

And both of these:

I <b>don't</b> want wrong apostrophes
<i>That'd</i> be a good apostrophe

all of these result in a wrong-way apostrophe:

I <b>don</b>'t want wrong apostrophes
I <b>don </b>'t want wrong apostrophes
<i>That</i>'d be a good apostrophe
That<i>'d</i> be a good apostrophe

It seems 's at the start of an element has been special-cased. (Which probably makes sense, even if it creates inconsistency. 's cases are crazy common, and I can't imagine a situation where you don't want to end up with a close-quote apostrophe even in <tag>foo</tag>'s or <tag>foo </tag>'s cases.)

Edit: (In fact, that special-casing is mentioned as having "always" been there, in the SmartyPants 1.3 blurb from its version history.)

waylan commented 1 year ago

@ferdnyc thanks for taking a closer look as this. The link to the reference implementation's change log is especially helpful. Apparently, it has always has a special case for 's. It's interesting that 't and 'd do not behave the same way, which verifies that 's is special cased. Seems reasonable that we should include the same special case. A PR is welcome.

mitya57 commented 1 year ago

I have created PR #1348 which fixes this issue. I hope I didn't break any other cases, but ideas what to add to tests are welcome.