github / cmark-gfm

GitHub's fork of cmark, a CommonMark parsing and rendering library and program in C

Strip invisible characters from anchors #351

Open flanakin opened 8 months ago

flanakin commented 8 months ago

Proposal

When you create a Markdown header, an anchor is generated for you: special characters are removed and spaces are trimmed. This works great... mostly. There are 2 issues:

  1. When an emoji is used, there are sometimes invisible characters left behind.
  2. When there's a space before or after the emoji, the space isn't getting trimmed.

Proposal

Remove special and invisible characters first, then trim spaces.

Example

## 🙋‍♀️ Ask a question

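To the naked eye, the anchor looks like #-ask-a-question, which is mostly fine apart from the leading hyphen left by the untrimmed space. But in the browser, the anchor is actually #%EF%B8%8F-ask-a-question. The %EF%B8%8F is the percent-encoded UTF-8 form of U+FE0F (variation selector-16), an invisible character that is part of the emoji sequence.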

This should render as #ask-a-question without the invisible characters or extra space.
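
For illustration, here is a minimal Python sketch of the proposed order of operations (strip first, then trim, then hyphenate). It is not GitHub's actual anchor code; `make_anchor` and `is_variation_selector` are hypothetical names, and the rule for which characters to keep (letters, marks, digits, spaces, hyphens, underscores) is an assumption. Note that variation selectors are classified as nonspacing marks (category `Mn`), so a filter that keeps marks has to drop them explicitly.

```python
import unicodedata


def is_variation_selector(ch: str) -> bool:
    """True for U+FE00..U+FE0F and U+E0100..U+E01EF (invisible selectors)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def make_anchor(heading: str) -> str:
    """Hypothetical anchor generator: strip, then trim, then hyphenate."""
    kept = []
    for ch in heading.lower():
        if is_variation_selector(ch):
            continue  # invisible, even though its category is Mn
        # Assumed keep-list: letters (L*), marks (M*), digits (N*),
        # plus space, hyphen and underscore. Emoji (So) and the
        # zero-width joiner (Cf) fall outside it and are dropped.
        if ch in " -_" or unicodedata.category(ch)[0] in "LMN":
            kept.append(ch)
    # Trim only after stripping, so a space next to the emoji disappears.
    return "".join(kept).strip().replace(" ", "-")


# The example heading, with the invisible code points spelled out:
# U+1F64B, U+200D (zero-width joiner), U+2640, U+FE0F, then the text.
heading = "\U0001F64B\u200D\u2640\uFE0F Ask a question"
print(make_anchor(heading))  # ask-a-question
```

If the `is_variation_selector` check is removed from this sketch, U+FE0F survives and the result is the invisible character followed by `-ask-a-question`, which matches the symptom described above.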

waldyrious commented 8 months ago

> Remove special and invisible characters first

I think replacing such characters with hyphens might be more predictable / less surprising. So #-ask-a-question is the result I feel would be most intuitive. (That said, I agree that removing them altogether is an improvement over the current situation.)
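
A rough Python sketch of this alternative, under stated assumptions: every character outside an assumed keep-list (letters, marks, digits, hyphens, underscores, with variation selectors excluded) becomes a hyphen, and runs of hyphens are collapsed to one. The collapsing step is an interpretation needed to arrive at #-ask-a-question rather than a longer run of hyphens; `make_anchor_hyphenated` and `is_slug_char` are hypothetical names, not part of any GitHub code.

```python
import re
import unicodedata


def is_slug_char(ch: str) -> bool:
    """Assumed keep-list: letters, marks, digits, '-' and '_'."""
    cp = ord(ch)
    # Variation selectors are category Mn but invisible, so exclude them.
    if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
        return False
    return ch in "-_" or unicodedata.category(ch)[0] in "LMN"


def make_anchor_hyphenated(heading: str) -> str:
    """Replace every non-slug character with '-', then collapse runs."""
    mapped = "".join(ch if is_slug_char(ch) else "-" for ch in heading.lower())
    return re.sub(r"-+", "-", mapped)


heading = "\U0001F64B\u200D\u2640\uFE0F Ask a question"
print(make_anchor_hyphenated(heading))  # -ask-a-question
```

One consequence of this variant is that trailing punctuation also leaves a hyphen behind, so "Hello, World!" would become `hello-world-` in this sketch.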

wooorm commented 8 months ago

(I don’t work at GH)