erusev / parsedown

Better Markdown Parser in PHP
https://parsedown.org
MIT License
14.69k stars 1.12k forks source link

Create ID's for Header elements so they can be referenced in anchor tags #767

Closed aidantwoods closed 4 years ago

aidantwoods commented 4 years ago

Credit for these changes goes to the work done in #765 by @netniV and @Ayesh.

This PR rephrases the changes made there toward the 2.0.x branch.

Ayesh commented 4 years ago

Thank you, this is nice and looking forward to the 2.x releases. I also think it's better to make the slug function configurable. This library ideally should come from a DI container, and it makes sense that we make the slug function customizable with a sane default.

Ayesh commented 4 years ago

If I'd suggest one thing if I may, I think the regex would be better with p{Nd}\p{Nl} instead of p{N}.

\p{N} includes superscript numbers and divisions, which ideally should not make to a URL, and neither does in GFM.

So instead of $slug = \preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '', $slug);, we could use $slug = \preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug);. This converts 测试 x² 标题 to 测试-x-标题 (notice the ² gone in slug).

aidantwoods commented 4 years ago

So instead of $slug = \preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '', $slug);, we could use $slug = \preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug);.

Done, 99dd44e :)

aidantwoods commented 4 years ago

@netniV RE your comment

Looks good. I added one comment as per our discussions here over str_replace. More involved is your change but I was thinking it should be an optional addition as it could affect other elements due to a duplication of ID

The only other thought is with trimming a trailing and leading hyphen as that seems wrong to keep those to me.

Trimming the leading and trailing hyphens seems sensible, I think I can also deal with the ID duplication too—GitHub seems to resolve this by appending -n to duplicated IDs (where n is an increasing counter per duplication), and I think this is a sensible approach to use. I'll probably generalise it a bit so that custom handling of de-duplication is possible (e.g. maybe you want to use a prefix, or a different separator).

netniV commented 4 years ago

It’s amazing how few lines of code can balloon ;)

Work is looking good.

aidantwoods commented 4 years ago

It’s amazing how few lines of code can balloon ;)

Indeed 😉

I moved some of the irrelevant test fixes out of this PR and have now rebased, so it is looking a little more focused now. Still at ~350 additions though :)

Fortunately for anyone using these changes, they'll be able to modify slug behaviours (if they wish) with essentially a one-liner—which is quite nice :)

aidantwoods commented 4 years ago

I moved the MutableConfigurable changes into a separate PR (#768) and have rebased this a final time (so now only the changes directly related to adding the slug are in this PR :) )