bu-ist / Glossary

A simple WordPress plugin to help create a glossary in order to improve SEO and time spent on the website.
http://codeat.co/glossary
GNU General Public License v2.0

Exclude content within certain tags from being auto linked #8

Open desrosj opened 7 years ago

desrosj commented 7 years ago

On Shipley, there are some Glossary terms that occur within HTML heading tags that we do not want to be auto-linked.

Right now, we want to exclude anything within an <h1>-<h6> tag, and anything within a <table> tag.

There is a filter for this, but my regex skills are not strong enough to tackle this.

I discussed this briefly with @acketon, and he was wondering whether only linking terms found inside a <p> would be easier than excluding certain tags in the regex.
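For illustration only, here is a rough sketch of the "filter by surrounding tag" idea using PHP's DOM extension instead of a regex. It is not how the Glossary plugin works: the function name `glossary_linkable_text()` and the tag list are made up for this example, and it only collects the text that would be eligible for linking rather than doing any linking.

```php
<?php
// Hypothetical helper, not part of the Glossary plugin.
// Returns the trimmed text nodes that have no ancestor in $excluded_tags.
function glossary_linkable_text( $html, array $excluded_tags ) {
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors( true );
    // The XML declaration prefix is a common trick to make loadHTML() read UTF-8.
    $doc->loadHTML( '<?xml encoding="utf-8"?><div>' . $html . '</div>' );
    libxml_clear_errors();

    // Build an XPath test such as: ancestor::h1 or ancestor::table
    $conditions = array();
    foreach ( $excluded_tags as $tag ) {
        $conditions[] = 'ancestor::' . $tag;
    }
    $query = '//text()';
    if ( ! empty( $conditions ) ) {
        $query .= '[not(' . implode( ' or ', $conditions ) . ')]';
    }

    $xpath = new DOMXPath( $doc );
    $texts = array();
    foreach ( $xpath->query( $query ) as $node ) {
        $text = trim( $node->nodeValue );
        if ( '' !== $text ) {
            $texts[] = $text;
        }
    }
    return $texts;
}

// Assumed exclusion list, based on the tags named in this issue.
$excluded = array( 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'table' );
$html     = '<h2>API basics</h2><p>The API is <em>documented</em>.</p>'
          . '<table><tr><td>API</td></tr></table>';
$eligible = glossary_linkable_text( $html, $excluded );
```

Because the check walks up the ancestor chain, it does not matter how deeply the text is nested inside a heading or table. The trade-off is that it parses the whole post as HTML, which is heavier than a single regex pass.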

acketon commented 7 years ago

If we were to exclude by certain tags, I can come up with a better list; it would be more than those outlined above. I'm open to other options. Basically, we just need to stop the autolinker from linking a term every time it finds it in some text. It breaks things when it links text inside an HTML structure for a big callout or button, and it links terms inappropriately in header tags on a long page of content.

desrosj commented 7 years ago

When I tested it, the broken HTML was already fixed in the latest WordPress.org version of the plugin. Here is the regex from that plugin version for context. We can probably use it as our starting point.

return apply_filters( 'glossary-regex', '/(?<!\\w)((?i)' . preg_quote( $title ) . '(?-i))(?=[ \\.\\,\\:\\;\\*\\"\\)\\!\\?\\/\\%\\$\\€\\£\\|\\^\\<\\>])(?![^<]*(<\\/a>|<\\/span>|" \\/>|>))/', preg_quote( $title ) );

jdub233 commented 7 years ago

The final () clause in the regex is the exclusion clause, so we can add additional tags there. Here is a regex that should exclude h1-h6 and anything inside of a <td>:

'/(?<!\w)((?i)' . preg_quote( $title ) . '(?-i))(?=[ \.\,\:\;\*\"\)\!\?\/\%\$\£\|\^\<\>])(?![^<]*(<\/a>|<\/span>|<\/td>|<\/h[1-6]>|" \/>|>))/'

Because of the structure of the regex, it only really works on directly enclosing tags, I think. Dakota, do you want to send a more comprehensive list of tags?
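To make the "directly enclosing tags" caveat concrete, here is a small standalone sketch that runs the pattern above against a few snippets. The wrapper function `glossary_exclusion_regex()` is hypothetical, and passing '/' as the delimiter to `preg_quote()` is an addition not in the pattern as posted. The exclusion works by checking whether the next tag after the term is one of the listed closing tags, so an intervening inline tag or an inner wrapper defeats it.

```php
<?php
// Hypothetical wrapper around the pattern posted above.
// Assumption: '/' is passed to preg_quote() so a term containing a slash
// cannot terminate the pattern early; the posted version omits it.
function glossary_exclusion_regex( $title ) {
    return '/(?<!\w)((?i)' . preg_quote( $title, '/' ) . '(?-i))(?=[ \.\,\:\;\*\"\)\!\?\/\%\$\£\|\^\<\>])(?![^<]*(<\/a>|<\/span>|<\/td>|<\/h[1-6]>|" \/>|>))/';
}

$regex = glossary_exclusion_regex( 'API' );

// The next tag after the term is </h2>, so the term is skipped.
$in_heading = preg_match( $regex, '<h2>Intro to API design</h2>' );          // 0

// The next tag after the term is </td>, so the term is skipped.
$in_cell = preg_match( $regex, '<td>API</td>' );                             // 0

// The next tag is </p>, which is not in the exclusion list, so it matches.
$in_paragraph = preg_match( $regex, '<p>The API is documented.</p>' );       // 1

// Limitation: an inline tag sits between the term and </h2>, so it matches.
$heading_with_em = preg_match( $regex, '<h2>The API <em>guide</em></h2>' );  // 1

// Limitation: the <td> is not the direct parent of the text, so it matches.
$nested_in_cell = preg_match( $regex, '<td><p>API reference</p></td>' );     // 1
```

In WordPress the pattern would presumably be returned from a callback registered on the `glossary-regex` filter, accepting the two arguments that `apply_filters()` passes in the snippet quoted earlier in this thread; that hookup is left out here so the sketch runs without WordPress.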