Closed ahmed-musallam closed 1 month ago
FYI, the old regex (from 2023) was
const REGEXP_ICON = /(?<!(?:https?|urn)[^\s]*):(#?[a-z_-]+[a-z\d]*):/gi;
and the current one is
const REGEXP_ICON = /(?<!https?:\/\/[^\s]*):(#?[a-z_\d-]+):(?!\w)/gi;
which I think has multiple problems.
The first part in both is allowing an optional angle bracket. Why?
no, this is a negative lookbehind.
The second part in old is trying to exclude URIs using an expensive and incorrect glob.
no, just a cheap way of preventing url like strings
The second part in current is optionally excluding URIs that don't start with h.
same as above, preventing http(s):
The third part is supposed to be matching the icon identifier if it starts with #.
no. the # is optional. this was once and indicator for sprite sheets.
The last part of current allows a not-word character?
it's a negative look ahead. specifying that the pattern must not be followed by a word character. (this is what broke the site)
if you look at the tests, you see why the regexp is (and was) the way it is currently:
https://github.com/adobe/helix-html-pipeline/blob/main/test/fixtures/content/icons.md https://github.com/adobe/helix-html-pipeline/blob/main/test/fixtures/content/icons-ignored.md
anyways, since we reverted the change, we can close this issue for now.
Description After the fix in https://github.com/adobe/helix-html-pipeline/pull/704 what used to be considered icons in our site, is no longer considered icons.
See example page here: https://main--exlm-prod--adobe-experience-league.hlx.page/en/docs/experience-manager-cloud-service/content/overview/introduction
This is the source markup:
see that header is:
The issue seems to be the lack of space after the trailing
:
But this used to work before the change and now it does not. It impacts a large set of pages because we use that icon to indicate anchors - this is a separate topic entirely but we did depend on how the pipeline generated icons. this would also fail with other icons if there is a preceding space but not a trailing one..