Closed by LeaVerou 3 months ago
Actually, even these are not sufficient: if the match itself starts or ends with non-word characters, it should match anyway, even if surrounded by word characters. E.g. in `a<foo>b`, if the match is `<foo>`, it’s already a "whole word".
So maybe it should be `(?:(?=\W)|(?<=^|\W))` and `(?:(?=$|\W)|(?<=\W))` instead. 😵‍💫
I wonder if we could avoid combining lookbehinds and lookaheads...
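To make the difference between the three variants concrete, here is a rough sketch in TypeScript. The `wholeWord` helper is hypothetical (it is not an existing API); it just wraps a pattern source in the two combined guards from this comment, and assumes an engine with lookbehind support.

```typescript
// Hypothetical helper (not an existing API): wraps a regex source in the
// combined lookaround guards proposed in this comment.
function wholeWord(source: string, flags: string = "g"): RegExp {
  // Start guard: the match itself begins with a non-word character,
  // OR it is preceded by the start of input or a non-word character.
  const start = "(?:(?=\\W)|(?<=^|\\W))";
  // End guard: the match is followed by the end of input or a non-word
  // character, OR the match itself ends with a non-word character.
  const end = "(?:(?=$|\\W)|(?<=\\W))";
  return new RegExp(start + "(?:" + source + ")" + end, flags);
}

// Naive \b wrapping fails when a non-word-delimited match sits between spaces.
console.log(new RegExp("\\b(?:<foo>)\\b").test("x <foo> y")); // false

// The plain lookarounds fail when the same match is surrounded by word characters.
console.log(new RegExp("(?<=^|\\W)(?:<foo>)(?=$|\\W)").test("a<foo>b")); // false

// The combined guards accept both, and still reject partial words.
console.log("x <foo> y".replace(wholeWord("<foo>"), "X")); // "x X y"
console.log("a<foo>b".replace(wholeWord("<foo>"), "X")); // "aXb"
console.log("foobar foo".replace(wholeWord("foo"), "X")); // "foobar X"
```

Note that without the `m` flag, `^` and `$` here mean start and end of the whole input rather than of each line.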
To truly emulate text editors' find & replace, we also need a `whole_word` flag. However, this is not as trivial as wrapping the regex with `\b(?:...)\b`. Word boundaries (`\b`) detect transitions from `\w` to `\W` (and vice versa). However, when the match already starts or ends with a non-word character, it’s already a "whole word" match.

We probably need some lookarounds instead:
- `(?<=^|\W)` = preceded by beginning of line/file OR non-word character
- `(?=$|\W)` = followed by end of line/file OR non-word character

Furthermore, `\W` is not Unicode-aware and treats any non-Latin letter as a non-word character. For a Unicode-aware version, I think we need `[^_\p{L}\p{N}]`.
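As a rough illustration of the Unicode point, the sketch below swaps `\W` for `[^_\p{L}\p{N}]` in the two lookarounds. The names `NON_WORD` and `wholeWordUnicode` are made up for this example, and it assumes a JavaScript/TypeScript engine, where `\p{…}` only works with the `u` flag.

```typescript
// Assumption: this class is the Unicode-aware stand-in for \W suggested above.
const NON_WORD = "[^_\\p{L}\\p{N}]";

// Hypothetical helper (not an existing API): wraps a regex source in the two
// lookarounds from the list above, using NON_WORD instead of \W.
function wholeWordUnicode(source: string, flags: string = "g"): RegExp {
  // \p{...} requires the "u" flag, so add it if the caller did not.
  const withU = flags.indexOf("u") === -1 ? flags + "u" : flags;
  const start = "(?<=^|" + NON_WORD + ")"; // preceded by start or non-word char
  const end = "(?=$|" + NON_WORD + ")"; // followed by end or non-word char
  return new RegExp(start + "(?:" + source + ")" + end, withU);
}

const text = "na\u00EFve"; // "naïve", with a precomposed ï (U+00EF)

// ASCII-only \W treats "ï" as a non-word character: false positive.
console.log(new RegExp("(?<=^|\\W)na(?=$|\\W)").test(text)); // true

// The Unicode-aware class sees "ï" as a letter, so "na" is not a whole word.
console.log(wholeWordUnicode("na").test(text)); // false
console.log(wholeWordUnicode("na").test("na, ok")); // true
```

This version deliberately uses only the simple lookarounds; it does not handle matches that themselves start or end with non-word characters.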