LeaVerou / brep

Write batch find & replace scripts that transform files with a simple human-readable syntax

`whole_word` flag #12

Closed: LeaVerou closed this issue 3 months ago

LeaVerou commented 3 months ago

To truly emulate text editors' find & replace, we also need a `whole_word` flag. However, this is not as trivial as wrapping the regex with `\b(?: ... )\b`. Word boundaries (`\b`) detect transitions from `\w` to `\W` (and vice versa). But when the match already starts or ends with a non-word character, it's already a "whole word" match on that side, and wrapping it in `\b` would wrongly demand a word character next to it. E.g. `\b(?:<foo>)\b` does not match `<foo>` in `a <foo> b`.
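A quick illustration of the `\b` pitfall, written in TypeScript against JavaScript's `RegExp` (assuming that is what brep uses to run the patterns). `naiveWholeWord` is a hypothetical helper, not part of brep:

```typescript
// Hypothetical helper: the "obvious" whole-word wrapper using \b.
function naiveWholeWord(source: string, flags = ""): RegExp {
  // The non-capturing group keeps alternations like "a|b" inside the boundaries.
  return new RegExp(`\\b(?:${source})\\b`, flags);
}

// Works when the pattern starts and ends with word characters...
console.log(naiveWholeWord("foo").test("a foo b")); // true
console.log(naiveWholeWord("foo").test("afoob"));   // false
// ...but fails when it starts or ends with a non-word character:
// there is no \w/\W transition between " " and "<".
console.log(naiveWholeWord("<foo>").test("a <foo> b")); // false
```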

We probably need some lookarounds instead, i.e. `(?<=^|\W)` before the pattern and `(?=$|\W)` after it.
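A minimal TypeScript sketch of a lookaround-based wrapper that requires the string edge or a non-word character on each side of the match. The helper name `lookaroundWholeWord` is hypothetical, and JavaScript `RegExp` semantics (which support lookbehind) are assumed:

```typescript
// Hypothetical helper: require the string edge or a non-word character
// immediately before and after the match, instead of a \b transition.
function lookaroundWholeWord(source: string, flags = ""): RegExp {
  return new RegExp(`(?<=^|\\W)(?:${source})(?=$|\\W)`, flags);
}

console.log(lookaroundWholeWord("<foo>").test("a <foo> b")); // true: the \b case now works
console.log(lookaroundWholeWord("foo").test("afoob"));       // false
console.log(lookaroundWholeWord("<foo>").test("a<foo>b"));   // false: neighbours are word chars
```

Note that the last case is still rejected, because this version only inspects the characters outside the match.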

Furthermore, `\W` is not Unicode-aware: it treats non-ASCII letters and digits (e.g. `é`, `ж`) as non-word characters. For a Unicode-aware version, I think we need `[^_\p{L}\p{N}]` (which requires the `u` flag).
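A small check of the ASCII problem and the proposed class, assuming JavaScript `RegExp` semantics. `wrap` is a hypothetical helper that reuses the lookaround wrapper with a configurable non-word class:

```typescript
// ASCII-only non-word class vs. a Unicode-aware one (letters, numbers and "_" are word chars).
const ASCII_NON_WORD = "\\W";
const UNICODE_NON_WORD = "[^_\\p{L}\\p{N}]"; // \p{...} needs the "u" (or "v") flag

// Hypothetical helper: lookaround wrapper parameterised by the non-word class.
function wrap(source: string, nonWord: string, flags: string): RegExp {
  return new RegExp(`(?<=^|${nonWord})(?:${source})(?=$|${nonWord})`, flags);
}

const text = "caf\u00e9"; // "café" with a precomposed é (U+00E9)

// With \W, "é" counts as a non-word character, so "caf" wrongly matches as a whole word.
console.log(wrap("caf", ASCII_NON_WORD, "").test(text)); // true
// With the Unicode class, "é" is a letter (\p{L}), so it does not.
console.log(wrap("caf", UNICODE_NON_WORD, "u").test(text)); // false
console.log(wrap("caf\u00e9", UNICODE_NON_WORD, "u").test("un caf\u00e9 noir")); // true
```

One caveat worth checking: combining marks (`\p{M}`) are not in this class, so a decomposed `é` (`e` followed by U+0301) would still be treated as a word break.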

LeaVerou commented 3 months ago

Actually, even these are not sufficient: if the match starts or ends with a non-word character, that side should match regardless of what surrounds it, even a word character. E.g. in `a<foo>b`, a match of `<foo>` is already a "whole word".

So maybe it should be `(?:(?=\W)|(?<=^|\W))` before the pattern and `(?:(?=$|\W)|(?<=\W))` after it instead. 😵‍💫
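A sketch of how these two combined boundaries might be applied, using the Unicode-aware class in place of `\W` and assuming JavaScript `RegExp` semantics. `wholeWord` and its flag handling are hypothetical, not brep's actual implementation:

```typescript
// Unicode-aware \W; requires the "u" (or "v") flag.
const NON_WORD = "[^_\\p{L}\\p{N}]";

// Hypothetical helper: each side passes if the match already starts/ends with a
// non-word character, or if the neighbouring character is non-word (or the edge).
function wholeWord(source: string, flags = ""): RegExp {
  const start = `(?:(?=${NON_WORD})|(?<=^|${NON_WORD}))`;
  const end = `(?:(?=$|${NON_WORD})|(?<=${NON_WORD}))`;
  // Add "u" unless a Unicode mode is already on ("u" and "v" cannot be combined).
  const unicodeFlags = /[uv]/.test(flags) ? flags : flags + "u";
  // The non-capturing group keeps alternations like "a|b" between the boundaries.
  return new RegExp(`${start}(?:${source})${end}`, unicodeFlags);
}

console.log(wholeWord("<foo>").test("a<foo>b")); // true
console.log(wholeWord("foo").test("afoob"));     // false
console.log("foo food foo".replace(wholeWord("foo", "g"), "bar")); // "bar food bar"
```

Because `u` is forced on, user patterns must also be valid in Unicode mode, which is stricter: for example, an unnecessary escape like `\-` outside a character class becomes a syntax error.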

I wonder if we could avoid combining lookbehinds and lookaheads...
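One observation that may help: for a non-empty match, the start and end boundaries both express the same condition, "not between two word characters", so they could share a single assertion of the form `(?:(?<!w)|(?!w))`, where `w` is a word-character class. This still mixes a lookbehind with a lookahead, so it shortens the combination rather than avoiding it. The sketch below (hypothetical helper names and sample cases, JavaScript `RegExp` semantics assumed) compares both forms:

```typescript
const WORD = "[_\\p{L}\\p{N}]";      // Unicode-aware \w
const NON_WORD = "[^_\\p{L}\\p{N}]"; // Unicode-aware \W

// The two boundaries proposed above, written with the Unicode-aware class.
function separateBoundaries(source: string): RegExp {
  const start = `(?:(?=${NON_WORD})|(?<=^|${NON_WORD}))`;
  const end = `(?:(?=$|${NON_WORD})|(?<=${NON_WORD}))`;
  return new RegExp(`${start}(?:${source})${end}`, "u");
}

// Hypothetical simplification: one shared assertion on both sides.
// (?<!WORD) means "edge or non-word before"; (?!WORD) means "edge or non-word after".
function sharedBoundary(source: string): RegExp {
  const edge = `(?:(?<!${WORD})|(?!${WORD}))`;
  return new RegExp(`${edge}(?:${source})${edge}`, "u");
}

// [pattern, text] pairs; both forms should agree whenever the match is non-empty.
const cases: Array<[string, string]> = [
  ["foo", "a foo b"],
  ["foo", "afoob"],
  ["<foo>", "a<foo>b"],
  ["<foo>", "a <foo> b"],
  ["foo", "foo_bar"],
  ["caf", "caf\u00e9"],
  ["a|b", "xa b"],
];

for (const [source, text] of cases) {
  const separate = separateBoundaries(source).test(text);
  const shared = sharedBoundary(source).test(text);
  console.log(`${source} in ${JSON.stringify(text)}: ${separate} / ${shared}`);
}
```

The two forms can only differ when the match is empty, because `(?=\W)` and `(?!w)` disagree only at the very end of the string, and `(?<=\W)` and `(?<!w)` only at its start.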