brycewray closed this issue 2 years ago
Oh my, I just had a dig and found the issue.
Everything is fine on the indexing side, `Régis` is indexed correctly. To normalize content Pagefind is replacing special characters using a `[^\\w]` regex.
The issue is that on the indexing side we use Rust's regex engine, which is good and counts `é` as a "word character". In the browser, though, we normalize the search via the JS regex engine, which is bad and does not count `é` as a word character, and thus removes it before searching. So it always forcibly searches for `Rgis`.
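For anyone who wants to see the mismatch in isolation, here is a minimal sketch of the browser-side behaviour. The function names are hypothetical and this is not Pagefind's actual code; it only assumes the `[^\w]`-style replacement described above. The second function uses Unicode property escapes as one possible approximation of a Unicode-aware `\w`.

```javascript
// Hypothetical helpers for illustration only; not Pagefind's real implementation.

// In JavaScript, \w without the `u` and `i` flags matches only [A-Za-z0-9_],
// so accented letters are treated as "special characters" and stripped.
function normalizeAsciiOnly(term) {
  return term.replace(/[^\w]/g, "");
}

// A Unicode-aware approximation: keep letters, combining marks, numbers and "_".
// Requires the `u` flag so that \p{...} property escapes are recognised.
function normalizeUnicodeAware(term) {
  return term.replace(/[^\p{L}\p{M}\p{N}_]/gu, "");
}

// "\u00e9" is é and "\u00f8" is ø (both precomposed code points).
console.log(normalizeAsciiOnly("R\u00e9gis"));    // "Rgis"
console.log(normalizeAsciiOnly("Bj\u00f8rn"));    // "Bjrn"
console.log(normalizeUnicodeAware("R\u00e9gis")); // "Régis"
console.log(normalizeUnicodeAware("Bj\u00f8rn")); // "Bjørn"
```

The index contains `Régis`, but the ASCII-only version turns the query into `Rgis`, so the two sides can never match.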
I'll write up some tests and aim to get a release out later today that addresses this.
I'm going to be addressing this more concretely in the coming weeks while implementing fully-fledged multilingual support, so for now I've added in a quick patch that should cover most cases by only normalizing common keyboard punctuation on the browser end of things. That should go out as a 0.5.3 release in about 20 minutes.
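As a rough illustration of what "only normalizing common keyboard punctuation" could look like, here is a sketch. The character list and the function name are assumptions made for this example; the actual 0.5.3 patch may use a different set.

```javascript
// Hypothetical sketch; the exact punctuation set in the real patch is not shown in this thread.
// Only an explicit list of ASCII punctuation is removed, so accented letters pass through untouched.
const KEYBOARD_PUNCTUATION = /[!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^`{|}~]/g;

function stripKeyboardPunctuation(term) {
  return term.replace(KEYBOARD_PUNCTUATION, "");
}

// "\u00e9" is é and "\u00f8" is ø.
console.log(stripKeyboardPunctuation("R\u00e9gis!"));  // "Régis"
console.log(stripKeyboardPunctuation("(Bj\u00f8rn)")); // "Bjørn"
console.log(stripKeyboardPunctuation("don't"));        // "dont"
```

The trade-off is that an allow-everything-except-a-list approach will miss punctuation outside the list (for example typographic quotes), which is presumably why it is described as a quick patch rather than the long-term fix.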
It appears that words with accented letters aren’t indexed. I searched for “Régis” as both “Régis” and “Regis” — neither returned “Régis.” Same for “Bjørn” — neither “Bjørn” nor “Bjorn” would return “Bjørn.” Just in case it was simply ignoring the accented letters, I also tried “Rgis” and “Bjrn” (respectively), and neither worked.