CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License
3.24k stars, 99 forks

Alias accented characters #538

Open velldes opened 6 months ago

velldes commented 6 months ago

Please add an option to treat different characters as the same. For example, 'Míša' should be searchable as 'Misa': ě = e, ů = u, ř = r, ґ = г ('Ґанок' should be searchable as 'Ганок'), and so on.
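
To illustrate the mapping requested above, here is a minimal Python sketch. It is not Pagefind's implementation; `fold` and `EXTRA_ALIASES` are hypothetical names chosen for this example. It assumes that accents can be removed through Unicode NFD decomposition, and that letters without a decomposition (such as ґ) are handled by an explicit alias table.

```python
import unicodedata

# Hypothetical alias table for letters that have no Unicode decomposition,
# so normalization alone cannot map them to a base letter.
EXTRA_ALIASES = {"ґ": "г", "Ґ": "Г"}


def fold(text: str) -> str:
    """Return text with combining marks removed and extra aliases applied."""
    # NFD splits a precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into a canonical form.
    recomposed = unicodedata.normalize("NFC", stripped)
    # Apply the explicit one-to-one aliases.
    return "".join(EXTRA_ALIASES.get(ch, ch) for ch in recomposed)
```

With this sketch, `fold("Míša")` gives `"Misa"` and `fold("Ґанок")` gives `"Ганок"`. One side effect to be aware of: stripping every combining mark also turns й into и, which may not be wanted for every language.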

bglw commented 6 months ago

👋 @velldes — thanks for the issue, this is definitely an overdue feature.

Due to the way the index is constructed, this will likely have to be a setting at indexing time, i.e. a CLI flag for the pagefind binary that changes how words are indexed sitewide.

Since it's a new setting, it will need to default to off so that it is not a breaking change, and I can imagine use cases where the current behavior is preferred.
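
A toy sketch of why this would be an indexing-time setting that defaults to off. This is not how Pagefind builds its index; `build_index`, `search`, and `strip_marks` are hypothetical names, and the `merge_diacritics` parameter only mirrors the idea of an opt-in setting. The point it shows: the folding has to happen when words are written into the index, and the query then has to be folded the same way.

```python
import unicodedata
from collections import defaultdict


def strip_marks(word: str) -> str:
    """Remove combining marks via NFD decomposition (illustrative only)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(pages: dict[str, str], merge_diacritics: bool = False) -> dict[str, set[str]]:
    """Map each lowercased word to the set of page URLs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            # Folding is applied once, while the index is being built.
            key = strip_marks(word) if merge_diacritics else word
            index[key].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str, merge_diacritics: bool = False) -> set[str]:
    """Look up a single word, folding the query the same way as the index."""
    key = query.lower()
    if merge_diacritics:
        key = strip_marks(key)
    return index.get(key, set())
```

With the default (`merge_diacritics=False`), a page containing "Míša" is found only by the accented query, which matches the current behavior. With the setting on, both "Míša" and "Misa" find it, but the two spellings can no longer be told apart.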

Currently, my leading contender for opting into this would be something like pagefind --site my_site --merge-diacritics, perhaps with a catchier flag name.

How does that sound to you?

velldes commented 6 months ago

Sounds good. Thank you

ColeDCrawford commented 2 months ago

Any progress on this one?