PeWu / topola-viewer

Topola Genealogy Viewer – interactive genealogy visualization
https://pewu.github.io/topola-viewer
Apache License 2.0
221 stars 56 forks

Support non-latin characters in search index (#154) #159

Closed czifumasa closed 1 year ago

czifumasa commented 1 year ago

Fixes #154.

Explanation:

The search implementation is based on the lunr.js library. The example from #154 is in Hebrew, which lunr does not support according to its docs: https://lunrjs.com/guides/language_support.html

Lunr first splits every string into tokens, then runs each token through a pipeline of functions. One of these functions is the trimmer, which strips non-word characters from the start and end of a token. For unsupported languages, every non-Latin character counts as a non-word character. As a result, שלום is trimmed to an empty string and cannot be used in search. To make search work for non-Latin languages, the trimmer must be removed from the pipeline (as described in the lunr docs). With plain lunr this would be easy: add this line to the lunr index initialization: this.pipeline.remove(lunr.trimmer);
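To illustrate the trimming problem, here is a minimal standalone sketch. The helper name trimLikeLunr is hypothetical. It does not call lunr, but applies the two regular expressions that, to my understanding, lunr's default trimmer uses on each token:

```javascript
// Hypothetical helper, not part of lunr: mimics what the default trimmer does
// to a token's string (assumption based on lunr 2.x source).
// Without the `u` flag, \W matches every character outside [A-Za-z0-9_],
// so Hebrew letters are treated as non-word characters.
function trimLikeLunr(token) {
  return token.replace(/^\W+/, '').replace(/\W+$/, '');
}

console.log(trimLikeLunr('(Smith),'));    // "Smith"
console.log(trimLikeLunr('שלום').length); // 0
```

Since lunr skips tokens that become empty, a name written entirely in Hebrew never reaches the index.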

But because Topola uses the lunr.multi extension, the trimmer is generated dynamically from the list of provided languages and cannot be passed to this.pipeline.remove. Instead, I recreated the logic of the lunr.multi extension in a custom function that omits the trimmer generation.
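The idea can be sketched with a toy model. This is not the code from this PR and does not use lunr's API: languageSupport, generateTrimmer, buildPipeline and runPipeline are all hypothetical stand-ins, and the stop words and stemmers are deliberately simplistic. It only shows how a multi-language pipeline can be assembled with or without the generated trimmer:

```javascript
// Hypothetical per-language support, standing in for what a language plugin provides.
const languageSupport = {
  en: {
    wordCharacters: 'A-Za-z0-9_',
    stopWordFilter: (token) => (['the', 'of', 'and'].includes(token) ? null : token),
    stemmer: (token) => token.replace(/s$/, ''), // toy stemmer: drops a trailing "s"
  },
  de: {
    wordCharacters: 'A-Za-z0-9_äöüÄÖÜß',
    stopWordFilter: (token) => (['der', 'die', 'das'].includes(token) ? null : token),
    stemmer: (token) => token, // toy stemmer: no-op
  },
};

// Builds a trimmer that strips leading and trailing characters
// that are not in the combined word-character set.
function generateTrimmer(wordCharacters) {
  const leading = new RegExp('^[^' + wordCharacters + ']+');
  const trailing = new RegExp('[^' + wordCharacters + ']+$');
  return (token) => token.replace(leading, '').replace(trailing, '');
}

// Assembles the steps: optional trimmer first, then stop word filters, then stemmers.
function buildPipeline(languages, includeTrimmer) {
  const steps = [];
  let wordCharacters = '';
  for (const lang of languages) {
    const support = languageSupport[lang];
    wordCharacters += support.wordCharacters;
    steps.unshift(support.stopWordFilter);
    steps.push(support.stemmer);
  }
  if (includeTrimmer) {
    steps.unshift(generateTrimmer(wordCharacters));
  }
  return steps;
}

// Runs a token through all steps; an empty or null result drops the token.
function runPipeline(steps, token) {
  let current = token;
  for (const step of steps) {
    current = step(current);
    if (current === null || current === '') return null;
  }
  return current;
}

const withTrimmer = buildPipeline(['en', 'de'], true);
const withoutTrimmer = buildPipeline(['en', 'de'], false);

console.log(runPipeline(withTrimmer, 'שלום'));    // null
console.log(runPipeline(withoutTrimmer, 'שלום')); // "שלום"
```

In this model, stop word filtering and stemming for the configured languages still apply when the trimmer is left out. Only the step that discards characters outside the configured alphabets is skipped.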