I found a nice way to generalize the multi-language approach described in my blog post. This makes it possible to effectively index content in any number of supported languages. See demos/demo-multi.html for an example.
The code does just what is described in the post - it generates a custom trimmer that trims around the characters of all the selected languages, and then combines their stemmers and stopword filters into one pipeline. I wanted to generate a single stopword filter function instead of running them one by one, but in Lunr 0.6.0 the stopword lists are not exposed, so that's not possible.
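Since the stopword lists can't be merged, the chaining approach can be sketched like this - a token is dropped if any language's filter drops it. The filter functions and the combiner name below are illustrative, not Lunr's actual API:

```javascript
// Chain several per-language stopword filters into a single pipeline
// function. In Lunr, a stopword filter returns undefined to drop a token;
// here a token survives only if every chained filter passes it through.
function combineStopWordFilters(filters) {
  return function (token) {
    for (var i = 0; i < filters.length; i++) {
      if (filters[i](token) === undefined) return undefined; // dropped
    }
    return token;
  };
}

// Hypothetical per-language filters for demonstration:
var stopEn = function (t) { return t === 'the' ? undefined : t; };
var stopRu = function (t) { return t === 'и' ? undefined : t; };

var combined = combineStopWordFilters([stopEn, stopRu]);
console.log(combined('the')); // dropped by the English filter
console.log(combined('cat')); // passes both filters
```

This keeps each language's filter opaque, which is why the lack of exposed stopword lists in 0.6.0 only costs a few extra function calls per token rather than blocking the feature.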
This pull request depends on #15, because it needs the word character sets for every language. I've also refactored trimmer generation into a separate function, which I've put into lunr.stemmer.support.js just to avoid creating another support file (although technically it has no relation to stemmers) - with the added bonus of cutting some bytes off the language files.
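The trimmer-generation function can be sketched as follows, assuming each language module exposes its word characters as a regex character-class fragment (the function name and the character sets below are illustrative, not the exact ones in the patch):

```javascript
// Build a trimmer from a word-character set: it strips any leading and
// trailing characters that are NOT in the given set. Concatenating the
// sets of several languages yields a single multi-language trimmer.
function generateTrimmer(wordCharacters) {
  var startRegex = new RegExp('^[^' + wordCharacters + ']+');
  var endRegex = new RegExp('[^' + wordCharacters + ']+$');
  return function (token) {
    return token.replace(startRegex, '').replace(endRegex, '');
  };
}

// Example: combine Latin and Cyrillic character ranges.
var latin = 'A-Za-z';
var cyrillic = '\u0400-\u04FF';
var multiTrimmer = generateTrimmer(latin + cyrillic);

console.log(multiTrimmer('--hello!'));   // -> 'hello'
console.log(multiTrimmer('«привет»'));  // -> 'привет'
```

Because each language file only needs to ship its character-set string and call the shared generator, the per-language trimmer boilerplate goes away, which is where the byte savings come from.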