MihaiValentin / lunr-languages

A collection of language stemmers and stopwords for the Lunr JavaScript library

Multi-language indexing #16

Closed leonid-shevtsov closed 8 years ago

leonid-shevtsov commented 9 years ago

I found a nice way to generalize the multi-language approach described in my blog post. This makes it possible to effectively index content in any number of supported languages. See demos/demo-multi.html for an example.

The code does just what is described in the post: it generates a custom trimmer that trims around the word characters of all selected languages, and then combines their stemmers and stopword filters into one pipeline. I wanted to generate a single stopword filter function instead of running the filters one by one, but Lunr 0.6.0 does not expose the stopword lists, so that's not possible.
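To illustrate the idea, here is a minimal standalone sketch, not the code from this pull request. It does not use Lunr at all: `wordCharacters`, `generateTrimmer`, `makeStopWordFilter` and `runPipeline` are hypothetical names, and the character ranges and stopword lists are toy assumptions. Stemmers are left out; they would be appended to the same list in the same way.

```javascript
// Assumed word-character sets per language (toy values for illustration;
// the real sets come from the language files, see #15).
const wordCharacters = {
  en: "A-Za-z",
  ru: "\u0400-\u0484\u0487-\u052F"
};

// Build one trimmer that strips leading and trailing characters
// that are not word characters in ANY of the given languages.
function generateTrimmer(charSets) {
  const chars = charSets.join("");
  const leading = new RegExp("^[^" + chars + "]+");
  const trailing = new RegExp("[^" + chars + "]+$");
  return function (token) {
    return token.replace(leading, "").replace(trailing, "");
  };
}

// Toy stopword filter: returns undefined to drop a token.
function makeStopWordFilter(words) {
  const stopWords = new Set(words);
  return function (token) {
    return stopWords.has(token) ? undefined : token;
  };
}

// Run a token through each function in order; a dropped or
// empty token short-circuits the rest of the pipeline.
function runPipeline(fns, token) {
  let out = token;
  for (const fn of fns) {
    out = fn(out);
    if (out === undefined || out === "") return undefined;
  }
  return out;
}

// One shared trimmer, then each language's stopword filter in turn.
const trimmer = generateTrimmer([wordCharacters.en, wordCharacters.ru]);
const pipeline = [
  trimmer,
  makeStopWordFilter(["the", "and"]),
  makeStopWordFilter(["и", "в"])
];
```

With this setup, `runPipeline(pipeline, "«привет!»")` passes through the shared trimmer and both filters, while a stopword from either language is dropped by its own filter. Running the filters one by one mirrors the limitation mentioned above; a merged filter would need access to the underlying word lists.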

This pull request depends on #15, because it needs the word character sets for every language. I've also refactored trimmer generation into a separate function, which I've put into lunr.stemmer.support.js just to avoid creating another support file (although technically it has no relation to stemmers), with the added bonus of cutting some bytes off the language files.

pyoner commented 8 years ago

I need this feature +1