MihaiValentin / lunr-languages

A collection of languages stemmers and stopwords for Lunr Javascript library

Use stopwords from multiple languages #1

Closed MihaiValentin closed 9 years ago

MihaiValentin commented 10 years ago

Someone asked me this question by email, so I'm adding it here in case it helps others as well:

Is it possible to use stopwords from multiple languages with your lunr extension? In my case German, French and English in Node.js, like this:

global.lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js');
require('./lunr.de.js');
require('./lunr.fr.js');
var idx = lunr(function () {
    this.use(lunr.de);
    this.use(lunr.fr);
    this.field('title', { boost: 10 })
    this.field('body')
});
MihaiValentin commented 10 years ago

Using stopwords from multiple languages may not be a good idea. Here are a few reasons:

I would say there is no nice & easy solution for this.

However, you can add the German stopwords to the English ones at runtime. You still have to require both the English and the German files, but you do not have to use the German one at all (no this.use(lunr.de) call). To do this, use the following code:

global.lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js');
require('./lunr.de.js');
require('./lunr.fr.js');
// add the German stopwords to lunr's default (English) stopword set
for (var i = 0; i < lunr.de.stopWordFilter.stopWords.length; i++) {
    lunr.stopWordFilter.stopWords.add(lunr.de.stopWordFilter.stopWords.elements[i]);
}
// you can do the same for any other language (e.g. French, required above),
// using lunr.<languagecode>.stopWordFilter.stopWords as the source

var idx = lunr(function () {
    // don't use any language extension, as the default english lunr is already populated with the stopwords
    this.field('title', { boost: 10 })
    this.field('body')
});
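
If you need more than one extra language, the loop above can be wrapped in a small helper. This is only a sketch: mergeStopWords and makeSet are hypothetical names, not part of lunr or lunr-languages, and makeSet is a stand-in that mimics the length / elements / add shape used in the code above so the example runs without lunr installed.

```javascript
// Hypothetical helper: copy every stopword from `source` into `target`.
// Assumes `source` exposes `length` and `elements`, and `target` exposes
// `add(word)`, as the stopword sets in the code above do.
function mergeStopWords(target, source) {
  for (var i = 0; i < source.length; i++) {
    target.add(source.elements[i]);
  }
  return target;
}

// Stand-in set (an assumption, NOT lunr's implementation). It keeps its
// elements unique and sorted, and tracks `length` by hand.
function makeSet(words) {
  var set = {
    elements: [],
    length: 0,
    add: function (word) {
      if (this.elements.indexOf(word) === -1) {
        this.elements.push(word);
        this.elements.sort();
        this.length = this.elements.length;
      }
    }
  };
  words.forEach(function (w) { set.add(w); });
  return set;
}

// Tiny sample lists for illustration only, not the real stopword lists.
var english = makeSet(['and', 'the']);
var german = makeSet(['der', 'und']);
var french = makeSet(['et', 'le']);

// Merge every additional language into the default (English) set.
[german, french].forEach(function (source) {
  mergeStopWords(english, source);
});
```

With the setup from the code above, the equivalent calls would be mergeStopWords(lunr.stopWordFilter.stopWords, lunr.de.stopWordFilter.stopWords) and the same with lunr.fr, assuming your lunr version exposes the stopwords in that shape.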