Currently, stop words are treated differently from contractions and abbreviations: as soon as one specifies `filter_languages`, the stop words of those languages are always used, even when custom stop words are provided. I believe this should not be the case. A fix would only require removing the latter condition in tokenizer.rb#L129 (some tests would probably need to be altered too).
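To make the proposed precedence concrete, here is a minimal sketch in plain Ruby. It is not the gem's actual code. `resolve_stop_words` and `DEFAULT_STOP_WORDS` are hypothetical names, and the word lists are placeholders. The sketch assumes that contractions and abbreviations already behave this way: custom values, when given, replace the language defaults. Only when no custom stop words are supplied does it fall back to the defaults of every language in `filter_languages`.

```ruby
require "set"

# Hypothetical per-language default stop words (placeholders only; the
# gem ships its own, much larger lists).
DEFAULT_STOP_WORDS = {
  en: Set["the", "a", "and"],
  de: Set["der", "die", "und"]
}.freeze

# Hypothetical helper illustrating the proposed behavior:
# - non-empty custom stop words are used as-is, ignoring filter_languages;
# - otherwise, the defaults of all filter_languages are merged together.
def resolve_stop_words(custom_stop_words, filter_languages)
  custom = Set.new(custom_stop_words || [])
  return custom unless custom.empty?

  filter_languages.each_with_object(Set.new) do |lang, acc|
    acc.merge(DEFAULT_STOP_WORDS.fetch(lang, Set.new))
  end
end

# Usage: custom words win even though languages are specified.
custom_only = resolve_stop_words(["foo"], [:en, :de])
# No custom words: fall back to the English and German defaults.
merged_defaults = resolve_stop_words(nil, [:en, :de])
```

With this precedence, specifying `filter_languages` alone still yields the language defaults, so only callers who pass their own stop words would see different results. That is presumably why some existing tests would need updating.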