MihaiValentin / lunr-languages

A collection of languages stemmers and stopwords for Lunr Javascript library
Other
431 stars 163 forks source link

don't rely upon generateStopWordFilter #36

Closed drzraf closed 7 years ago

drzraf commented 7 years ago

The usefulness and efficiency of generateStopWordFilter() is doubtful. Using Set(), the function is more efficient, more readable, and the stopwords can be tweaked afterwards (changing lunr.fr.stopWords) It make the file less dependant of lunr implementation details and could be used with elasticlunr.

olivernn commented 7 years ago

I'd like to see some benchmarks comparing the performance of checking for presence in the set with that of an object. Also, remember, that browser support for Set is not universal.

What is the use case for changing stop words that is not provided by the createStopWordFilter function?

It make the file less dependant of lunr implementation details and could be used with elasticlunr.

I don't see this as an issue that should be solved by lunr-languages, it is after all designed to be used with lunr. If elasticlunr wants to make use of lunr plugins then it should expose the same interface as lunr.

drzraf commented 7 years ago

If I use generateStopWordFilter, I "include" that file which forcefully registerFunctions what

Since I manually include lunr.fr.js (to use with elasticlunr.js) I'd rather get rid of stop_word_filter.js and its 5 useful lines (weixsong/elasticlunr.js#57, olivernn/lunr.js#190) (which actually could be a oneliner: return ! token || lunr.fr.stopWords.has(token) ? '' : token; )

I've no problem with copy/pasting the current generateStopWordFilter() code as-is inside lunr.fr.js either. Mainly, I'd like to use elasticlunr.js + lunr.fr.js without the problematic dependency of stop_word_filter.js

(putting apart the horrible issue of UTF-8 languages #12)

olivernn commented 7 years ago

You can of course fork this library and make it more specific to elasticlunr...

I think the issue is that you are trying to replace a dependency (lunr) with another dependency (elasticlunr) and the other dependency does not provide the same interface.

Perhaps elasticlunr (or your code) could provide a generateStopWordFilter function, then you are free to implement it how you wish, using a Set for example. The you are not dependent on lunr at all?

There may be more interfaces required by lunr-languages that are not implemented by elasticlunr, I don't know. But it would seem that elasticlunr should implement these interfaces if it wants to make use of lunr plugins.

drzraf commented 7 years ago

"interface"?... it's about choosing between keeping it simple or making it a framework with a possibly complex dependency set.

Perhaps elasticlunr (or your code) could provide a generateStopWordFilter function

olivernn commented 7 years ago

Okay, lets take a step back.

lunr-languages is a plugin for Lunr, as such it has dependencies on certain interfaces or APIs that Lunr provides, at a glance these are the interfaces it depends on:

All of the above are part of the public API of Lunr, and as such have well a defined interface.

If you need to use this Lunr plugin with something other than Lunr, it seems fair that that something needs to expose the same APIs.

So, to use this with, say elasticlunr, would require elasticlunr to implement those APIs. It appears that elasticlunr already implements some, but not all, of the required APIs.

Lunr provides the lunr.generateStopWordFilter for exactly the use case needed by lunr-languages, to create a custom stop word filter without having to re-implement how words are matched in a list of stop words.

If you want to get this to work with elasticlunr then elasticlunr must provide the required APIs. If that project is unwilling to add those APIs then your application will have to provide some kind of adaptor.

It seems unreasonable to expect this library to take on additional complexity in order to make up for some missing APIs in elasticlunr. In the same way that it would be unreasonable to expect it to work with any other JavaScript search library.

drzraf commented 7 years ago

I looked into both codebases attempting to merge both side... Changes from weixsong (to avoid storing useless values in the Object + avoiding an anonymous function) makes any change difficult to implement (adding to the namespace confusion over this) I give up, it's ridiculous to the point that it's more simple for me to reference/use my own version of lunr-languages/lunr.fr.js than intend to unify that javascript file.

Why stop_word_filter.js [somehow positive] changes couldn't be merged inside lurn.js in the first place and then vice-versa. That's the whole point of github. What means this level of fragmentation for something so simple? Was it that hard to create pull-requests? @weixsong (or otherwise rapidly merge cool changes about these https://github.com/weixsong/elasticlunr.js#key-features-comparing-with-lunrjs?) NB: boolean mode and query-time boosting are SolR features and no-way near elasticsearch specific

FYI At first I initially got interested by **lunr.js` for its language handling ability. But the contribution barrier and the long-term worry that implies such a level of fragmentation start to overcome the benefit. No rant against no-one but seeing this from outside it's deceptive.

Keep up the good merges

olivernn commented 7 years ago

If you are mostly interested in the language support, then Lunr has full support for lunr-languages ;)

I don't know what made you originally choose elasticlunr over Lunr in the first place, but maybe take a look at Lunr to see if it satisfies your requirements.

I have no control over what happens in the elasticlunr project, so using its fragmentation and lack of support as an argument against Lunr seems a little unfair.

Good luck with whichever option you choose! 😄