Upgrade to support lunr 2.x

olivernn commented 7 years ago

This commit adds support for the upcoming release of lunr 2.x. That release is not out yet, so it's probably best to wait to merge this until it is. Ideally, though, I'd like the new Lunr to have the same great language support from day one, so I'm putting this out now to get feedback.

Lunr has changed the interface for pipeline functions. Before, tokens were passed to pipeline functions as strings; Lunr 2.x wraps them in a lunr.Token object. This means that all pipeline functions that expect to be working with a string need to be updated to work with a lunr.Token.
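To make the interface change concrete, here is a minimal sketch of a pipeline function that accepts either form. The helper name `wrapForBothVersions` and the `StandInToken` class are hypothetical and not part of Lunr; the stand-in only mimics the `toString()` and `update(fn)` methods that lunr.Token provides, so the snippet runs without Lunr installed.

```javascript
// Hypothetical helper (not a Lunr API): adapts a plain string -> string
// transform so it works whether the pipeline hands it a string
// (Lunr 0.x/1.x) or a token object (Lunr 2.x).
function wrapForBothVersions(transform) {
  return function (token) {
    // Token objects expose update(fn), which rewrites the wrapped string.
    if (token && typeof token.update === 'function') {
      return token.update(function (str) {
        return transform(str);
      });
    }
    // Older Lunr versions pass the token as a plain string.
    return transform(token);
  };
}

// Assumption: a tiny stand-in for lunr.Token, used only for illustration.
class StandInToken {
  constructor(str) { this.str = str; }
  toString() { return this.str; }
  update(fn) { this.str = fn(this.str); return this; }
}

const lowercase = wrapForBothVersions(function (s) {
  return s.toLowerCase();
});

console.log(lowercase('FOO'));                              // string in, string out
console.log(lowercase(new StandInToken('BAR')).toString()); // token in, token out
```

The design choice here is to detect the token object by its `update` method rather than by checking the Lunr version, so the same function can be registered against any of the releases.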

This change covers most of the language plugins in this repository. The Japanese plugin required a few more changes, specifically to make use of the new per-index tokeniser. This should allow Japanese and non-Japanese indexes to coexist.

There is one potential issue, though: searches are now parsed by lunr.QueryParser, which expects terms to be whitespace separated. I don't know any Japanese, so I can't tell from the demos whether this is an issue; perhaps someone can lend a hand here.
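A rough illustration of why whitespace separation could matter. This is not lunr.QueryParser itself, just a simplified whitespace split; the function name `splitOnWhitespace` and the sample strings are made up for the example.

```javascript
// Simplified stand-in for whitespace-based term splitting
// (an assumption for illustration, not Lunr's actual parser).
function splitOnWhitespace(query) {
  return query.trim().split(/\s+/);
}

// English words are separated by spaces, so the query splits into terms.
const english = splitOnWhitespace('hello world');

// Japanese text is normally written without spaces between words,
// so the whole query comes back as a single term.
const japanese = splitOnWhitespace('日本語のテキスト');

console.log(english.length, japanese.length);
```

If the query parser behaves like this, a Japanese query would reach the search pipeline as one unsegmented term unless something segments it first.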

I have not changed any of the version numbers etc., as I don't know what you want to do here. I'd like to suggest keeping support for the 0.x and 1.x branches of Lunr alongside the new interfaces in Lunr 2.x. Perhaps lunr-languages could have a similar versioning scheme, to indicate which major version of Lunr is supported. I'm open to ideas here though.

amsdamsgram commented 7 years ago

I've tried it with the French language and it works well. It would be nice if it could be merged.

MihaiValentin commented 7 years ago

Thanks @olivernn for this PR. It helped me understand the upcoming changes in Lunr 2.

So, I took the insight from here and overhauled Lunr Languages to be compatible with all Lunr versions (0.6.0, 0.7.0, 1.0.0, 2.0.0-alpha.5). The code is now in master and I bumped Lunr Languages to version 1.0.0.

This will help users: no matter which Lunr version they use, they just have to make sure they use the latest Lunr Languages version.

To enforce this, I added integration tests that cover every combination of Lunr version and Lunr Languages language.

In this way, we'll achieve two things:

you will be able to quickly perform a smoke test when releasing new Lunr versions by just adding the newest version to the Lunr Languages tests (2 lines of code)
any contributor to Lunr Languages will know as early as possible whether their changes or bug fixes break any existing functionality, thus ensuring stability
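As a rough sketch of the versions-by-languages idea (not the actual contents of the test suite), the combinations can be generated as a cross product. The function name `buildTestMatrix` and the two-language subset are assumptions for illustration; the version list is the one mentioned above.

```javascript
// Lunr versions named in this comment.
const lunrVersions = ['0.6.0', '0.7.0', '1.0.0', '2.0.0-alpha.5'];

// Assumption: an illustrative subset of languages, not the full list.
const languages = ['fr', 'jp'];

// Hypothetical helper: one test case per (Lunr version, language) pair.
function buildTestMatrix(versions, langs) {
  const cases = [];
  for (const lunrVersion of versions) {
    for (const language of langs) {
      cases.push({ lunrVersion: lunrVersion, language: language });
    }
  }
  return cases;
}

const matrix = buildTestMatrix(lunrVersions, languages);
console.log(matrix.length + ' combinations to test');
```

With a layout like this, covering a new Lunr release means adding one entry to the version list, and every language is then exercised against it.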

I will close this PR now.

Here's the commit in which these changes were made: https://github.com/MihaiValentin/lunr-languages/commit/4c64ac618e5c89868c0755761cb6f510d0a74d91 . The key changes were made in:

lunr.template - forward-compatibility to Lunr 2
test/testdata/<languages>.js - test cases for all the languages
test/VersionsAndLanguagesTest.js - the test that runs all Lunr versions against all the languages' test cases
lunr.jp.js (this is not generated from lunr.template) - support for the Japanese tokenizer across all Lunr versions
lunr.multi.js - multi-language support also required using the searchPipeline to stem search terms correctly in Lunr 2
lunr.stemmer.support.js - forward-compatibility to Lunr 2

Should you have any questions, please comment on the commit linked above, or let's talk in Gitter.

@iDams, you can now use it :).

olivernn commented 7 years ago

💯 Nice work @MihaiValentin!

MihaiValentin / lunr-languages

Upgrade to support lunr 2.x #30