MihaiValentin / lunr-languages

A collection of language stemmers and stopwords for the Lunr JavaScript library

Add Chinese support #53

Closed: repairearth closed this 3 years ago.

repairearth commented 5 years ago

What I did

Why I did it

I looked for Chinese support for lunr, but there was none, so I built it.

It only runs on the Node side. For the browser side, you must first build the index in Node, serialise it to JSON, and then load it in the browser via:

lunr.Index.load(JSON.parse(data))
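To illustrate why the browser side needs extra care, here is a toy model in plain JavaScript. It is not lunr's real index format or API; `buildIndex`, `serialise` and `load` are hypothetical names. Tokenization happens when the index is built, and only the resulting terms survive `JSON.stringify`. The tokenizer function itself is not serialised, so whoever loads the index has to supply a compatible tokenizer for queries.

```javascript
// Toy inverted index, NOT lunr's real format: it only shows that a
// serialised index keeps the indexed terms but not the tokenizer.

// Hypothetical tokenizer: lowercase and split on whitespace.
const whitespaceTokenize = (text) =>
  String(text).toLowerCase().split(/\s+/).filter(Boolean);

// Own-property lookup, so terms like "constructor" are not confused with
// properties inherited from Object.prototype.
const own = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// "Node side": tokenize every document once and record term -> doc ids.
function buildIndex(docs, tokenize) {
  const postings = Object.create(null);
  for (const { id, body } of docs) {
    for (const term of new Set(tokenize(body))) {
      (postings[term] = postings[term] || []).push(id);
    }
  }
  return { postings };
}

// Only plain data survives JSON.stringify; functions are not serialised.
function serialise(index) {
  return JSON.stringify(index);
}

// "Browser side": the caller has to pass a tokenizer again for queries.
function load(data, tokenize) {
  const { postings } = JSON.parse(data);
  return {
    search(query) {
      const terms = tokenize(query);
      if (terms.length === 0) return [];
      // Keep ids that contain every query term (AND semantics).
      return terms
        .map((term) => (own(postings, term) ? postings[term] : []))
        .reduce((acc, ids) => acc.filter((id) => ids.includes(id)));
    },
  };
}

const exampleDocs = [
  { id: 'a', body: 'hello world' },
  { id: 'b', body: 'hello lunr' },
];
const exampleIdx = load(serialise(buildIndex(exampleDocs, whitespaceTokenize)), whitespaceTokenize);
console.log(exampleIdx.search('hello'));       // → [ 'a', 'b' ]
console.log(exampleIdx.search('hello world')); // → [ 'a' ]
```

Using a Chinese segmenter at build time but only a whitespace split at load time is, roughly, the mismatch reported further down in this thread.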

Here is an example https://github.com/humanseelabs/gatsby-plugin-lunr/blob/master/src/gatsby-browser.js

I use nodejieba as the Chinese tokenizer and don't plan to support other Chinese tokenizers; I think nodejieba is good enough.
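A minimal sketch of what "nodejieba as the tokenizer" amounts to: a function that cuts text into words and drops whitespace and punctuation. The segmentation function is pluggable. In Node one could pass nodejieba's `cut()`, but the default here is the built-in `Intl.Segmenter`, used only as a stand-in so the example runs without the native addon. `makeChineseTokenizer` and `intlCut` are hypothetical names, and `Intl.Segmenter` will not necessarily segment the same way nodejieba does.

```javascript
// Default segmentation: Intl.Segmenter (built into Node 16+ and modern
// browsers), a stand-in for nodejieba's cut(). Its word boundaries come
// from ICU's dictionary and may differ from jieba's.
function intlCut(text) {
  const segmenter = new Intl.Segmenter('zh', { granularity: 'word' });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// True if the piece contains at least one letter or digit, so pieces that
// are only whitespace or punctuation (e.g. "，" or "。") are dropped.
const hasWordChar = (piece) => /[\p{L}\p{N}]/u.test(piece);

// Hypothetical factory: `cut` is any function string -> string[],
// for example nodejieba.cut when running in Node.
function makeChineseTokenizer(cut = intlCut) {
  return (text) =>
    cut(String(text))
      .map((piece) => piece.trim().toLowerCase())
      .filter((piece) => piece.length > 0 && hasWordChar(piece));
}

const tokenize = makeChineseTokenizer();
console.log(tokenize('他们扭头一看。'));
```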

dadiorchen commented 5 years ago

Great! I want to use lunr to index Chinese text too. How big is the word dictionary?

repairearth commented 5 years ago

@dadiorchen Please refer to nodejieba for details.

mapleeit commented 5 years ago

@repairearth Do you have time to resolve the merge conflicts so that the owner can merge this? 💐

repairearth commented 5 years ago

@mapleeit done.

mapleeit commented 5 years ago

Hi @MihaiValentin, please take a look at this PR when you have time. Thanks!

rapon233 commented 4 years ago

Thanks! I hope this gets officially implemented! Great job!

sunzongzheng commented 4 years ago

@repairearth Did you test multiLanguage? I'm using your repo now. With a single language it works well, but with multiLanguage the search results look as if no word segmentation was done: I have to search for the exact word.
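One possible way to handle mixed Chinese and English text is sketched below. It is a hypothetical workaround, not how `lunr.multiLanguage` is implemented: runs of Han characters go through a Chinese segmenter, and everything else is lowercased and split on characters that are not letters or digits. `tokenizeMixed` and `defaultCut` are hypothetical names, and `Intl.Segmenter` again stands in for nodejieba.

```javascript
// Captures runs of Han characters; used with split() so the runs are kept.
const HAN_RUN = /(\p{Script=Han}+)/u;

// Stand-in Chinese segmenter (not nodejieba); any string -> string[] works.
function defaultCut(text) {
  const segmenter = new Intl.Segmenter('zh', { granularity: 'word' });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Hypothetical mixed-language tokenizer.
function tokenizeMixed(text, cut = defaultCut) {
  const tokens = [];
  // split() with a capturing group puts the Han runs at odd indexes.
  String(text).split(HAN_RUN).forEach((part, i) => {
    if (!part) return;
    if (i % 2 === 1) {
      // Han run: let the Chinese segmenter find word boundaries.
      tokens.push(...cut(part).map((p) => p.trim()).filter(Boolean));
    } else {
      // Everything else: lowercase and split on non-letter/digit characters.
      tokens.push(...part.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean));
    }
  });
  return tokens;
}

console.log(tokenizeMixed('Lunr 支持中文 search!'));
```

The same tokenizer would have to be used both when building the index and when searching, otherwise the terms will not line up.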

biosocket commented 4 years ago

@repairearth thank you so much for your effort :)

Loading a serialized index as you described works, but the segmenter is not loaded. Is there a way to load the segmenter for searching on the browser side?

For example, given the phrase "他们扭头一看" ("they turned their heads to look"), a search for "他们" returns a result, but a search for "他们扭头一看" returns nothing, because the segmenter is not loaded and the search phrase is not split into words.
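Since lunr's query parser splits the query string into terms on whitespace, one workaround is to segment the query in the browser and join the pieces with spaces before calling `idx.search`. The sketch below assumes `Intl.Segmenter` as a browser-side stand-in for nodejieba (which only runs in Node), so its word boundaries may not match the terms that were indexed; `segmentQuery` and `toLunrQuery` are hypothetical names. The `+` prefix is lunr's query-string syntax for a required term.

```javascript
// Browser-side query segmentation. Intl.Segmenter is a stand-in for
// nodejieba (Node only); its boundaries may not match the indexed terms.
function segmentQuery(query) {
  const segmenter = new Intl.Segmenter('zh', { granularity: 'word' });
  return Array.from(segmenter.segment(String(query)), (s) => s.segment.trim())
    // Drop pieces that are only whitespace or punctuation.
    .filter((piece) => /[\p{L}\p{N}]/u.test(piece));
}

// Build a lunr query string: one space-separated term per segment, each
// prefixed with "+" so that every segment is required to match.
function toLunrQuery(query) {
  return segmentQuery(query).map((term) => '+' + term).join(' ');
}

const q = toLunrQuery('他们扭头一看');
console.log(q);
// With an index loaded via lunr.Index.load(JSON.parse(data)) in the browser:
// idx.search(q);
```

Without the `+` prefixes, lunr treats each term as optional and returns documents matching any of them, ranked by score, which is more forgiving when the browser-side segmentation differs from nodejieba's.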

rxliuli commented 4 years ago

Is there any progress on this feature?

futurist commented 4 years ago

+1 for this PR. Hope it's resolved and merged!

linhandev commented 4 years ago

I hope Chinese support can be added soon.

LucyGwilliamAdmin commented 4 years ago

@repairearth @MihaiValentin do you know when this might be merged?

Thanks both

francis-du commented 4 years ago

@repairearth

Hi Felix,

I think this repo is no longer maintained.

Can you check out this branch?

xhemj commented 4 years ago

Finally, Chinese support! I hope it gets merged soon. Good!!

su9257 commented 3 years ago

I hope the Chinese support gets approved.

iansinnott commented 3 years ago

I've recently had success getting lunr working with Chinese manually using the approach described here: https://github.com/stkevintan/hugo-lunr-zh#usage

Even so, it would be great to have Chinese supported via this lib.

darkyzhou commented 3 years ago

Feeling sad that this PR is still not merged in 2021.

I forked the project, merged this PR into it, and published a new npm package called lunr-languages-zh for anyone who needs Chinese support.

Feel free to let me know if I shouldn't do this... :(

MrAwesome commented 3 years ago

@MihaiValentin can you merge this, or appoint someone to take over merging pull requests to the repo?

MihaiValentin commented 3 years ago

Thanks @repairearth for contributing with this!