emersonbottero / vitepress-plugin-search

Provide local search to your documentation site.
MIT License

How can I apply the word split function to the CJK text in search box? #65

Open arcqiufeng opened 1 year ago

arcqiufeng commented 1 year ago

CJK languages do not separate words with spaces, and there is no simple way to split a CJK sentence into words.

I found a module, "segment", that can do this job. It now partly works; see https://github.com/emersonbottero/vitepress-plugin-search/issues/64

If I type the keyword 设计责任 in the search box, nothing is found.

If I type the keyword 设计 责任 (with a space) in the search box, the result is found.

I think this is because 设计责任 is not a single word but two words. It should be split into two words before it is passed to the search.

Can the query be split into words automatically, using the segment module?

emersonbottero commented 1 year ago

https://github.com/emersonbottero/vitepress-plugin-search/issues/11

arcqiufeng commented 1 year ago

No, that is a different problem.

Using this word splitter on the text and then generating the index is not a problem. That was solved in the issue you linked, and I also took part in that discussion. The word splitter segments the text well and basically meets the need.

My problem now is that I want to apply the same word splitter to the search box. The results are only correct when the query is segmented as well. At the moment, to get the right search results, I have to break up the words in the search box by hand.
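
To make the request concrete, here is a rough sketch of what segmenting the query could look like. It does not use the segment module's API; a tiny hard-coded dictionary with forward maximum matching stands in for a real segmenter. The names `DICTIONARY`, `isHan` and `segmentQuery` are made up for illustration.

```typescript
// Toy lexicon standing in for a real segmenter's dictionary (assumption).
const DICTIONARY: string[] = ["设计", "责任", "设计师"];
const MAX_WORD_LENGTH = 3; // length of the longest dictionary entry

// True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF).
function isHan(ch: string): boolean {
  const code = ch.charCodeAt(0);
  return code >= 0x4e00 && code <= 0x9fff;
}

// Splits a raw query into space-separated terms. Han runs are segmented by
// forward maximum matching; other text is only split on whitespace.
function segmentQuery(raw: string): string {
  const out: string[] = [];
  let run = ""; // pending non-Han characters
  const flush = () => {
    if (run) {
      out.push(run);
      run = "";
    }
  };
  let i = 0;
  while (i < raw.length) {
    const ch = raw.charAt(i);
    if (/\s/.test(ch)) {
      flush();
      i++;
      continue;
    }
    if (!isHan(ch)) {
      run += ch;
      i++;
      continue;
    }
    flush();
    // Take the longest dictionary word starting here, else a single character.
    let word = ch;
    for (let len = Math.min(MAX_WORD_LENGTH, raw.length - i); len > 1; len--) {
      const candidate = raw.slice(i, i + len);
      if (DICTIONARY.indexOf(candidate) !== -1) {
        word = candidate;
        break;
      }
    }
    out.push(word);
    i += word.length;
  }
  flush();
  return out.join(" ");
}
```

With this, typing 设计责任 would be rewritten to 设计 责任 before the search runs, which is the form that already returns results. A real setup would need the same segmenter for the query as for the index, otherwise the terms may not line up.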

emersonbottero commented 1 year ago

Gotcha. I'll have to expose a callback that is called on text input. I think that's possible.
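
One possible shape for such a callback, purely as a sketch: the option name `transformQuery` and the helper `toSearchTerms` are hypothetical and are not part of the plugin's current API. The idea is only that the raw search-box text passes through a user-supplied function before it is turned into search terms.

```typescript
// Hypothetical option shape; not the plugin's actual API.
interface QueryHookOptions {
  // Called with the raw search-box text; returns the text to search with.
  transformQuery?: (raw: string) => string;
}

// Stand-in for the lookup step: returns the terms that would be searched.
function toSearchTerms(raw: string, options: QueryHookOptions = {}): string[] {
  const query = options.transformQuery ? options.transformQuery(raw) : raw;
  return query.split(/\s+/).filter((term) => term.length > 0);
}

// Example transform: a fixed rewrite table standing in for a real segmenter.
const rewrites: { [query: string]: string } = { "设计责任": "设计 责任" };

const withHook = toSearchTerms("设计责任", {
  transformQuery: (raw) => rewrites[raw] || raw,
});
const withoutHook = toSearchTerms("设计责任");
```

Here `withHook` holds two terms (设计 and 责任), while `withoutHook` holds the single unsplit string, which matches the behaviour described in the issue.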