HakuyaLabs / warudo-docs

Warudo Handbook is the official ultimate guide for the powerful 3D VTubing software Warudo!
https://docs.warudo.app/

Japanese search problems #26

Closed: unsolublesugar closed this issue 1 week ago

unsolublesugar commented 2 weeks ago

Summary

Searching for Japanese keywords written in hiragana or katakana returns no results.

[Screenshot: 2024-07-09_13h05_47]

Search results only appear when the keyword contains kanji or English characters.

[Screenshot: 2024-07-09_12h38_05]

[Screenshot: 2024-07-09_12h38_48]

Reference Information

The README of cmfcmf/docusaurus-search-local, which is a fork of easyops-cn/docusaurus-search-local, includes the following note on `tokenizerSeparator`, which suggests that Japanese may need to be handled separately:

```js
  // lunr.js-specific settings
  lunr: {
    // When indexing your documents, their content is split into "tokens".
    // Text entered into the search box is also tokenized.
    // This setting configures the separator used to determine where to split the text into tokens.
    // By default, it splits the text at whitespace and dashes.
    //
    // Note: Does not work for "ja" and "th" languages, since these use a different tokenizer.
    tokenizerSeparator: /[\s\-]+/,
    // https://lunrjs.com/guides/customising.html#similarity-tuning
    //
    // This parameter controls the importance given to the length of a document and its fields. This
    // value must be between 0 and 1, and by default it has a value of 0.75. Reducing this value
    // reduces the effect of different length documents on a term’s importance to that document.
    // … (remaining lunr options omitted in this excerpt)
  },
```

https://github.com/cmfcmf/docusaurus-search-local?tab=readme-ov-file#usage
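To illustrate the note above, here is a minimal Node.js sketch of how a separator such as the default `/[\s\-]+/` tokenizes English versus Japanese text. It is a simplification: `tokenize` is a hypothetical helper that only splits on the separator, whereas real lunr.js also lowercases tokens and runs them through its own pipeline, and the easyops-cn plugin used by this site may process Japanese differently. The sample sentences are made up.

```javascript
// The default lunr separator quoted above: split at whitespace and dashes.
const defaultSeparator = /[\s\-]+/;

// Hypothetical helper: split text into tokens on a separator regex and
// drop empty strings (simplified; no lowercasing, stemming or stop words).
function tokenize(text, separator) {
  return text.split(separator).filter((token) => token.length > 0);
}

const english = tokenize("local search-plugin docs", defaultSeparator);
const japanese = tokenize("ひらがなとカタカナの検索", defaultSeparator);

console.log(english);  // [ 'local', 'search', 'plugin', 'docs' ]
console.log(japanese); // [ 'ひらがなとカタカナの検索' ]

// Japanese has no spaces, so the whole sentence becomes one token and an
// exact lookup for a word inside it finds nothing.
console.log(japanese.includes("カタカナ")); // false
```

Because Japanese is written without spaces between words, a whitespace-based separator leaves each sentence as a single token, which is consistent with the hiragana and katakana keywords in the report not matching.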

```js
  tokenizerSeparator: /[\s\-\u{3000}-\u{301C}\u{3041}-\u{3093}\u{309B}-\u{309E}]+/gu
```

https://qiita.com/y_catch/items/46b7eb7d618d95fbc9c3
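Applying the separator from the Qiita article in the same simplified way shows its trade-off. This is again a sketch with a hypothetical `tokenize` helper and a made-up sample sentence, not the plugin's actual code. Runs of hiragana act as separators, so katakana and kanji words become individual tokens, but a keyword written only in hiragana produces no tokens at all.

```javascript
// Separator from the Qiita article: whitespace, dashes, U+3000–U+301C
// (ideographic space and CJK punctuation such as 、 and 。), hiragana
// U+3041–U+3093 (ぁ–ん) and U+309B–U+309E (voicing and iteration marks).
const japaneseSeparator =
  /[\s\-\u{3000}-\u{301C}\u{3041}-\u{3093}\u{309B}-\u{309E}]+/gu;

// Hypothetical helper: String.prototype.split has no use for the g flag;
// filtering drops the empty string produced when text starts with a separator.
function tokenize(text, separator) {
  return text.split(separator).filter((token) => token.length > 0);
}

const mixed = tokenize("ひらがなとカタカナの検索", japaneseSeparator);
const hiraganaOnly = tokenize("ひらがな", japaneseSeparator);

console.log(mixed);        // [ 'カタカナ', '検索' ]
console.log(hiraganaOnly); // []
```

In other words, this separator would make katakana and kanji keywords searchable, but it would not, on its own, fix the hiragana case from the report.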

The easyops-cn plugin this site currently uses has no equivalent option. Without seeing how it behaves in practice, it's hard to say how best to address this.

unsolublesugar commented 2 weeks ago

This is probably the same problem as this (still unresolved) issue:

https://github.com/easyops-cn/docusaurus-search-local/issues/338

TigerHix commented 2 weeks ago

Thanks for the report! @Nekotora Maybe we could look into https://www.meilisearch.com/, which was mentioned as an alternative in the other issue?

Nekotora commented 1 week ago

Thanks for the report, too. 🫡 The current local search plugin seems to lack support for languages other than Chinese and English. And as the documentation keeps growing, the local index files are getting larger as well (about 2 MB per language now :x).

It might be time to consider a dedicated search backend. I'll take a closer look at Meilisearch and other solutions. Sorry for the late reply; I've been a little busy recently. I'll deal with this soon.

unsolublesugar commented 1 week ago

@Nekotora Thanks for taking a look! If you ever need someone to check that it works in a Japanese environment, just give me a shout. ;)

Nekotora commented 1 week ago

Sorry it took so long to fix. 🙇 Search should now work smoothly in all languages.

I initially tried to fix local search with lunr.js or to set up a self-hosted crawler, but our application to Algolia's free DocSearch program for open-source documentation (https://docsearch.algolia.com/) was approved soon after, so we're using Algolia for now.

If you have any questions, just let me know!
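For reference, Docusaurus's classic theme enables Algolia DocSearch through the `themeConfig.algolia` option. The sketch below only shows the general shape of that configuration. The credential values are placeholders, not the Warudo Handbook's real settings, and `contextualSearch: true` asks DocSearch to restrict results to the reader's current language and docs version.

```javascript
// Excerpt of a Docusaurus config object enabling Algolia DocSearch.
// All values below are placeholders, not the site's actual credentials.
const config = {
  themeConfig: {
    algolia: {
      appId: "YOUR_APP_ID",               // issued by Algolia once DocSearch is approved
      apiKey: "YOUR_SEARCH_ONLY_API_KEY", // public, search-only key shipped to browsers
      indexName: "YOUR_INDEX_NAME",       // index filled by the DocSearch crawler
      // Filter results by the reader's current language (and docs version).
      contextualSearch: true,
    },
  },
};

console.log(Object.keys(config.themeConfig.algolia));
```

In an actual `docusaurus.config.js`, this object would be exported (for example `module.exports = config;`). Because the index is built and served by Algolia, the site no longer has to ship a per-language local index to the browser.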

unsolublesugar commented 1 week ago

@Nekotora Thanks for fixing this problem! I've checked in my environment and everything is working fine! 😆

[Screenshots: 2024-07-19_10h19_29, 2024-07-19_10h21_53, 2024-07-19_10h21_30]