easyops-cn / docusaurus-search-local

Offline/local search for Docusaurus v2/v3
https://easyops-cn.github.io/docusaurus-search-local/
MIT License
712 stars, 90 forks

removeDefaultStemmer option does not seem to work #396

Open tmax22 opened 9 months ago

tmax22 commented 9 months ago
// docusaurus.config.ts
  themes: [
    [
      require.resolve("@easyops-cn/docusaurus-search-local"),
      /** @type {import("@easyops-cn/docusaurus-search-local").PluginOptions} */
      {
        hashed: true,
        highlightSearchTermsOnTargetPage: true,
        removeDefaultStemmer: true,
        removeDefaultStopWordFilter: true,
      },
    ],
  ],

I enabled removeDefaultStemmer, but I still cannot search for partial word terms (for example, searching for utomatio instead of automation).

msykes commented 5 months ago

I'm having the same issue. Partial searches aren't working for me.

I've tried rebuilding, clearing the browser cache, and reloading, with no difference. Setting hashed to false or true doesn't make a difference either.

weareoutman commented 5 months ago

Currently this is not supported. Under the hood, only a trailing wildcard is used, which means a keyword like autom will match automation, while utomatio will not.
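To make the trailing-wildcard behaviour concrete, here is a minimal sketch. It is a toy model only, not the plugin's actual code: matchesTrailingWildcard, indexedTokens, and searchTokens are hypothetical names made up for illustration, and the sketch assumes that a trailing wildcard behaves like a plain prefix test.

```typescript
// Toy model of trailing-wildcard (prefix) matching. NOT the plugin's implementation.
// All names here are hypothetical and exist only for this illustration.
function matchesTrailingWildcard(query: string, token: string): boolean {
  // The query "autom" is treated like the pattern "autom*":
  // the token has to start with the query text.
  return token.slice(0, query.length) === query;
}

// A pretend index of tokens (assumed data).
const indexedTokens: string[] = ["automation", "search", "docusaurus"];

function searchTokens(query: string): string[] {
  return indexedTokens.filter((token) => matchesTrailingWildcard(query, token));
}

console.log(searchTokens("autom"));    // "automation" is found
console.log(searchTokens("utomatio")); // nothing is found: not a prefix of any token
```

In this model, utomatio could only match if a leading wildcard were also applied, which is the part described above as not supported.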

msykes commented 5 months ago

Thanks for the reply. For me the behaviour is the same whether it's set to true or false. For example, if I type auto, then automation is found regardless of that setting. Is that normal?

At the same time, something like #_T won't find #_TEST, which is what I was hoping to achieve with this setting. I was thinking that maybe a search starting with #_ is the problem and that TEST would partially match. Any suggestions or feedback there are appreciated!

weareoutman commented 5 months ago

What a stemmer does is:

reducing inflected (or sometimes derived) words to their word stem, base or root

For example, it converts automation and automate to the same root, autom.

So enabling removeDefaultStemmer disables that conversion.

But whether removeDefaultStemmer is enabled or not does not affect the result of typing auto to match automation, because in both configurations the query matches through the trailing wildcard pattern.
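A rough sketch of why the setting makes no visible difference for auto. Everything here is an assumption for illustration: toyStem is a deliberately crude stand-in that strips two suffixes and is not the stemmer the plugin uses, and buildToyIndex and hasPrefixMatch are hypothetical helpers.

```typescript
// Crude stand-in for a stemmer (assumption): strips "ation" or "ate".
// A real stemmer applies far more rules than this.
function toyStem(word: string): string {
  const suffixes: string[] = ["ation", "ate"];
  for (const suffix of suffixes) {
    if (word.length > suffix.length && word.slice(-suffix.length) === suffix) {
      return word.slice(0, word.length - suffix.length);
    }
  }
  return word;
}

// Hypothetical index builder: stores stems when the stemmer is on, raw words otherwise.
function buildToyIndex(words: string[], useStemmer: boolean): string[] {
  return words.map((word) => (useStemmer ? toyStem(word) : word));
}

// Trailing-wildcard lookup modelled as a prefix test.
function hasPrefixMatch(index: string[], query: string): boolean {
  return index.some((token) => token.slice(0, query.length) === query);
}

const docWords: string[] = ["automation", "automate"];
const stemmedIndex = buildToyIndex(docWords, true);  // both words become "autom"
const plainIndex = buildToyIndex(docWords, false);   // words are stored unchanged

console.log(hasPrefixMatch(stemmedIndex, "auto"), hasPrefixMatch(plainIndex, "auto"));
console.log(hasPrefixMatch(stemmedIndex, "utomatio"), hasPrefixMatch(plainIndex, "utomatio"));
```

In this toy model, auto is a prefix of both autom and automation, so it is found either way, while utomatio is a prefix of neither.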

As for your question about #_TEST: this word is stored as a single token exactly as written, #_TEST, because by default the English tokenizer only splits on whitespace and hyphens. Currently there is no option to customize that.
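A small sketch of that splitting rule. It assumes the separator can be modelled by the regular expression /[\s\-]+/; toyTokenize is a hypothetical helper written for this example, not the plugin's tokenizer, and the sample sentence is made up.

```typescript
// Toy tokenizer (assumption): split only on runs of whitespace and hyphens.
function toyTokenize(text: string): string[] {
  return text.split(/[\s\-]+/).filter((part) => part.length > 0);
}

const sampleTokens = toyTokenize("set #_TEST for auto-complete");

console.log(sampleTokens);
// "#_TEST" stays in one piece, because "#" and "_" are not separators.

console.log(sampleTokens.indexOf("TEST") !== -1);
// false: "TEST" never becomes a token of its own.
```

Under this model, the hyphenated word is split into two tokens, but #_TEST is indexed whole, so TEST is not available as a separate token to match against.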