OrchardCMS / OrchardCore

Orchard Core is an open-source modular and multi-tenant application framework built with ASP.NET Core, and a content management system (CMS) built on top of that framework.
https://orchardcore.net
BSD 3-Clause "New" or "Revised" License

Orchard corrupts Elastic stemmer configuration #16384

Open Lenar-Avia opened 5 days ago

Lenar-Avia commented 5 days ago

Describe the bug

Custom Elasticsearch analyzer configuration from the OrchardCore_Elasticsearch block gets corrupted: the "filter" section under "analysis" is dropped.

Orchard Core version

1.8.2

To Reproduce

Steps to reproduce the behavior:

  1. Go to the 'OrchardCore_Elasticsearch' section of the configuration.
  2. Try to set up a simple stemmer analyzer, Spanish for example, as explained at: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#spanish-analyzer
  3. Make sure you have both the "analyzer" and "filter" sections inside the "analysis" block.
  4. Save and apply the configuration.
  5. Run GET .../_settings against Elasticsearch: the actual settings contain no "analysis.filter" section (the one that should sit at the same level as "analyzer").
  6. Check the termvectors: no stemming has been applied to the terms.
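
Step 5 can be checked mechanically. The following is a minimal Python sketch, assuming the usual shape of a GET <index>/_settings response (index name, then "settings", "index", "analysis"). The function name and the index name my_index are hypothetical, and the sample response is hand-written to resemble what the report describes; no request is actually sent.

```python
def missing_analysis_sections(settings_response, index_name,
                              expected=("analyzer", "filter")):
    """Return the expected sub-sections that are absent from index.analysis."""
    analysis = (
        settings_response.get(index_name, {})
        .get("settings", {})
        .get("index", {})
        .get("analysis", {})
    )
    return [name for name in expected if name not in analysis]


# Hand-written sample (an assumption, not real output): the analyzer
# survived, but the "filter" section next to it did not.
observed = {
    "my_index": {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "default": {
                            "tokenizer": "standard",
                            "filter": [
                                "lowercase",
                                "spanish_stop",
                                "spanish_stemmer",
                            ],
                        }
                    }
                }
            }
        }
    }
}

print(missing_analysis_sections(observed, "my_index"))  # ['filter']
```

An empty list would mean both sections reached Elasticsearch.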

Expected behavior

I would expect the termvectors to be created using the stemmer. When the configuration is set directly with PUT ../_settings, the custom morphological analyzer is applied.
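
For comparison, the direct route mentioned above might look like the sketch below. It only builds the request sequence and sends nothing. It assumes the standard Elasticsearch behaviour that new analyzers can only be defined on a closed index, hence the close and open calls around the PUT. The helper name and the index name my_index are hypothetical.

```python
import json


def direct_analysis_update(index_name, analysis):
    """Build the (method, path, body) steps for a direct update; nothing is sent."""
    return [
        # New analyzers can only be defined while the index is closed.
        ("POST", f"/{index_name}/_close", None),
        ("PUT", f"/{index_name}/_settings", json.dumps({"analysis": analysis})),
        ("POST", f"/{index_name}/_open", None),
    ]


# The analysis block from this report.
spanish = {
    "filter": {
        "spanish_stop": {"type": "stop", "stopwords": "_spanish_"},
        "spanish_stemmer": {"type": "stemmer", "language": "light_spanish"},
    },
    "analyzer": {
        "default": {
            "tokenizer": "standard",
            "filter": ["lowercase", "spanish_stop", "spanish_stemmer"],
        }
    },
}

for method, path, body in direct_analysis_update("my_index", spanish):
    print(method, path, body or "")
```

Documents indexed before such a change keep their old terms until they are reindexed, so termvectors should be compared on freshly indexed content.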

Logs and screenshots

Please try to reproduce the following settings using OrchardCore_Elasticsearch:


"analysis": {
  "filter": {
    "spanish_stop": {
      "type":       "stop",
      "stopwords":  "_spanish_" 
    },
    "spanish_stemmer": {
      "type":       "stemmer",
      "language":   "light_spanish"
    }
  },
  "analyzer": {
    "default": {
      "tokenizer":  "standard",
      "filter": [
        "lowercase",
        "spanish_stop",
        "spanish_stemmer"
      ]
    }
  }
}

github-actions[bot] commented 5 days ago

Thank you for submitting your first issue, awesome! 🚀 We're thrilled to receive your input. If you haven't completed the template yet, please take a moment to do so. This ensures that we fully understand your feature request or bug report. A core team member will review your issue and get back to you.

If you like Orchard Core, please star our repo and join our community channels.

MikeAlhayek commented 4 days ago

@Lenar-Avia I am not sure I follow your steps, but to create the rebuilt_spanish analyzer you referenced, your configuration should look like the following:

"OrchardCore_Elasticsearch": {
  // ...
  "Analyzers": {
    "rebuilt_spanish": {
      "tokenizer":  "standard",
      "filter": [
        "lowercase",
        "spanish_stop",
        "spanish_keywords",
        "spanish_stemmer"
      ]
    }
  }
}

Can you see if the above works for you? Here is a reference from our documentation

Lenar-Avia commented 1 day ago

Hello! If I supply the configuration as you have provided it, without the Analysis.Filters area, an exception is thrown when I try to rebuild the index: "index_not_found_exception", "no such index [index_name]". Your configuration is incomplete without the filters.
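
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the analyzer refers to custom filter names that are never defined, so Elasticsearch refuses to create the index and the later rebuild finds nothing. A small Python sketch (the function name and the one-item built-in list are illustrative only) that lists such dangling references:

```python
# Built-in token filters that need no definition. Only the one used in
# this thread is listed; the real built-in set is much longer.
BUILT_IN = {"lowercase"}


def undefined_filters(analysis, built_in=BUILT_IN):
    """Return filter names used by analyzers but neither built in nor defined."""
    defined = set(analysis.get("filter", {}))
    missing = []
    for analyzer in analysis.get("analyzer", {}).values():
        for name in analyzer.get("filter", []):
            if name not in defined and name not in built_in and name not in missing:
                missing.append(name)
    return missing


# What reaches Elasticsearch when only the analyzer part is kept.
analyzers_only = {
    "analyzer": {
        "default": {
            "tokenizer": "standard",
            "filter": ["lowercase", "spanish_stop", "spanish_stemmer"],
        }
    }
}

print(undefined_filters(analyzers_only))  # ['spanish_stop', 'spanish_stemmer']
```

With the "filter" section present, the same check returns an empty list.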

You also ignore the fact that only the "default" analyzer works in the OrchardCMS config (i.e. it cannot be called rebuilt_spanish). Please try to get a working stemmer configuration before removing the bug tag.

MikeAlhayek commented 20 hours ago

Filters are not something we support in OC. Feel free to submit a PR that adds filter support alongside the analyzers.

github-actions[bot] commented 20 hours ago

We triaged this issue and set the milestone according to the priority we think is appropriate (see the docs on how we triage and prioritize issues).

This indicates when the core team may start working on it. However, if you'd like to contribute, we'd warmly welcome you to do that anytime. See our guide on contributions here.