elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other

Create an API to manage synonyms #38523

Closed sdavids13 closed 1 year ago

sdavids13 commented 5 years ago

We would like the ability to manage synonyms via an API call instead of placing a file on every Elasticsearch node or specifying inline synonyms in the analysis chain of a new index, which can't be updated. Ideally this would work similarly to Solr's "Managed Resource Filters", which allow modifying synonyms and stopwords.

elasticmachine commented 5 years ago

Pinging @elastic/es-search

frutik commented 4 years ago

It would be great to have the list of stopwords in a separate index too.

peterdm commented 4 years ago

This is especially important for Elastic Cloud which has no programmatic way to upload changes to synonym files (managed manually through console/plugin-management).

This limitation forces integration with upstream metadata management systems to happen directly in the analyzer chain (something Elastic has flagged as a bad practice, since it increases the size of the cluster state).

schonert commented 3 years ago

@elastic/es-search any news on this?

Henr1k80 commented 3 years ago

> This is especially important for Elastic Cloud which has no programmatic way to upload changes to synonym files (managed manually through console/plugin-management).

As I understand it, plugin files are only synced when the cluster starts, right? That would mean you have to restart the cluster to pick up new files.

Henr1k80 commented 3 years ago

Could a solution be to make a plugin with a synonym graph token filter that reads synonyms from the cluster settings? E.g.:

        "filter": {
          "graph_synonyms": {
            "type": "synonym_graph_clustersettings",
            "updateable": "true",
            "lenient": "true",
            "synonyms_clustersettings": "dictionaries.english_synonyms"
          }
        }

and then put synonyms there:

PUT /_cluster/settings
{
  "persistent": {
    "dictionaries.english_synonyms": [
      "lol, laughing out loud",
      "universe, cosmos",
      "i-pod, i pod => ipod"
    ]
  }
}

The format could also just be a multi-line string, if that is better. I know that this goes against the advice to keep the cluster state small, but we do not need a complete generic synonym dictionary, only the entries we explicitly create. For our e-commerce use, we only need relatively few that match our catalog's exact content, around 200 at most. The plugin could come with all sorts of warnings saying that you should not add a huge list and should keep the cluster state small, etc.
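To make the cluster-settings proposal more concrete, here is a minimal Python sketch of what such a filter would have to do: resolve the `dictionaries.english_synonyms` key from a `GET /_cluster/settings` response and parse the rules, accepting either a list or a multi-line string. It assumes a simplified reading of the Solr synonym format (no escaping, no `expand: false`), and the helper names are hypothetical. Note that Elasticsearch rejects unregistered cluster settings, so the key would have to be registered by the hypothetical plugin.

```python
from typing import Any, Dict, List, Union


def lookup_setting(settings: Dict[str, Any], dotted_key: str) -> Any:
    """Resolve a dotted setting key such as "dictionaries.english_synonyms".

    GET /_cluster/settings returns nested objects by default, while
    ?flat_settings=true returns dotted keys; both shapes are handled here.
    """
    if dotted_key in settings:  # flat_settings=true shape
        return settings[dotted_key]
    node: Any = settings
    for part in dotted_key.split("."):  # nested (default) shape
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted_key)
        node = node[part]
    return node


def _terms(part: str) -> List[str]:
    """Split one comma-separated side of a rule into trimmed, non-empty terms."""
    return [t.strip() for t in part.split(",") if t.strip()]


def parse_rules(raw: Union[List[str], str]) -> List[Dict[str, List[str]]]:
    """Parse simplified Solr-format rules from a list or a multi-line string.

    "a, b"      -> equivalence: each term matches and expands to all terms
    "a, b => c" -> explicit mapping: each left-hand term is replaced by c
    Escaping and the expand=false option are deliberately not handled.
    """
    lines = raw.splitlines() if isinstance(raw, str) else raw
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            lhs, rhs = line.split("=>", 1)
            rules.append({"match": _terms(lhs), "replace": _terms(rhs)})
        else:
            terms = _terms(line)
            rules.append({"match": terms, "replace": terms})
    return rules


if __name__ == "__main__":
    cluster_settings = {"persistent": {"dictionaries": {"english_synonyms": [
        "lol, laughing out loud",
        "universe, cosmos",
        "i-pod, i pod => ipod",
    ]}}, "transient": {}}
    raw = lookup_setting(cluster_settings["persistent"], "dictionaries.english_synonyms")
    for rule in parse_rules(raw):
        print(rule)
```

Accepting both shapes means the list-versus-multi-line-string question only affects how the setting is written, not how the filter consumes it.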

Henr1k80 commented 3 years ago

A better solution could be to store the data in a separate index, e.g.:

        "filter": {
          "graph_synonyms": {
            "type": "synonym_graph_indexsource",
            "updateable": "true",
            "lenient": "true",
            "synonyms_indexsource": "dictionaries/english"
          }
        }

and then put synonyms there:

PUT /dictionaries/_doc/english
{
  "synonyms": [
    "lol, laughing out loud",
    "universe, cosmos",
    "i-pod, i pod => ipod"
  ]
}

That would also support partial updates, making your API for maintaining synonyms much simpler. The same document could also contain stopwords and keywords if needed.
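The partial-update idea maps onto the existing Update API (`POST /<index>/_update/<_id>`) with a Painless script. The Python sketch below only builds the request (method, path, body) for appending one entry to an array field of the dictionary document; `add_entry_request` and the `stopwords` field are hypothetical, and the script is a sketch rather than something verified against a specific cluster version.

```python
import json
from typing import Any, Dict, Tuple

# Painless script: create the array field if it is missing, append the value
# if it is not already present, otherwise turn the update into a no-op.
ADD_ENTRY_SCRIPT = (
    "if (ctx._source[params.field] == null) { ctx._source[params.field] = []; } "
    "if (ctx._source[params.field].contains(params.value)) { ctx.op = 'noop'; } "
    "else { ctx._source[params.field].add(params.value); }"
)


def add_entry_request(index: str, doc_id: str, field: str, value: str) -> Tuple[str, str, Dict[str, Any]]:
    """Build (method, path, body) for an Update API call on the dictionary document."""
    body = {
        "script": {
            "lang": "painless",
            "source": ADD_ENTRY_SCRIPT,
            "params": {"field": field, "value": value},
        }
    }
    return "POST", f"/{index}/_update/{doc_id}", body


if __name__ == "__main__":
    method, path, body = add_entry_request("dictionaries", "english", "stopwords", "the")
    print(method, path)
    print(json.dumps(body, indent=2))
```

Passing the field name and value as `params` rather than concatenating them into the script keeps the script text constant, and the `ctx.op = 'noop'` branch makes repeated calls idempotent instead of appending duplicates.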

Henr1k80 commented 3 years ago

I have made a plugin with a filter that works as described in my last comment. The index with the filter fails to come up until the dictionary index is available, and it throws some exceptions in the logs, but it is good enough for our use™. Maybe a real Elasticsearch developer can help delay the initialization of the index until the dictionary index is up, or at least make it less noisy in the logs. Would you like me to move the code from a plugin into Elasticsearch itself and open a pull request? Against the 7.x branch?
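The plugin code was not published, so the following is only a hypothetical Python sketch of the read path such a filter needs: split the `synonyms_indexsource` value into index and document id, then pull the rules out of a `GET /<index>/_doc/<id>` response body. Raising a dedicated exception while the index or document is missing mirrors the behavior described in the comment, where the index with the filter cannot come up until the dictionary index is available.

```python
from typing import Any, Dict, List, Tuple


class DictionaryUnavailable(Exception):
    """The dictionary index or document cannot be read (yet)."""


def parse_index_source(ref: str) -> Tuple[str, str]:
    """Split a "<index>/<id>" reference such as "dictionaries/english"."""
    index, sep, doc_id = ref.partition("/")
    if not sep or not index or not doc_id or "/" in doc_id:
        raise ValueError(f"expected '<index>/<id>', got {ref!r}")
    return index, doc_id


def synonyms_from_get_response(resp: Dict[str, Any], field: str = "synonyms") -> List[str]:
    """Extract the rule list from a GET /<index>/_doc/<id> response body."""
    if "error" in resp:  # e.g. index_not_found_exception while the index is missing
        err = resp["error"]
        reason = err.get("type", "unknown") if isinstance(err, dict) else str(err)
        raise DictionaryUnavailable(reason)
    if not resp.get("found", False):
        raise DictionaryUnavailable(f"{resp.get('_index')}/{resp.get('_id')} not found")
    rules = resp.get("_source", {}).get(field, [])
    if not isinstance(rules, list) or not all(isinstance(r, str) for r in rules):
        raise ValueError(f"field {field!r} must be a list of strings")
    return rules
```

Keeping "not available yet" (`DictionaryUnavailable`) separate from "malformed" (`ValueError`) lets a caller retry the first case while failing fast on the second.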

cbuescher commented 3 years ago

> I have made a plugin with a filter that works as described in my last comment. The index with the filter fails to come up until the dictionary index is available, and it throws some exceptions in the logs, but it is good enough for our use™

Thanks for trying this out. And yes, the dependency on an external index is one of the many problems to consider when loading analyzer resources from other indices. Unfortunately, today there is no straightforward way to ensure a particular ordering of index initializations; this is one of the things a potential solution would have to solve. I'm glad it solves your use case, though.
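One common workaround for the ordering problem is to retry loading with exponential backoff. The Python sketch below assumes a hypothetical `fetch` callable (for example, one that reads the dictionary document) and injects `sleep` so it can be tested. It sidesteps rather than solves the ordering issue: a blocking retry during analyzer construction could still hold up index recovery.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def load_with_retry(
    fetch: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ... with the defaults
    raise RuntimeError(f"dictionary still unavailable after {attempts} attempts") from last_error
```

Narrowing `retry_on` to the "not available yet" error type keeps genuine configuration mistakes from being retried.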

> Would you like me to move the code from a plugin into Elasticsearch itself and open a pull request?

We always welcome pull requests. However, I don't want to promise anything regarding the time someone might need to review this, or whether the choices you made for your plugin (e.g. storing all synonyms in one document, as your example suggests) match how we envision a longer-standing solution working. Please keep that in mind before spending a considerable amount of work converting what you have into a PR.

Henr1k80 commented 3 years ago

> Thanks for trying this out. And yes, the dependency on an external index is one of the many problems to consider when loading analyzer resources from other indices.

Just to clarify: I guess it makes no difference if the synonyms live in a document in the index itself? Or are there hooks that can reload the analyzer after the index has initialized?

> We always welcome pull requests. However, I don't want to promise anything regarding the time someone might need to review this, or whether the choices you made for your plugin (e.g. storing all synonyms in one document, as your example suggests) match how we envision a longer-standing solution working. Please keep that in mind before spending a considerable amount of work converting what you have into a PR.

Thanks for the honesty; I will skip the PR. I was hoping the pull request would "just" be about naming, documentation, test coverage, etc. 😅

nhsome commented 3 years ago

@Henr1k80, where can I see the source code of your plugin? Potentially this will be a good solution for me.

Henr1k80 commented 3 years ago

@nhsome I only have it locally for now. Last time I tried, I could only get it to work in a local instance of Elasticsearch, not on Cloud. There were issues loading the plugin, and the issues changed from version to version, ranging from simple security issues to breaking the deployment. Plugin support in Elastic Cloud seemed unstable and broken at the time, and I stopped working on it because there was not really any help from the Elasticsearch support team to fix it. This was around version 7.11 and earlier.

Danouchka commented 2 years ago

Any news regarding this ER, please?

berkayalcin commented 2 years ago

Any updates?

GuillaumeSTEIN commented 2 years ago

This would be awesome :-)

sdavids13 commented 2 years ago

FYI here is what AWS OpenSearch does for inspiration: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/custom-packages.html#custom-packages-updating

Henr1k80 commented 2 years ago

> FYI here is what AWS OpenSearch does for inspiration: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/custom-packages.html#custom-packages-updating

Last time I checked, any update to the plugins required a restart of the deployment to sync the plugin files to the nodes. Not a viable solution.

Danouchka commented 1 year ago

Any news, please?

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

SiwyDym commented 1 year ago

Is there any news about the API? :)

Henr1k80 commented 1 year ago

It is now in beta in 8.10 🥳 https://www.elastic.co/guide/en/elasticsearch/reference/8.10/synonyms-apis.html
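For comparison with the earlier proposals, here is a minimal Python sketch that only builds requests in the shape described by the 8.10 synonyms API reference linked above: a whole synonyms set via `PUT /_synonyms/<set_id>`, a single rule via `PUT /_synonyms/<set_id>/<rule_id>`, and a `synonym_graph` filter that references the set through `synonyms_set`. The set id, rule ids, analyzer name, and `title` field are hypothetical. Since the feature was still in preview, check the reference for your version before relying on these shapes. Updateable synonym filters are only allowed in search-time analyzers, which is why the analyzer is used as `search_analyzer`.

```python
import json
from typing import Any, Dict, Tuple

Request = Tuple[str, str, Dict[str, Any]]


def put_synonyms_set(set_id: str, rules: Dict[str, str]) -> Request:
    """Create or replace a whole synonyms set; dict keys become rule ids."""
    entries = [{"id": rule_id, "synonyms": rule} for rule_id, rule in rules.items()]
    return "PUT", f"/_synonyms/{set_id}", {"synonyms_set": entries}


def put_synonym_rule(set_id: str, rule_id: str, rule: str) -> Request:
    """Create or update a single rule in an existing set."""
    return "PUT", f"/_synonyms/{set_id}/{rule_id}", {"synonyms": rule}


def create_index_body(set_id: str) -> Dict[str, Any]:
    """Index body with an updateable synonym_graph filter backed by the set."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "graph_synonyms": {
                        "type": "synonym_graph",
                        "synonyms_set": set_id,
                        "updateable": True,
                    }
                },
                "analyzer": {
                    "english_search": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "graph_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # Index-time analysis stays "standard"; synonyms apply at search time.
                "title": {"type": "text", "search_analyzer": "english_search"}
            }
        },
    }


if __name__ == "__main__":
    print(put_synonyms_set("english-synonyms", {
        "lol": "lol, laughing out loud",
        "universe": "universe, cosmos",
        "ipod": "i-pod, i pod => ipod",
    }))
    print(json.dumps(create_index_body("english-synonyms"), indent=2))
```

Giving each rule an explicit id is what makes single-rule updates with `put_synonym_rule` possible later without resending the whole set.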

mayya-sharipova commented 1 year ago

Indeed, it is implemented, and a technical preview is available in 8.10. Closing the issue.