Open kderusso opened 3 months ago
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/search-relevance (Team:Search - Relevance)
Pinging @elastic/ent-search-eng (Team:SearchOrg)
This is most probably because of the hard limit of 10,000 synonyms that we have on analysis when searching for synonyms.
We can set a bigger limit on this and also be explicit on the maximum number of synonyms we allow on updating.
We'll be updating the hard limit to 100,000 synonyms, and adding error handling on the API for checking the number of existing synonyms to avoid adding more than that.
Will do an initial fix for limiting the maximum number of synonyms to 10,000 from the API, and warn for synonyms sets that already contain more than that in this PR.
We will address raising the 10,000 limit in a separate PR.
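The API-side check described above could look roughly like the following. This is only a minimal sketch of the idea, not the actual Elasticsearch implementation; the names `check_synonym_limit`, `needs_warning`, and `SynonymLimitError` are hypothetical, and the 10,000 value is taken from the comments in this thread.

```python
# Hypothetical sketch of the limit check discussed above.
# This is NOT the Elasticsearch implementation; all names are illustrative.

MAX_SYNONYMS_PER_SET = 10_000  # limit mentioned in this thread


class SynonymLimitError(ValueError):
    """Raised when an update would push a synonyms set past the limit."""


def check_synonym_limit(existing_count: int, added_count: int,
                        max_rules: int = MAX_SYNONYMS_PER_SET) -> int:
    """Return the new total, or raise if it would exceed max_rules."""
    total = existing_count + added_count
    if total > max_rules:
        raise SynonymLimitError(
            f"synonyms set would contain {total} rules; maximum is {max_rules}"
        )
    return total


def needs_warning(existing_count: int,
                  max_rules: int = MAX_SYNONYMS_PER_SET) -> bool:
    """True for sets that already exceed the limit and should only be warned about."""
    return existing_count > max_rules
```

Rejecting the update up front, instead of accepting it and silently dropping rules at analysis time, is what would make the failure visible to the caller.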
Elasticsearch Version
8.13
Installed Plugins
No response
Java Version
bundled
OS Version
Cloud
Problem Description
This bug was initially reported by a community member via our discuss forums.
Creating large synonym sets (>= 15,000 synonyms) produces intermittently inconsistent results. The synonyms API returns successful responses and no Elasticsearch errors are logged. The synonyms API also returns the individual synonyms correctly. However, the _analyze call shows that certain synonyms are not returned. Which synonyms are missing may differ between synonyms sets, but once a set returns inconsistent results, the behavior is permanent.
Updating the synonyms set, reloading analyzers and refreshing the index do not resolve this issue.
We should fix this so that all synonyms are analyzed correctly, and/or update our documentation with a max limit of the number of synonyms that are allowed in a synonyms set.
Steps to Reproduce
The following script was run in the Dev Console on an 8.13.3 cloud deployment. The value of 6000 works in this example (and any value above 6000 that I tested) but this may vary and you may need to try additional numbers if you reproduce.
NOTE: The create synonyms API is truncated to fit within size
synonyms_bug.txt
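Because the attached script is truncated, a large request body can be generated programmatically instead. The sketch below is not the attached script; it only builds JSON bodies in the shape accepted by the synonyms API (PUT _synonyms/<set name>) and the _analyze API. The rule ids, the generated terms, and the analyzer name are made-up placeholders.

```python
import json


def build_synonyms_set(n_rules: int) -> dict:
    """Build a request body for PUT _synonyms/<set name> with n_rules rules.

    Rule ids and terms are synthetic placeholders, not the data from the
    original report.
    """
    return {
        "synonyms_set": [
            {"id": f"rule-{i}", "synonyms": f"term{i}a, term{i}b"}
            for i in range(n_rules)
        ]
    }


def build_analyze_request(text: str, analyzer: str) -> dict:
    """Build a request body for GET <index>/_analyze."""
    return {"analyzer": analyzer, "text": text}


if __name__ == "__main__":
    body = build_synonyms_set(15_000)
    print(len(body["synonyms_set"]))  # 15000
    # Paste the JSON into the Dev Console or send it with any HTTP client.
    payload = json.dumps(body)
    # "my_synonym_analyzer" is an assumed analyzer name for illustration.
    print(json.dumps(build_analyze_request("term6000a", "my_synonym_analyzer")))
```

Running _analyze against several of the generated terms, including ones beyond the first 10,000 rules, is one way to spot which rules are not being applied.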
Logs (if relevant)
No response