Completion Suggester for doc with multiple matching inputs only returns one suggestion

steve-e commented 6 years ago

Elasticsearch version 6.3.0 (Also tested on 6.2.4)

Plugins installed: []

JVM version (java -version): OpenJdk 1.8.0_161 (also tested Oracle jvm 1.8.0_171)

OS version (uname -a if on a Unix-like system): Linux 4.4.115-k8s (FYI also tested on unsupported windows 10)

Description of the problem including expected versus actual behavior: Completion Suggester. A document can have multiple "input" values for the completion suggester. A search can be made with a prefix that matches more than one of those inputs.

EXPECTED: the suggestion array ("song-suggest" in the example) should return the same "_source" document multiple times, each with different options.text.

Expected response

{
    "took" : 25,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : 0,
        "max_score" : 0.0,
        "hits" : [ ]
    },
    "suggest" : {
        "song-suggest" : [
            {
                "text" : "n",
                "offset" : 0,
                "length" : 1,
                "options" : [
                    {
                        "text" : "Nevermind",
                        "_index" : "music",
                        "_type" : "_doc",
                        "_id" : "1",
                        "_score" : 34.0,
                        "_source" : {
                            "suggest" : {
                                "input" : [ "Nevermind", "Nirvana" ],
                                "weight" : 34
                            }
                        }
                    }
                ]
            },
            {
                "text" : "n",
                "offset" : 0,
                "length" : 1,
                "options" : [
                    {
                        "text" : "Nirvana",
                        "_index" : "music",
                        "_type" : "_doc",
                        "_id" : "1",
                        "_score" : 34.0,
                        "_source" : {
                            "suggest" : {
                                "input" : [ "Nevermind", "Nirvana" ],
                                "weight" : 34
                            }
                        }
                    }
                ]
            }
        ]
    }
}

ACTUAL: the suggestion array contains only one suggestion per indexed document, even if more than one input matches.

Actual response received

{
    "took" : 25,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : 0,
        "max_score" : 0.0,
        "hits" : [ ]
    },
    "suggest" : {
        "song-suggest" : [
            {
                "text" : "n",
                "offset" : 0,
                "length" : 1,
                "options" : [
                    {
                        "text" : "Nevermind",
                        "_index" : "music",
                        "_type" : "_doc",
                        "_id" : "1",
                        "_score" : 34.0,
                        "_source" : {
                            "suggest" : {
                                "input" : [ "Nevermind", "Nirvana" ],
                                "weight" : 34
                            }
                        }
                    }
                ]
            }
        ]
    }
}

Steps to reproduce:

Based on the example here https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

Create mappings

curl -X PUT "localhost:9200/music" -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "_doc" : {
            "properties" : {
                "suggest" : { "type" : "completion" },
                "title" : { "type": "keyword" }
            }
        }
    }
}
'

Index document

curl -X PUT "localhost:9200/music/_doc/1?refresh" -H 'Content-Type: application/json' -d'
{
    "suggest" : {
        "input": [ "Nevermind", "Nirvana" ],
        "weight" : 34
    }
}
'

Search with prefix 'N'

curl -X POST "localhost:9200/music/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "n",
            "completion" : { "field" : "suggest" }
        }
    }
}
'

javanna commented 6 years ago

hi @steve-e I don't follow why we would return the same document twice. Do our docs that you linked suggest that? I would expect this behaviour, maybe it would be nice to know that two inputs have matched rather than only one of them, not too sure about this.

elasticmachine commented 6 years ago

Pinging @elastic/es-search-aggs

steve-e commented 6 years ago

Hi @javanna, I would expect the completion suggester to return (potentially) all the matching completions, to help me choose my search term. At present some completions are filtered out.

The example I gave to reproduce the problem was very simple and only had a common prefix of length 1. But suppose I had the following inputs:

Star Wars Episode I – The Phantom Menace
Star Wars Episode IV – A New Hope
Star Wars Episode V – The Empire Strikes Back
Star Wars Episode VI – Return of the Jedi
... etc

Then typing "Star Wars" would only suggest "Star Wars Episode I – The Phantom Menace", and I would have to type "Star Wars Episode IV" before "Star Wars Episode IV – A New Hope" appeared as a suggestion.

This api is mainly concerned with returning suggested options.text. If we don't want to return the same document twice, in the _source of 2 different options, then the options.text could be an array of all matching completions for this document. But that would require a change for clients.

javanna commented 6 years ago

@steve-e thanks for promptly replying! Maybe I am getting confused, but I would expect those different inputs to be on different documents, one per document. Yet I will leave this open for discussion.

ptitpix commented 6 years ago

Hi !

It took me some time to track down my problem, which seems to be the same as @steve-e's. I have a document which has keywords (e.g. ["caucasian", "canadian"]) and I'd like to suggest both keywords when I type "ca". Right now, the only way to do it is to normalize the keywords into another index and to search for suggestions against this new index. It would be more convenient to suggest the same document multiple times with different texts.

jimczi commented 6 years ago

The completion suggester is document-based by design, so we cannot return one entry per matching suggestion. It is documented that it returns documents, not suggestions, and a single input can be indexed as multiple suggestions (if you have synonyms in your analyzer, for instance), so it is not trivial to differentiate a match from its variations. Also, the completion suggester does not visit all suggestions to select the top N: it has a special structure (a weighted FST) that can visit suggestions in the order of their scores, and it terminates the query early once enough documents have been found.

We've discussed this in our internal meeting and we are not going to change the design of this suggester, which returns documents and not suggestions. We did agree, though, that it should be possible to highlight the inputs of the field using the completion query, like any other field would do with the main query. It is not low-hanging fruit, but it would be consistent with the design and the intent of this suggester, which (again ;) ) shouldn't be used to suggest terms or phrases (we have the term/phrase suggester for this use case).
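To make the behaviour described above easier to picture, here is a toy model in Python. It is only a sketch under stated assumptions, not Lucene's or Elasticsearch's actual implementation: a sorted list stands in for the weighted FST, the `Entry` type and `suggest` function are hypothetical names, and the "Nothing Else Matters" document is made-up sample data. It shows why a second matching input of the same document never appears once that document has been collected.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    text: str    # one indexed input
    weight: int  # suggestion weight
    doc_id: str  # document that owns the input


def suggest(entries: List[Entry], prefix: str, size: int) -> List[Entry]:
    """Return at most `size` entries, one per document, best weight first."""
    # Stand-in for the weighted FST: walk the inputs in descending weight order.
    ordered = sorted(entries, key=lambda e: (-e.weight, e.text))
    seen_docs = set()
    results = []
    for entry in ordered:
        if not entry.text.lower().startswith(prefix.lower()):
            continue
        if entry.doc_id in seen_docs:
            # A second matching input of an already collected document is dropped.
            continue
        seen_docs.add(entry.doc_id)
        results.append(entry)
        if len(results) == size:
            break  # early termination: enough documents have been found
    return results


entries = [
    Entry("Nevermind", 34, "1"),
    Entry("Nirvana", 34, "1"),
    Entry("Nothing Else Matters", 20, "2"),  # hypothetical second document
]
top = suggest(entries, "n", size=5)
print([(e.text, e.doc_id) for e in top])
```

In this model "Nirvana" is skipped because document 1 was already returned for "Nevermind", which mirrors the actual response reported at the top of the issue.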

martinblaustein commented 6 years ago

@steve-e @ptitpix did you find a solution for this?

steve-e commented 6 years ago

> @steve-e @ptitpix did you find a solution for this?

I have put each completion suggestion into its own document, without the rest of the source document. This works around the problem where only one completion might be considered.

For my specific use case this should be in an entirely different index, so that the completion is unrelated to the specific original document from which the completion is taken.
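A minimal sketch of this workaround in Python, assuming the source documents look like the one in the reproduction steps. The index name `music-suggest` and the helper `suggestion_bulk_lines` are hypothetical, and the `_type` entry reflects the 6.x versions discussed in this thread. The helper only builds the newline-delimited lines for the `_bulk` API; sending them is left out.

```python
import json
from typing import Dict, Iterable, List


def suggestion_bulk_lines(docs: Iterable[Dict], index: str = "music-suggest") -> List[str]:
    """Build _bulk NDJSON lines: one small document per distinct completion input."""
    lines = []
    seen = set()
    for doc in docs:
        suggest = doc["suggest"]
        for text in suggest["input"]:
            key = text.lower()
            if key in seen:
                continue  # index each distinct completion only once
            seen.add(key)
            # Action line; no _id, so Elasticsearch generates one.
            action = {"index": {"_index": index, "_type": "_doc"}}
            source = {"suggest": {"input": [text]}}
            if "weight" in suggest:
                source["suggest"]["weight"] = suggest["weight"]
            lines.append(json.dumps(action))
            lines.append(json.dumps(source))
    return lines


docs = [{"suggest": {"input": ["Nevermind", "Nirvana"], "weight": 34}}]
body = "\n".join(suggestion_bulk_lines(docs)) + "\n"  # _bulk bodies end with a newline
print(body)
```

The resulting body could be posted to `localhost:9200/_bulk` with the `Content-Type: application/x-ndjson` header. Because every suggestion document carries exactly one input, a prefix such as "n" can return both "Nevermind" and "Nirvana".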

martinblaustein commented 6 years ago

@steve-e Thanks!

Michiel-de-Wolde commented 6 years ago

We use the completion query for search-as-you-type. We use ES version v6.1.3.

The document based approach rather than term based seems OK to me, especially when filtering documents based on security.

However certain completion terms are simply missing from the result set. We applied two hacks to compensate for what I consider a flaw.

The result set is a set of documents and each document is the carrier for one completion term, despite that for a particular document multiple other completion terms may also apply.

We note that for each document the first applicable term is selected and associated with the document. Perhaps the first term is selected because all term weights are the same in our case. If the first term is selected, the remaining applicable terms are lost for completion, unless these terms are obtained from other documents. With skip_duplicates, a next term may be selected if the first term has already been seen in the document set built up so far, but terms will still be missed.

A first hack is to randomize the order of the terms at indexing time (formerly they were sorted, for readability). Randomizing increases the probability that the first term of each found document is a term not seen before.

A second measure is to request the document _source field, limited to the field used to store completion terms. For each document, get the set of completion terms and reduce it to the terms that match the prefix typed so far. Then build up the overall set of unique completion terms.

This approach helps us to establish a larger set of completion terms, that is however not guaranteed to be exhaustive. Consider a document for which two completion terms apply that are not present in any other document. The particular document can carry one term only and the other term is lost for completion.
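The second measure could look roughly like the following Python sketch. It assumes the response shape shown earlier in this issue, with the completion field stored under `suggest` in `_source`, and it approximates the analyzer with a plain case-insensitive prefix test. The function name `matching_completions` is hypothetical.

```python
from typing import Dict, List


def matching_completions(response: Dict, suggester: str, prefix: str,
                         field: str = "suggest") -> List[str]:
    """Collect every stored input that starts with `prefix`, across all returned options."""
    found = []
    seen = set()
    for entry in response["suggest"][suggester]:
        for option in entry["options"]:
            inputs = option["_source"][field]["input"]
            if isinstance(inputs, str):
                inputs = [inputs]  # a single input may be stored as a plain string
            for text in inputs:
                if text.lower().startswith(prefix.lower()) and text not in seen:
                    seen.add(text)
                    found.append(text)
    return found


# Trimmed copy of the actual response reported in this issue.
response = {
    "suggest": {
        "song-suggest": [
            {
                "text": "n",
                "options": [
                    {
                        "text": "Nevermind",
                        "_id": "1",
                        "_source": {"suggest": {"input": ["Nevermind", "Nirvana"], "weight": 34}},
                    }
                ],
            }
        ]
    }
}
print(matching_completions(response, "song-suggest", "n"))
```

As noted above, this only recovers terms from documents that were actually returned, so the result is still not guaranteed to be exhaustive.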

A clever - document based - solution is still desired.

christopheblin commented 5 years ago

@jimczi You said "we have the term/phrase suggester for this use case", could you elaborate a little more please ?

For example,

curl -X POST "localhost:9200/music/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "n",
            "term" : { "field" : "suggest" }
        }
    }
}
'

This returns 0 results in my case (and I expected it to return both nirvana and nevermind).

mayya-sharipova commented 5 years ago

@christopheblin The term suggester finds suggestions that are within a given edit distance of the input text. The maximum edit distance can be 1 or 2, so you should supply a text parameter that is close enough to the expected suggestions.

{
    "suggest": {
        "song-suggest": {
            "text": "Nirva",
            "term": {
                "field": "title"
            }
        }
    }
}
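To illustrate why a text such as "Nirva" can work where the single letter "n" cannot, the sketch below computes a plain Levenshtein distance in Python. This is an illustration only: the helper `edit_distance` is hypothetical, and the term suggester's own string distance implementation and its other options are not modelled here.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


for text in ("nirva", "n"):
    print(text, "->", edit_distance(text, "nirvana"))
```

"nirva" is two edits away from "nirvana", which fits within a maximum edit distance of 2, while "n" is six edits away, so it is far too distant to be corrected into that term.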

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elastic / elasticsearch

Completion Suggester for doc with multiple matching inputs only returns one suggestion #31738