Phrase Suggester only provides suggestions for confidence=0

Tobsucht commented 7 years ago

Elasticsearch version: [5.1.2]

JVM version: java version "1.8.0_112" Java(TM) SE Runtime Environment (build 1.8.0_112-b16) Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)

OS version: macOS 10.12.2

Description of the problem including expected versus actual behavior: I'm using the following query to produce suggestions:

{
  "suggestPhrase": {
    "text": "borer",
    "phrase": {
      "analyzer": "whitespace",
      "field": "content",
      "size": 5,
      "real_word_error_likelihood": 0.95,
      "confidence": 1,
      "separator": " ",
      "max_errors": 2,
      "force_unigrams": true,
      "token_limit": 10,
      "highlight": {
        "pre_tag": "<em>",
        "post_tag": "</em>"
      },
      "collate": {
        "query": {
          "inline": "{\"bool\" : {\"must\":{\"match_phrase\" : {\"{{field_content}}\" : {\"query\" : \"{{suggestion}}\", \"slop\" : \"8\"}} }, \"must\" : {\"match\":{\"{{field_shop}}\" : \"{{shopId}}\"}} } }",
          "lang": "mustache"
        },
        "params": {
          "field_shop": "shopId",
          "shopId": 2,
          "field_content": "content"
        },
        "prune": false
      }
    }
  }
}

The problem is that I'm not getting any suggestions as long as the confidence is set to "1". If I reduce the confidence to "0", ES returns this:

"suggestPhrase": [
  {
    "text": "borer",
    "offset": 0,
    "length": 5,
    "options": [
      {
        "text": "boden",
        "highlighted": "<em>boden</em>",
        "score": 0.023564594
      },
      {
        "text": "bohrer",
        "highlighted": "<em>bohrer</em>",
        "score": 0.022763923
      },
      {
        "text": "bohren",
        "highlighted": "<em>bohren</em>",
        "score": 0.018076658
      },
      {
        "text": "bogen",
        "highlighted": "<em>bogen</em>",
        "score": 0.0049238233
      },
      {
        "text": "barer",
        "highlighted": "<em>barer</em>",
        "score": 0.0025109707
      }
    ]
  }
]

But this is not intended, as the suggestions are partly of low quality. I've set up no special mapping for the "content" field. It is just a "text" field with whitespace analyzer.

I couldn't find any documentation on how the scores for the suggestions are calculated, or whether they can be compared to the examples in the docs, but they seem pretty low. I don't know if it helps, but e.g. the second suggestion "bohrer" appears in about 2% of the indexed documents.

Is this a bug or a simple misconfiguration? I could try to produce a small data sample that reproduces my result, if somebody can confirm that this is not intended.

nik9000 commented 7 years ago

The phrase suggester isn't likely to work very well without the special mapping described. Getting it to return useful values is really a delicate thing. I'd remove all the options and slowly read them.

In any case, this seems to be working as expected. That "confidence": 1 says "find me terms that are more common on any shard than the query's terms". If you set the size to something large like 10000 then you should be able to see your term somewhere in the list. You can calculate the confidence by dividing all the scores by your term's score.

One point: I think this could use some more documentation so I'm going to leave this issue open. Maybe we should even add the score of the original term to the response for easier debugging.

Tobsucht commented 7 years ago

> I'd remove all the options and slowly read them

I don't get it: what do you mean?

> You can calculate the confidence by dividing all the scores by your term's score.

As you added below: the score of the inserted term is not returned.

> If you set the size to something large like 10000 then you should be able to see your term somewhere in the list.

Increasing the size changes nothing. For a confidence of "0" I get the list I posted above and with "1" it is empty.

> "find me terms that are more common on any shard than the query's terms"

Yes, but isn't that the case in my example?

nik9000 commented 7 years ago

> I don't get it: what do you mean?

Errr - re-add them, is what I mean.

> Increasing the size changes nothing. For a confidence of "0" I get the list I posted above and with "1" it is empty.

I see. I'd missed that part. This seems like an issue then. The trouble is reproducing it locally so I can debug it. Is there any chance you can distill some of the input data to make a complete reproduction?

Could you try doing this with the term suggester instead? That might shed some light on the issue.
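For comparison, a term suggester request for the same input might look roughly like the sketch below. It builds the body as a Python dict in the same shape as the phrase query above. The suggestion name `suggestTerm` and the helper `term_suggest_body` are made up for illustration, the field name `content` is taken from the original query, and `suggest_mode: "always"` is an assumption chosen so that candidates are proposed regardless of whether the input exists in the index.

```python
import json


def term_suggest_body(text, field, size=5):
    """Build a term suggester body shaped like the phrase query in this thread."""
    return {
        "suggestTerm": {  # hypothetical suggestion name
            "text": text,
            "term": {
                "field": field,
                "size": size,
                # assumption: propose candidates for every input term
                "suggest_mode": "always",
            },
        }
    }


body = term_suggest_body("borer", "content")
print(json.dumps(body, indent=2))
```

The term suggester scores each token on its own, so its output can show whether the individual candidates look sensible before the phrase-level scoring is applied.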

Tobsucht commented 7 years ago

Ok, I've tried something different: I changed the mapping etc. to have the same set-up as in the documentation.

New field:

"contentSuggest": {
  "norms": false,
  "analyzer": "trigram",
  "type": "text"
}

My query looks like this:

{
  "suggestPhrase": {
    "text": "borer",
    "phrase": {
      "field": "contentSuggest",
      "size": 5,
      "gram_size": 3,
      "confidence": 1,
      "separator": " ",
      "highlight": {
        "pre_tag": "<em>",
        "post_tag": "</em>"
      },
      "collate": {
        "query": {
          "inline": "{\"bool\" : {\"must\":{\"match_phrase\" : {\"{{field_content}}\" : {\"query\" : \"{{suggestion}}\", \"slop\" : \"8\"}} }, \"must\" : {\"match\":{\"{{field_shop}}\" : \"{{shopId}}\"}} } }",
          "lang": "mustache"
        },
        "params": {
          "field_shop": "shopId",
          "shopId": 2,
          "field_content": "contentSuggest"
        },
        "prune": false
      }
    }
  }
}

With the confidence set to "1" I get no result. With the confidence set to "0" I get this:

"suggestPhrase": [
  {
    "text": "borer",
    "offset": 0,
    "length": 5,
    "options": [
      {
        "text": "boden",
        "highlighted": "<em>boden</em>",
        "score": 0.017295634
      },
      {
        "text": "bohrer",
        "highlighted": "<em>bohrer</em>",
        "score": 0.01263889
      },
      {
        "text": "bohren",
        "highlighted": "<em>bohren</em>",
        "score": 0.0103149135
      },
      {
        "text": "bogen",
        "highlighted": "<em>bogen</em>",
        "score": 0.0031847006
      },
      {
        "text": "barer",
        "highlighted": "<em>barer</em>",
        "score": 0.0016731215
      }
    ]
  }
]

Just for clarification: There is no suggestion missing. I'm looking for "bohrer" which is returned for confidence=0.

What I noticed is the following: when I create a fresh index and start indexing data, everything works with confidence="1" at first. At some point (which does not seem to be deterministic), I have to lower the confidence value to "0" to get any result.

I don't get it, in my opinion the set up is the same as in the documentation. Is there still a weird configuration? I will try to distill a dataset that reproduces this behaviour, but this seems to be difficult.

nik9000 commented 7 years ago

I'm going to see if I can reproduce this locally later this morning. I have some documentation to finish rewriting first but I'll see what I can see.

nik9000 commented 7 years ago

OK. I tracked this down to something to do with collate. Drop collate and you should get normal results. I'll figure out what is up with collate now though.

nik9000 commented 7 years ago

Ok so here is what is going on. prune doesn't mean what it looks like it should mean. prune: false means "remove suggestions that don't match the query". prune: true means "return suggestions that don't match the query but set collate_match: false". I think that is backwards.

Anyway, if you set prune: true then you should see "borer" in the list of suggestions with a score above "boden". At least, you should if there isn't anything else weird going on.
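The two prune modes described above can be modelled in a few lines of Python. This is only a sketch of the described behaviour, not Elasticsearch code: `apply_collate`, the candidate scores, and the set of indexed terms are all invented for illustration.

```python
def apply_collate(candidates, matches, prune):
    """Model how collate treats candidates depending on `prune`.

    candidates: list of (text, score) pairs, best first.
    matches: function returning True if the collate query finds a document.
    """
    options = []
    for text, score in candidates:
        matched = matches(text)
        if prune:
            # prune: true keeps every candidate and flags whether it matched
            options.append({"text": text, "score": score, "collate_match": matched})
        elif matched:
            # prune: false keeps only candidates with a matching document
            options.append({"text": text, "score": score})
    return options


# Assumed data: "borer" is not in the index, the other two terms are.
indexed = {"boden", "bohrer"}
candidates = [("borer", 0.08), ("boden", 0.016), ("bohrer", 0.014)]

print(apply_collate(candidates, indexed.__contains__, prune=False))
print(apply_collate(candidates, indexed.__contains__, prune=True))
```

In this model the unmatched input term disappears silently with prune: false, which is why its score is never visible in the response.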

nik9000 commented 7 years ago

I filed #23983 so we could talk about what to do about prune....

Tobsucht commented 7 years ago

Unfortunately it's not really helping:

{
  "suggestPhrase" : {
    "text" : "borer",
    "phrase" : {
      "analyzer" : "trigram",
      "field" : "contentSuggest",
      "size" : 5,
      "real_word_error_likelihood" : 0.95,
      "confidence" : 1.0,
      "separator" : " ",
      "max_errors" : 2.0,
      "force_unigrams" : true,
      "token_limit" : 10,
      "highlight" : {
        "pre_tag" : "<em>",
        "post_tag" : "</em>"
      }
    }
  }
}

returns nothing for confidence="0". For "1" the result is:

"suggestPhrase": [
  {
    "text": "borer",
    "offset": 0,
    "length": 5,
    "options": [
      {
        "text": "borer",
        "highlighted": "borer",
        "score": 0.08450177
      },
      {
        "text": "boden",
        "highlighted": "<em>boden</em>",
        "score": 0.015580622
      },
      {
        "text": "bohrer",
        "highlighted": "<em>bohrer</em>",
        "score": 0.013646716
      },
      {
        "text": "bohren",
        "highlighted": "<em>bohren</em>",
        "score": 0.010613766
      },
      {
        "text": "bogen",
        "highlighted": "<em>bogen</em>",
        "score": 0.0031379266
      }
    ]
  }
]

But it helped in some way. What I'm noticing is the following: the score of "borer" constantly decreases while data is being indexed, and at some point it makes a pretty big jump from something like 0.0002 to 0.06, which explains the needed confidence value. How is the score for the input value calculated and why does it make such a big jump? The input word "borer" is NOT present in the field.
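The effect of that jump on the confidence setting can be checked with a few lines of Python. The sketch assumes the documented behaviour that confidence is a factor applied to the input phrase's score, and that only candidates scoring above the resulting threshold are returned. The function name `passes_confidence` is made up; the scores are copied from the response above.

```python
def passes_confidence(input_score, candidates, confidence):
    """Return the candidate texts whose score beats confidence * input_score."""
    threshold = confidence * input_score
    return [text for text, score in candidates.items() if score > threshold]


input_score = 0.08450177  # score of the unchanged input "borer"
candidates = {
    "boden": 0.015580622,
    "bohrer": 0.013646716,
    "bohren": 0.010613766,
    "bogen": 0.0031379266,
}

print(passes_confidence(input_score, candidates, confidence=1.0))
print(passes_confidence(input_score, candidates, confidence=0.1))

# Largest confidence at which the best correction would still be returned
print(max(candidates.values()) / input_score)
```

With these numbers nothing survives at a confidence of 1, and the best correction only appears once the confidence drops below roughly 0.18.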

nik9000 commented 7 years ago

To figure that out I'd need your data set.

Tobsucht commented 7 years ago

So, I had some time to build a "small" dataset. The dataset is anonymised but nevertheless I don't want to put it public. Can I send you an email with the JSON dump?

nik9000 commented 7 years ago

> So, I had some time to build a "small" dataset. The dataset is anonymised but nevertheless I don't want to put it public. Can I send you an email with the JSON dump?

Sure. Send it to the email address in my github profile.

Tobsucht commented 7 years ago

So for anyone having the same problem: I have multiple shards which causes trouble in the score calculation as @nik9000 explained to me (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch).

Additionally, using DFS_QUERY_THEN_FETCH does not seem to work for suggesters.

Tobsucht commented 7 years ago

Is this something that could be added to the suggesters in the near future? At the moment the phrase suggester is not usable for me, and I guess also not for other people with multiple shards.

nik9000 commented 7 years ago

I don't have plans to do any work on suggesters anytime soon, though I'll keep this on my list as a "one day" thing.

Tobulus commented 7 years ago

I just navigated a little bit through the code:

From TransportSearchAction.java: https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java#L288

if (searchRequest.isSuggestOnly()) {
    // disable request cache if we have only suggest
    searchRequest.requestCache(false);
    switch (searchRequest.searchType()) {
        case DFS_QUERY_THEN_FETCH:
            // convert to Q_T_F if we have only suggest
            searchRequest.searchType(QUERY_THEN_FETCH);
            break;
    }
}

Naive question: why does the search type get converted? (The relevant logic should be implemented in SearchDfsQueryThenFetchAsyncAction.java etc.)

Tobsucht commented 7 years ago

I just found out that I had a really miserable bug in my code: my custom routing placed all data in one shard, which ruined the scoring ...
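How routing shapes the statistics each shard sees can be illustrated with a toy calculation. This is not Elasticsearch's scoring code and it does not reproduce the scoring problem itself; the document counts and the `relative_frequency` helper are invented, and the only assumption is that each shard computes term statistics from its own documents.

```python
def relative_frequency(docs_with_term, total_docs):
    """Fraction of a shard's documents that contain the term."""
    return docs_with_term / total_docs if total_docs else 0.0


# Invented numbers: 1000 documents, 20 of which contain "bohrer" (about 2%).
# Each tuple is (documents containing the term, documents on the shard).
even_routing = [(5, 250), (5, 250), (5, 250), (5, 250)]
skewed_routing = [(20, 1000), (0, 0), (0, 0), (0, 0)]

for name, shards in (("even", even_routing), ("skewed", skewed_routing)):
    print(name, [relative_frequency(hits, total) for hits, total in shards])
```

With even routing every shard sees the same picture of the term; with everything routed to one shard, the remaining shards have no statistics to contribute at all.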

andyb-elastic commented 6 years ago

Thanks for following up @Tobsucht. To clarify, you found the scoring issue you were seeing was caused by custom routing throwing off shard-level term statistics? If you changed your routing to more uniformly distribute your data, did that help?

@nik9000 to be clear is the proposed enhancement mentioned in the last couple comments to enable suggesters to use distributed frequency statistics?

@elastic/es-search-aggs

Tobsucht commented 6 years ago

> If you changed your routing to more uniformly distribute your data, did that help?

Yes, fixed the problem.

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Phrase Suggester only provides suggestions for confidence=0 #23838