elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Make it possible to see if a term matches text in the index or does not. #11579

Closed ilanrivers closed 6 years ago

ilanrivers commented 9 years ago

When suggestions are found, you see them in the options of the results. But when a term is spelled correctly, the options are empty as well.

So there is no way to really know whether the suggester was unable to find a matching suggestion or the term was spelled correctly to begin with.

clintongormley commented 9 years ago

I'm afraid I don't understand what you're asking. Could you expand on your question, perhaps with JSON examples?

ilanrivers commented 9 years ago

I will try to give a clear example. Let's say you fill an index with three words:

POST /test_es_suggest/type
{
  "word": "Software"  
}

POST /test_es_suggest/type
{
  "word": "Developer"  
}

POST /test_es_suggest/type
{
  "word": "Programmer"  
}

Now I use the Suggest API as follows:

GET /test_es_suggest/_suggest
{
  "Suggest": {
      "term": {
        "field": "word",
        "suggest_mode": "popular",
        "size": 2,
        "prefix_len": 1,
        "analyzer": "default"
      },
      "text": "Softwarer"
    }
}

I receive the following results, which is fine: "software" was misspelled and has now been corrected. The options array is filled with the new word:

"Suggest": [
      {
         "text": "softwarer",
         "offset": 0,
         "length": 9,
         "options": [
            {
               "text": "software",
               "score": 0.875,
               "freq": 1
            }
         ]
      }
   ]

Now if I search for the correctly spelled word:

GET /test_es_suggest/_suggest
{
  "Suggest": {
      "term": {
        "field": "word",
        "suggest_mode": "popular",
        "size": 2,
        "prefix_len": 1,
        "analyzer": "default"
      },
      "text": "Software"
    }
}

The results are as follows:

"Suggest": [
      {
         "text": "software",
         "offset": 0,
         "length": 8,
         "options": []
      }
   ]

Options is empty, so I assume there is nothing to suggest. Or was the word just not found?

If I search for "softwareasdfasdf" I get exactly the same response, and in that case I know it is because there is no suitable word to suggest.

"Suggest": [
      {
         "text": "softwareasdfasdf",
         "offset": 0,
         "length": 16,
         "options": []
      }
   ]

In the example with "Software" I would like to know that the word was correct and that this is why there is nothing to suggest. In my opinion there is a clear difference between the reasons the two responses show an empty "options" array: one word cannot be found, and the other is already correct.

So my question is, can something be added to the response so I know the difference?
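
A minimal sketch of one possible client-side workaround, not something the Suggest API provides: pair the suggest call with a separate existence check on the same field and combine the two results. The function name `classify_suggestion` and the `term_in_index` flag are hypothetical; how the flag is obtained (for example with a term query) is left to the caller.

```python
from typing import Any, Dict


def classify_suggestion(entry: Dict[str, Any], term_in_index: bool) -> str:
    """Classify one entry of a term-suggester response.

    `entry` is one element of the "Suggest" array shown above.
    `term_in_index` is an assumption: the caller determines it separately,
    for example with a term query against the same field.
    """
    if entry.get("options"):
        return "suggestions"   # the suggester offered at least one alternative
    if term_in_index:
        return "exact_match"   # nothing suggested, but the term exists in the index
    return "no_match"          # term unknown and nothing close enough to suggest


found = {"text": "software", "offset": 0, "length": 8, "options": []}
missing = {"text": "softwareasdfasdf", "offset": 0, "length": 16, "options": []}

print(classify_suggestion(found, term_in_index=True))
print(classify_suggestion(missing, term_in_index=False))
```

This costs a second lookup per term, but it separates the two cases that the empty "options" array alone cannot distinguish.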

clintongormley commented 9 years ago

What about just setting the suggest_mode to always?

See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-term.html

ilanrivers commented 9 years ago

According to the documentation, suggest mode "always" means: "Suggest any matching suggestions based on terms in the suggest text."

But with suggest mode "always" I still do not know whether the word was found, and I still receive an empty options array. So it seems that changing the mode to "always" does not guarantee that you receive a suggestion.

GET /test_es_suggest/_suggest
{
  "Suggest": {
      "term": {
        "field": "word",
        "suggest_mode": "always",
        "size": 2,
        "prefix_len": 1,
        "analyzer": "default"
      },
      "text": "softwaredev"
    }
}

ilanrivers commented 9 years ago

Is the question clear now? @clintongormley

ilanrivers commented 9 years ago

Is this issue going to be addressed? It seems that a field could be added to indicate whether the search term is an exact match for a word in the index, and that would solve the problem.

DaanBiesterbos commented 8 years ago

Indeed. It would be very useful to know exactly why no suggestion was returned. :+1:

simpliste commented 8 years ago

It would be very helpful for me too to see the difference between a term that is correctly spelled and a term for which there is no suggestion!

nkelly75 commented 7 years ago

I might be simplifying here, but my experience is that the term suggester and phrase suggester can be valuable with little custom effort. They are good at suggesting corrections, so if the data has 'virtualize' and I type 'virtualise' I'll get a useful suggestion. The suggest_mode 'always' doesn't seem to work as I'd expect, so when a user types a term that is good the term suggester can't offer feedback.

I'm in the process of working around this by extracting interesting terms from the main index and generating a special index for use with a completion suggester. This is going pretty well, but it seems strange that the term/phrase suggesters can do the harder job of suggesting corrections yet can't provide results that say: yep, 'virtuali' is a term that can be completed with 'virtualize'.
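
A rough sketch of what the request bodies for such a completion-suggester index might look like, built as plain Python dictionaries. The field name `term_suggest` and the suggestion name `term_completion` are hypothetical, and the exact body layout (the `prefix` key, the mapping nesting) is assumed from recent Elasticsearch versions and may differ in older ones.

```python
import json
from typing import Any, Dict


def completion_field_mapping(field: str = "term_suggest") -> Dict[str, Any]:
    """Mapping fragment declaring `field` (hypothetical name) as a completion field."""
    return {"properties": {field: {"type": "completion"}}}


def completion_request(prefix: str, field: str = "term_suggest",
                       name: str = "term_completion") -> Dict[str, Any]:
    """Search body asking the completion suggester to complete `prefix`."""
    return {"suggest": {name: {"prefix": prefix, "completion": {"field": field}}}}


# Print the bodies so they can be pasted into a REST client.
print(json.dumps(completion_field_mapping(), indent=2))
print(json.dumps(completion_request("virtuali"), indent=2))
```

The code only builds the JSON; sending it to a cluster and populating the special index with the extracted terms is left out.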

jimczi commented 6 years ago

The term and phrase suggesters do not return corrections based on the absence of a term; they use the statistics of the index to decide whether a term should be corrected or not. The fact that a term appears once in a single shard is not an indication that the term is correctly spelled. There are ways to improve the precision of these suggesters; for instance, real_word_error_likelihood can be set to 1 to indicate that all terms in the dictionary are correctly spelled. I would not advise doing that if you're indexing random pages from the web, but it can be a solution if you don't want to return suggestions for terms that already appear in the dictionary. I am going to close this issue because the goal of these suggesters is not to determine whether a term (or phrase) is present in the index but rather to find a better query that would statistically be more relevant for the targeted corpus.
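
A minimal sketch of a phrase-suggester request that sets real_word_error_likelihood as described in the comment above, built as a Python dictionary. The suggestion name `phrase_check` is hypothetical, the field `word` is taken from the example index earlier in the thread, and the body is laid out for the `suggest` section of a `_search` request, which is an assumption (the thread's original requests use the older `_suggest` endpoint).

```python
import json
from typing import Any, Dict


def phrase_suggest_request(text: str, field: str = "word",
                           name: str = "phrase_check",
                           real_word_error_likelihood: float = 1.0) -> Dict[str, Any]:
    """Build a search body containing a single phrase suggestion.

    The default of 1.0 for real_word_error_likelihood follows the value
    mentioned in the comment above; `name` is a hypothetical label.
    """
    return {
        "suggest": {
            "text": text,
            name: {
                "phrase": {
                    "field": field,
                    "real_word_error_likelihood": real_word_error_likelihood,
                }
            },
        }
    }


# Print the body so it can be pasted into a REST client.
print(json.dumps(phrase_suggest_request("Software"), indent=2))
```

This only constructs the request; whether the resulting suggestions fit a given corpus still depends on the index statistics discussed above.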