elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.16k stars 24.84k forks source link

context suggester not working in elasticsearch #6444

Closed bigerock closed 9 years ago

bigerock commented 10 years ago

running the example given in the documentation is throwing an error:

{ "_shards": { "total": 5, "successful": 4, "failed": 1, "failures": [ { "index": "services", "shard": 2, "reason": "BroadcastShardOperationFailedException[[services][2] ]; nested: ElasticsearchException[failed to execute suggest]; nested: NullPointerException; " } ] } }

here's the code example given:

PUT /services

PUT /services/service/_mapping { "service": { "properties": { "name": { "type" : "string" }, "tag": { "type" : "string" }, "suggest_field": { "type": "completion", "context": { "color": { "type": "category", "path": "color_field", "default": ["red", "green", "blue"] }, "location": { "type": "geo", "precision": "5m", "neighbors": true, "default": "u33" } } } } } }

PUT services/service/1 { "name": "knapsack", "suggest_field": { "input": ["knacksack", "backpack", "daypack"], "context": { "color": ["red", "yellow"] } } }

POST services/_suggest?pretty' { "suggest" : { "text" : "m", "completion" : { "field" : "suggest_field", "size": 10, "context": { "color": "red" } } } }

clintongormley commented 10 years ago

This code https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/search/suggest/context/ContextMapping.java#L300 expects to receive a query for each context, but doesn't have a default value if a context is missing, or doesn't throw an error if a context is missing.

clintongormley commented 10 years ago

I assume that contexts are required? No way to handle the "all" case for a particular context?

bigerock commented 10 years ago

are you saying it needs a query? i'm not following.

spinscale commented 10 years ago

This looks like a bug indeed, as the mapping allows you to specify defaults, you should be able to so specify something like "location": {} in your query to make sure it uses the default location in the mapping. Alternatively an empty context in a query should fallback to the defaults and not raise NPEs

POST services/_suggest?pretty'
{
"suggest" : {
"text" : "m",
"completion" : {
"field" : "suggest_field",
"size": 10,
"context": {
}
}
}
}
benjamin-smith commented 10 years ago

I had a very similar issue, and passing in an empty object for any context items that I did not want to specify got rid of the NPE error.

shoteff commented 10 years ago

I also think this should be considered a bug. According to the ES documentation for the category query: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/suggester-context.html#_category_query

A query within a category works similar to the configuration. If the value is null the mappings default categories will be used. Otherwise the suggestion takes place for all documents that have at least one category in common with the query.

It works if the context value is explicitly set to null, but it doesn't work if the context value is not provided at all (even if there is default value in the mapping)

This works:

{
  "suggest": {
    "text": "a",
    "completion": {
      "field": "my_autocomplete",
      "size": 10,
      "context": {
        "_type" : "tweet",
        "additional_name": null
      }
    }
  }
}

And this doesn't:

{
  "suggest": {
    "text": "a",
    "completion": {
      "field": "my_autocomplete",
      "size": 10,
      "context": {
        "_type" : "tweet"
      }
    }
  }
}
missinglink commented 9 years ago

FWIW I agree that this functionality should be changed / fixed.

background

ES allows you to define completion suggesters, defined by "type": "completion" in your mapping, which contain context suggesters which can be either 'category' or 'geo'. These provide a means of reducing the documents returned by the suggester for the given input text.

You can optionally provide a default value for each context suggester, which is only used; at index time; when the document you send to be indexed is missing a value for that context.

the problem

All of the defined contexts must be specified when sending a query to /_suggest. Failing to do so will result in the cryptic error ElasticsearchException[failed to execute suggest]; nested: NullPointerException;. ..as per @spinscale comment

Some people above commented about the default value, which has nothing to do with the query, don't confuse this with omitting the context in the query which is what I believe everyone really wants.

@shoteff if you read that passage again it refers to index time, not to query time.

use-cases

I'll give you my use-case to give some context about how I would like to use it.

We are indexing 50M places of interest around the globe, and for each I would like to have 1x 'category' context for its 'class' (think restaurant/bar/cafe) in addition to the 1x 'geo' context used to target geo-relevant documents.

Who doesn't want autocomplete for all the bars near their house? am I right?

Problem is.. once I introduce the 'class' context I can no longer query for 'all' documents in the UK. I am now forced to specify a 'class' on every single request, or worse, send a list of every class with every query. Sending 'key:null' as per previous comments does not remedy this issue.

FWIW: The problem defined above also applies to using the 'geo' context on it's own and not specifying a lat/lon at query time, which means you can never query at precision:0, and can never have autocomplete for the whole globe.

test-case / game

I put together a testcase for you to explain what's going on, it's the game 'Guess Who?' which you should all remember from your youth. I hope you enjoy @clintongormley, @markharwood ;)

#!/bin/bash

################################################
# Let's play Guess Who? in Elasticsearch!!
################################################

ES='localhost:9200';

# drop index
curl -XDELETE "$ES/guesswho?pretty=true" 2>/dev/null;

# create index
curl -XPUT "$ES/guesswho?pretty=true" -d'
{ 
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "character": {
      "properties": {
        "name": { "type": "string" },
        "sex": { "type": "string" },
        "hair": { "type": "string" },
        "eyes": { "type": "string" },
        "wears": { "type": "string" },
        "guess": {
          "type": "completion",
          "payloads": "false",
          "preserve_separators": "false",
          "preserve_position_increments": "false",
          "max_input_length": "50",
          "context": {
            "sex": {
              "type": "category",
              "path": "sex",
              "default": "unknown"
            },
            "hair": {
              "type": "category",
              "path": "hair",
              "default": "unknown"
            },
            "eyes": {
              "type": "category",
              "path": "eyes",
              "default": "unknown"
            },
            "wears": {
              "type": "category",
              "path": "wears",
              "default": "unknown"
            }
          }
        }
      }
    }
  }
}';

# Index some characters
index_character() {
  curl -XPOST "$ES/guesswho/character/$1?pretty=true" -d'
  {
    "name":"'"$1"'",
    "sex":"'"$2"'",
    "hair":"'"$3"'",
    "eyes":"'"$4"'",
    "wears":"'"$5"'",
    "guess": {
      "input": [ "'"$1"'" ],
      "output": "'"$1 is $2 with $3 hair, $4 eyes and wears $5"'"
    }
  }';
}

# src: http://www.buzzfeed.com/kenblom/the-game-of-guess-who-and-their-celebrity-dopplgangers
index_character "bernard" "male" "red" "red" "a fedora";
index_character "anne" "female" "black" "red" "earrings";
index_character "eric" "male" "blond" "yellow" "a sailor hat";
index_character "george" "male" "grey" "red" "a detective hat";
index_character "herman" "male" "blond" "red" "nothing";
index_character "alfred" "male" "blond" "blue" "frown";
index_character "mystery" "unknown" "unknown" "unknown" "unknown";

# Refresh index
curl -XPOST "$ES/guesswho/_refresh";

# The fun is trying to guess when you play Guess Who?
guess_character_without_context() {
  curl -XPOST "$ES/guesswho/_suggest?pretty=true" -d'
  {
    "guesswho": {
      "text":"'"$1"'",
      "completion": {
        "size": "10",
        "field": "guess"
      }
    }
  }';
}

guess_character_by_sex() {
  curl -XPOST "$ES/guesswho/_suggest?pretty=true" -d'
  {
    "guesswho": {
      "text":"'"$1"'",
      "completion": {
        "size": "10",
        "field": "guess",
        "context": {
          "sex":"'"$2"'"
        }
      }
    }
  }';
}

guess_character_by_specifying_everything() {
  curl -XPOST "$ES/guesswho/_suggest?pretty=true" -d'
  {
    "guesswho": {
      "text":"'"$1"'",
      "completion": {
        "size": "10",
        "field": "guess",
        "context": {
          "sex":"'"$2"'",
          "hair":"'"$3"'",
          "eyes":"'"$4"'",
          "wears":"'"$5"'"
        }
      }
    }
  }';
}

guess_character_without_context "a";
# ElasticsearchIllegalArgumentException[suggester [completion] requires context to be setup]

guess_character_by_sex "a" "male";
# ElasticsearchException[failed to execute suggest]; nested: NullPointerException;

guess_character_by_specifying_everything "a" "male" "blond" "blue" "frown";
# I win, play again?
shoteff commented 9 years ago

@missinglink thank you, I think I messed up the index and query time stuff, but fortunately you got the idea :)

missinglink commented 9 years ago

cc/ @chilling @areek

sschuerz commented 9 years ago

So what is the value that I have to set if I don't want restrict the context for a particular field in "context"? Using null only gives me the documents that really have null for this field. Using an empty object results in a NullPointerException. Does anyone have a solution for this problem? I use elasticsearch 1.4.

missinglink commented 9 years ago

Hey @sschuerz there are currently no 'solutions' for this issue, it will require a change to the codebase to fix.

Two absolute hacks come to mind; which I feel dirty for mentioning, but I will anyway.

  1. add '*' (or any other token) to every 'category' context for every record and then use that as your catchall.
  2. somehow collect a list of all possible values of a context and send the entire list with every request to /_suggest.

They are both awful hacks but should work, I am currently considering whether to consider consideration of them myself.

If you are using the 'geo' context then you're unfortunately out of luck, I don't think there is a hack to allow you to query the whole planet.

shoteff commented 9 years ago

Actually I'm using the 1-st scenario, using a script I'm adding '*' to the list of the categories, so I can always have it as catchall

missinglink commented 9 years ago

This is not an acceptable solution to this issue, but I appreciate that we both came up with the same hacky workaround :)

sschuerz commented 9 years ago

Thank you for the responses, @missinglink and @shoteff. I think I'll use the first scenario because collecting a list is very unhandy in my context. @shoteff: Could you please provide an example for such a script?

missinglink commented 9 years ago

I think what @shoteff is referring to is below, which I would like to re-iterate is a hack for the following 3 reasons:

index_character() {
  curl -XPOST "$ES/guesswho/character/$1?pretty=true" -d'
  {
    "name":["*","'"$1"'"],
    "sex":["*","'"$2"'"],
    "hair":["*","'"$3"'"],
    "eyes":["*","'"$4"'"],
    "wears":["*","'"$5"'"],
    "guess": {
      "input": [ "'"$1"'" ],
      "output": "'"$1 is $2 with $3 hair, $4 eyes and wears $5"'"
    }
  }';
}
guess_character_by_hack() {
  curl -XPOST "$ES/guesswho/_suggest?pretty=true" -d'
  {
    "guesswho": {
      "text":"'"$1"'",
      "completion": {
        "size": "10",
        "field": "guess",
        "context": {
          "sex":"'"$2"'",
          "hair":"*",
          "eyes":"*",
          "wears":"*"
        }
      }
    }
  }';
}
missinglink commented 9 years ago

My java is terrible, the java.lang.NullPointerException seems to be caused here: https://github.com/elasticsearch/elasticsearch/blob/8aebb9656b63148190076bd8cdc5be0e87a9923d/src/main/java/org/elasticsearch/search/suggest/context/ContextMapping.java#L286

more specifically.. https://github.com/elasticsearch/elasticsearch/blob/8aebb9656b63148190076bd8cdc5be0e87a9923d/src/main/java/org/elasticsearch/search/suggest/context/ContextMapping.java#L303-L306

more input from someone who knows more would be very welcome, it seems like an easy fix @spinscale?

[2014-11-27 17:42:57,161][DEBUG][action.suggest           ] [Nobilus] [guesswho][0], node[pPxp4aPKSyW9GDmX39YF3Q], [P], s[STARTED]: failed to executed [[[guesswho]], suggestSource[
  {
    "guesswho": {
      "text":"a",
      "completion": {
        "size": "10",
        "field": "guess",
        "context": {
          "sex":"male"
        }
      }
    }
  }]]
org.elasticsearch.ElasticsearchException: failed to execute suggest
    at org.elasticsearch.action.suggest.TransportSuggestAction.shardOperation(TransportSuggestAction.java:166)
    at org.elasticsearch.action.suggest.TransportSuggestAction.shardOperation(TransportSuggestAction.java:61)
    at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:170)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.elasticsearch.search.suggest.context.ContextMapping$ContextQuery.toAutomaton(ContextMapping.java:260)
    at org.elasticsearch.search.suggest.completion.AnalyzingCompletionLookupProvider$2.getLookup(AnalyzingCompletionLookupProvider.java:270)
    at org.elasticsearch.search.suggest.completion.Completion090PostingsFormat$CompletionTerms.getLookup(Completion090PostingsFormat.java:297)
    at org.elasticsearch.search.suggest.completion.CompletionSuggester.innerExecute(CompletionSuggester.java:70)
    at org.elasticsearch.search.suggest.completion.CompletionSuggester.innerExecute(CompletionSuggester.java:45)
    at org.elasticsearch.search.suggest.Suggester.execute(Suggester.java:42)
    at org.elasticsearch.search.suggest.SuggestPhase.execute(SuggestPhase.java:85)
    at org.elasticsearch.action.suggest.TransportSuggestAction.shardOperation(TransportSuggestAction.java:161)
    ... 5 more
shoteff commented 9 years ago

I'm using the following groovy script to add "*" to all the fields that are contexts:

if (![ctx._source.contextFieldName].contains('*')) ctx._source.contextFieldName = 
[ctx._source.contextFieldName] + '*'

Replace contextFieldName with your context field As an additional hack I'm also storing all the context field names in the _meta special field, so I can easily read them programatically (in my case the end user should be able to define them when creating the mapping)

sschuerz commented 9 years ago

Thank you very much @shoteff, the script works for me. However, this is a hacky solution and fixing the underlying problem in elasticsearch would be highly appreciated.

areek commented 9 years ago

closing in favour of https://github.com/elasticsearch/elasticsearch/issues/8909