Closed bigerock closed 9 years ago
This code https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/search/suggest/context/ContextMapping.java#L300 expects to receive a query for each context, but doesn't have a default value if a context is missing, or doesn't throw an error if a context is missing.
I assume that contexts are required? No way to handle the "all" case for a particular context?
are you saying it needs a query? i'm not following.
This looks like a bug indeed, as the mapping allows you to specify defaults, you should be able to so specify something like "location": {}
in your query to make sure it uses the default location in the mapping. Alternatively an empty context in a query should fallback to the defaults and not raise NPEs
POST services/_suggest?pretty'
{
"suggest" : {
"text" : "m",
"completion" : {
"field" : "suggest_field",
"size": 10,
"context": {
}
}
}
}
I had a very similar issue, and passing in an empty object for any context items that I did not want to specify got rid of the NPE error.
I also think this should be considered a bug. According to the ES documentation for the category query: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/suggester-context.html#_category_query
A query within a category works similar to the configuration. If the value is null the mappings default categories will be used. Otherwise the suggestion takes place for all documents that have at least one category in common with the query.
It works if the context value is explicitly set to null, but it doesn't work if the context value is not provided at all (even if there is default value in the mapping)
This works:
{
"suggest": {
"text": "a",
"completion": {
"field": "my_autocomplete",
"size": 10,
"context": {
"_type" : "tweet",
"additional_name": null
}
}
}
}
And this doesn't:
{
"suggest": {
"text": "a",
"completion": {
"field": "my_autocomplete",
"size": 10,
"context": {
"_type" : "tweet"
}
}
}
}
FWIW I agree that this functionality should be changed / fixed.
ES allows you to define completion suggesters, defined by "type": "completion"
in your mapping, which contain context suggesters which can be either 'category' or 'geo'. These provide a means of reducing the documents returned by the suggester for the given input text.
You can optionally provide a default
value for each context suggester, which is only used; at index time; when the document you send to be indexed is missing a value for that context.
All of the defined contexts must be specified when sending a query to /_suggest
. Failing to do so will result in the cryptic error ElasticsearchException[failed to execute suggest]; nested: NullPointerException;
. ..as per @spinscale comment
Some people above commented about the default
value, which has nothing to do with the query, don't confuse this with omitting the context in the query which is what I believe everyone really wants.
@shoteff if you read that passage again it refers to index time, not to query time.
I'll give you my use-case to give some context about how I would like to use it.
We are indexing 50M places of interest around the globe, and for each I would like to have 1x 'category' context for its 'class' (think restaurant/bar/cafe) in addition to the 1x 'geo' context used to target geo-relevant documents.
Who doesn't want autocomplete for all the bars near their house? am I right?
Problem is.. once I introduce the 'class' context I can no longer query for 'all' documents in the UK. I am now forced to specify a 'class' on every single request, or worse, send a list of every class with every query. Sending 'key:null' as per previous comments does not remedy this issue.
FWIW: The problem defined above also applies to using the 'geo' context on it's own and not specifying a lat/lon at query time, which means you can never query at precision:0
, and can never have autocomplete for the whole globe.
I put together a testcase for you to explain what's going on, it's the game 'Guess Who?' which you should all remember from your youth. I hope you enjoy @clintongormley, @markharwood ;)
#!/bin/bash
################################################
# Let's play Guess Who? in Elasticsearch!!
################################################
ES='localhost:9200';
# drop index
curl -XDELETE "$ES/guesswho?pretty=true" 2>/dev/null;
# create index
curl -XPUT "$ES/guesswho?pretty=true" -d'
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"character": {
"properties": {
"name": { "type": "string" },
"sex": { "type": "string" },
"hair": { "type": "string" },
"eyes": { "type": "string" },
"wears": { "type": "string" },
"guess": {
"type": "completion",
"payloads": "false",
"preserve_separators": "false",
"preserve_position_increments": "false",
"max_input_length": "50",
"context": {
"sex": {
"type": "category",
"path": "sex",
"default": "unknown"
},
"hair": {
"type": "category",
"path": "hair",
"default": "unknown"
},
"eyes": {
"type": "category",
"path": "eyes",
"default": "unknown"
},
"wears": {
"type": "category",
"path": "wears",
"default": "unknown"
}
}
}
}
}
}
}';
# Index some characters
index_character() {
curl -XPOST "$ES/guesswho/character/$1?pretty=true" -d'
{
"name":"'"$1"'",
"sex":"'"$2"'",
"hair":"'"$3"'",
"eyes":"'"$4"'",
"wears":"'"$5"'",
"guess": {
"input": [ "'"$1"'" ],
"output": "'"$1 is $2 with $3 hair, $4 eyes and wears $5"'"
}
}';
}
# src: http://www.buzzfeed.com/kenblom/the-game-of-guess-who-and-their-celebrity-dopplgangers
index_character "bernard" "male" "red" "red" "a fedora";
index_character "anne" "female" "black" "red" "earrings";
index_character "eric" "male" "blond" "yellow" "a sailor hat";
index_character "george" "male" "grey" "red" "a detective hat";
index_character "herman" "male" "blond" "red" "nothing";
index_character "alfred" "male" "blond" "blue" "frown";
index_character "mystery" "unknown" "unknown" "unknown" "unknown";
# Refresh index
curl -XPOST "$ES/guesswho/_refresh";
# The fun is trying to guess when you play Guess Who?
guess_character_without_context() {
curl -XPOST "$ES/guesswho/_suggest?pretty=true" -d'
{
"guesswho": {
"text":"'"$1"'",
"completion": {
"size": "10",
"field": "guess"
}
}
}';
}
guess_character_by_sex() {
curl -XPOST "$ES/guesswho/_suggest?pretty=true" -d'
{
"guesswho": {
"text":"'"$1"'",
"completion": {
"size": "10",
"field": "guess",
"context": {
"sex":"'"$2"'"
}
}
}
}';
}
guess_character_by_specifying_everything() {
curl -XPOST "$ES/guesswho/_suggest?pretty=true" -d'
{
"guesswho": {
"text":"'"$1"'",
"completion": {
"size": "10",
"field": "guess",
"context": {
"sex":"'"$2"'",
"hair":"'"$3"'",
"eyes":"'"$4"'",
"wears":"'"$5"'"
}
}
}
}';
}
guess_character_without_context "a";
# ElasticsearchIllegalArgumentException[suggester [completion] requires context to be setup]
guess_character_by_sex "a" "male";
# ElasticsearchException[failed to execute suggest]; nested: NullPointerException;
guess_character_by_specifying_everything "a" "male" "blond" "blue" "frown";
# I win, play again?
@missinglink thank you, I think I messed up the index and query time stuff, but fortunately you got the idea :)
cc/ @chilling @areek
So what is the value that I have to set if I don't want restrict the context for a particular field in "context"? Using null only gives me the documents that really have null for this field. Using an empty object results in a NullPointerException. Does anyone have a solution for this problem? I use elasticsearch 1.4.
Hey @sschuerz there are currently no 'solutions' for this issue, it will require a change to the codebase to fix.
Two absolute hacks come to mind; which I feel dirty for mentioning, but I will anyway.
'*'
(or any other token) to every 'category' context for every record and then use that as your catchall./_suggest
.They are both awful hacks but should work, I am currently considering whether to consider consideration of them myself.
If you are using the 'geo' context then you're unfortunately out of luck, I don't think there is a hack to allow you to query the whole planet.
Actually I'm using the 1-st scenario, using a script I'm adding '*' to the list of the categories, so I can always have it as catchall
This is not an acceptable solution to this issue, but I appreciate that we both came up with the same hacky workaround :)
Thank you for the responses, @missinglink and @shoteff. I think I'll use the first scenario because collecting a list is very unhandy in my context. @shoteff: Could you please provide an example for such a script?
I think what @shoteff is referring to is below, which I would like to re-iterate is a hack for the following 3 reasons:
index_character() {
curl -XPOST "$ES/guesswho/character/$1?pretty=true" -d'
{
"name":["*","'"$1"'"],
"sex":["*","'"$2"'"],
"hair":["*","'"$3"'"],
"eyes":["*","'"$4"'"],
"wears":["*","'"$5"'"],
"guess": {
"input": [ "'"$1"'" ],
"output": "'"$1 is $2 with $3 hair, $4 eyes and wears $5"'"
}
}';
}
guess_character_by_hack() {
curl -XPOST "$ES/guesswho/_suggest?pretty=true" -d'
{
"guesswho": {
"text":"'"$1"'",
"completion": {
"size": "10",
"field": "guess",
"context": {
"sex":"'"$2"'",
"hair":"*",
"eyes":"*",
"wears":"*"
}
}
}
}';
}
My java is terrible, the java.lang.NullPointerException
seems to be caused here: https://github.com/elasticsearch/elasticsearch/blob/8aebb9656b63148190076bd8cdc5be0e87a9923d/src/main/java/org/elasticsearch/search/suggest/context/ContextMapping.java#L286
more specifically.. https://github.com/elasticsearch/elasticsearch/blob/8aebb9656b63148190076bd8cdc5be0e87a9923d/src/main/java/org/elasticsearch/search/suggest/context/ContextMapping.java#L303-L306
more input from someone who knows more would be very welcome, it seems like an easy fix @spinscale?
[2014-11-27 17:42:57,161][DEBUG][action.suggest ] [Nobilus] [guesswho][0], node[pPxp4aPKSyW9GDmX39YF3Q], [P], s[STARTED]: failed to executed [[[guesswho]], suggestSource[
{
"guesswho": {
"text":"a",
"completion": {
"size": "10",
"field": "guess",
"context": {
"sex":"male"
}
}
}
}]]
org.elasticsearch.ElasticsearchException: failed to execute suggest
at org.elasticsearch.action.suggest.TransportSuggestAction.shardOperation(TransportSuggestAction.java:166)
at org.elasticsearch.action.suggest.TransportSuggestAction.shardOperation(TransportSuggestAction.java:61)
at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:170)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at org.elasticsearch.search.suggest.context.ContextMapping$ContextQuery.toAutomaton(ContextMapping.java:260)
at org.elasticsearch.search.suggest.completion.AnalyzingCompletionLookupProvider$2.getLookup(AnalyzingCompletionLookupProvider.java:270)
at org.elasticsearch.search.suggest.completion.Completion090PostingsFormat$CompletionTerms.getLookup(Completion090PostingsFormat.java:297)
at org.elasticsearch.search.suggest.completion.CompletionSuggester.innerExecute(CompletionSuggester.java:70)
at org.elasticsearch.search.suggest.completion.CompletionSuggester.innerExecute(CompletionSuggester.java:45)
at org.elasticsearch.search.suggest.Suggester.execute(Suggester.java:42)
at org.elasticsearch.search.suggest.SuggestPhase.execute(SuggestPhase.java:85)
at org.elasticsearch.action.suggest.TransportSuggestAction.shardOperation(TransportSuggestAction.java:161)
... 5 more
I'm using the following groovy script to add "*" to all the fields that are contexts:
if (![ctx._source.contextFieldName].contains('*')) ctx._source.contextFieldName =
[ctx._source.contextFieldName] + '*'
Replace contextFieldName with your context field As an additional hack I'm also storing all the context field names in the _meta special field, so I can easily read them programatically (in my case the end user should be able to define them when creating the mapping)
Thank you very much @shoteff, the script works for me. However, this is a hacky solution and fixing the underlying problem in elasticsearch would be highly appreciated.
closing in favour of https://github.com/elasticsearch/elasticsearch/issues/8909
running the example given in the documentation is throwing an error:
{ "_shards": { "total": 5, "successful": 4, "failed": 1, "failures": [ { "index": "services", "shard": 2, "reason": "BroadcastShardOperationFailedException[[services][2] ]; nested: ElasticsearchException[failed to execute suggest]; nested: NullPointerException; " } ] } }
here's the code example given:
PUT /services
PUT /services/service/_mapping { "service": { "properties": { "name": { "type" : "string" }, "tag": { "type" : "string" }, "suggest_field": { "type": "completion", "context": { "color": { "type": "category", "path": "color_field", "default": ["red", "green", "blue"] }, "location": { "type": "geo", "precision": "5m", "neighbors": true, "default": "u33" } } } } } }
PUT services/service/1 { "name": "knapsack", "suggest_field": { "input": ["knacksack", "backpack", "daypack"], "context": { "color": ["red", "yellow"] } } }
POST services/_suggest?pretty' { "suggest" : { "text" : "m", "completion" : { "field" : "suggest_field", "size": 10, "context": { "color": "red" } } } }