polyfractal closed this issue 8 years ago
It's a little bit more fun than that, even: you actually get partial indexing!
curl -XDELETE localhost:9200/testindex
curl -XPUT localhost:9200/testindex
curl -XPOST localhost:9200/testindex/testtype -d '{"leftkey":"value","_id":{"name":"polyfractal"},"rightkey":"value"}}}'
curl -XPOST localhost:9200/_flush
Now search on the field before the _id:
curl -XGET localhost:9200/testindex/testtype/_search?pretty -d '{"query":{"term":{"leftkey":"value"}}}'
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "testindex",
"_type" : "testtype",
"_id" : "PalIN5CpSPKkGbhs4qNqaw",
"_score" : 0.30685282, "_source" : {"leftkey":"value","_id":{"name":"polyfractal"},"rightkey":"value"}}}
} ]
}
}
There you go. But search on the field after the _id:
curl -XGET localhost:9200/testindex/testtype/_search?pretty -d '{"query":{"term":{"rightkey":"value"}}}'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
And you get nothing.
I am affected by this behavior too. mongo outputs the field like this:
{ "_id":{"$oid":"54d9e3bf30320c3335017e69"}, "@timestamp":"..."}
I don't actually care about the "_id" field, but I do care about the "@timestamp" field, which is silently not indexed. Here is an example that shows the behavior: https://gist.github.com/andreaskern/01d1d292f7f146186ee5
In 2.0, the timestamp field would now be indexed correctly, as would _id.$oid. Wondering if we should allow users to index _id field inside the body at all? /cc @rjernst
The ability to specify _id within a document has already been removed for 2.0+ indexes.
@rjernst you removed the ability to specify the main doc _id in the body, but if the body contains an _id field then it creates a field called _id in the mapping, which can't be queried.
What I'm asking is: should we just ignore the fact that this field is not accessible (as we do in master today) or should we actually throw an exception? I'm leaning towards ignoring, as users don't always have control over the docs they receive.
I would be in favor of throwing an exception. This would only be for 2.0+ indexes, and it is really just field name validation (disallowing fields colliding with meta fields). The mechanism would be the same: a user would not be able to explicitly add a field _id in the properties for a document type.
@rjernst it's a tricky one. e.g. mongo adds { "_id": { "$oid": "...." }}, so actually the _id.$oid field IS queryable... should this still throw an exception?
IMO, yes.
With #8871, I don't think that would work, because _id is both a field mapper (the real meta field), and an object mapper.
@rjernst yep, makes sense
@rjernst this still works, even with #8871 merged in
Closed by #14003
Expected Behavior
Normally, if you try to index a document without an ID in the URI (e.g. a POST) but with an _id field in the document (and no explicit _id path mapping), it throws an error because the autogenerated ID does not match the provided _id field:
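A minimal sketch of that case, reusing the testindex/testtype names from the reproduction in this thread. The field name `field` and the ID value `my-own-id` are made up for illustration, and the exact error text is not reproduced here. The request is wrapped in a function so nothing is sent until you call it against a running node:

```shell
# Assumed node address; override with ES=host:port.
ES="${ES:-localhost:9200}"

# Hypothetical document: _id is a plain string inside the body.
STRING_ID_DOC='{"_id":"my-own-id","field":"value"}'

# POST without an ID in the URI, so Elasticsearch autogenerates one.
index_string_id_doc() {
  curl -XPOST "$ES/testindex/testtype" -d "$STRING_ID_DOC"
}
```

Calling `index_string_id_doc` should hit the mismatch described above, since the autogenerated ID cannot equal `my-own-id`.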
Broken Behavior
However, if the _id field happens to be an object, Elasticsearch happily indexes the document:
You can GET it:
It shows up with a match_all query:
But it doesn't show up when you search for exact values (or with a match query, or any other search):
If you ask ES why it doesn't show up, it says there are no matching terms:
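One way to ask that question is the explain API. The sketch below assumes the testindex/testtype names and the `rightkey` term query from the reproduction in this thread; the document ID is autogenerated, so it is passed in as an argument rather than hard-coded:

```shell
# Assumed node address; override with ES=host:port.
ES="${ES:-localhost:9200}"

# Term query on the field that comes after _id in the document.
EXPLAIN_QUERY='{"query":{"term":{"rightkey":"value"}}}'

# $1 is the autogenerated document ID returned when the document was indexed.
explain_doc() {
  curl -XGET "$ES/testindex/testtype/$1/_explain?pretty" -d "$EXPLAIN_QUERY"
}
```

For example, `explain_doc PalIN5CpSPKkGbhs4qNqaw` would target the document ID shown in the search hit earlier in the thread.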
And finally, as a fun twist, you can set an explicit mapping to look inside the _id object. This works with regard to the ID (it extracts the appropriate ID), and the document is GETable, shows up in match_all, etc. Search is still broken.
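A sketch of what such a mapping might look like, assuming a 1.x node where the `_id` mapping still accepts a `path` setting. The path `_id.name` is an assumption based on the `{"name":"polyfractal"}` object used in the reproduction in this thread, not the mapping from the original report:

```shell
# Assumed node address; override with ES=host:port.
ES="${ES:-localhost:9200}"

# Assumed mapping: take the document ID from the "name" key of the _id object.
ID_PATH_MAPPING='{"mappings":{"testtype":{"_id":{"path":"_id.name"}}}}'

# Create the index with the mapping above; delete any existing testindex first.
create_index_with_id_path() {
  curl -XPUT "$ES/testindex" -d "$ID_PATH_MAPPING"
}
```

After calling `create_index_with_id_path`, indexing the document from the reproduction should use the value under `_id.name` as the document ID.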
Reference
This was surfaced by Scott on the mailing list.