Closed mfussenegger closed 8 years ago
"settings": { "number_of_shards": 2 },
Having more than one shard is enough to cause this. If you want consistent scoring, you need to enable distributed term statistics (the dfs_query_then_fetch search type); otherwise the IDF values are computed from each shard's local statistics, not from global ones.
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html
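To illustrate why shard-local statistics matter, here is a minimal sketch in Python. It assumes Lucene's classic TF/IDF formula, `idf = 1 + ln(numDocs / (docFreq + 1))`. The documents and the two shard layouts are made up for illustration and are not the records from this issue.

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene TF/IDF formula: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def doc_freq(docs, term):
    # Number of documents that contain the term at least once
    return sum(1 for d in docs if term in d.split())

def shard_idfs(shards, term):
    # IDF as each shard would compute it from its own documents only
    return [idf(doc_freq(s, term), len(s)) for s in shards]

# Hypothetical documents, not taken from the issue
docs = ["red apple", "green apple", "red car", "blue car"]

# The same four documents, distributed over two shards in two different ways
layout_a = [[docs[0], docs[1]], [docs[2], docs[3]]]
layout_b = [[docs[0], docs[2]], [docs[1], docs[3]]]

local_a = shard_idfs(layout_a, "apple")
local_b = shard_idfs(layout_b, "apple")

# What distributed term statistics would use: one value for the whole index
global_idf = idf(doc_freq(docs, "apple"), len(docs))

print("layout A, per shard:", local_a)
print("layout B, per shard:", local_b)
print("global:", global_idf)
```

The per-shard values depend on which documents share a shard, while the global value does not. That is why the same documents can score differently once they are distributed differently.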
But shouldn't the local IDF values be deterministic if _id isn't randomly generated? The shard allocation should be deterministic which should result in the shards/lucene-indices having the same documents.
You're right in that if I change number_of_shards to 1 I get the same results in both 1.7 and 2.1 - could it be that the routing allocation algorithm changed?
Okay, thanks for the pointer in the right direction. Murmur is now used for routing - it's even documented in the mapping changes. Because of that, documents are distributed across the shards differently than before, which changes the per-shard term statistics and therefore the scoring.
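A rough sketch of the effect, assuming routing of the form `shard = hash(_id) % number_of_shards`. This is not a reproduction of Elasticsearch's routing code: `djb2_hash` is a plain 32-bit DJB2, and `stand_in_hash` uses CRC32 purely as a placeholder for "some other hash function", since Murmur3 is not in the Python standard library. The ids are hypothetical.

```python
import zlib

def djb2_hash(s):
    # Plain DJB2: h = h * 33 + ord(c), truncated to 32 bits
    h = 5381
    for ch in s:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h

def stand_in_hash(s):
    # Placeholder for a different hash function (NOT Murmur3)
    return zlib.crc32(s.encode("utf-8"))

def shard_for(doc_id, hash_fn, num_shards):
    # Assumed routing rule: hash of the id modulo the number of shards
    return hash_fn(doc_id) % num_shards

ids = ["1", "2", "3", "4", "5", "6"]  # hypothetical document ids
num_shards = 2

old_layout = {i: shard_for(i, djb2_hash, num_shards) for i in ids}
new_layout = {i: shard_for(i, stand_in_hash, num_shards) for i in ids}

print("old:", old_layout)
print("new:", new_layout)
```

Each function is deterministic on its own, so the same ids always land on the same shards within one version. Swapping the hash function can still move documents between shards, which changes the local document frequencies on each shard.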
We're in the process of upgrading to ES 2.1 and have noticed that some queries now return different results. I'm trying to figure out the root cause. So far my guess is that it is an analyzer change within Lucene.
Here is the mapping I'm using:
Here are the records:
And this is the query:
In 1.7 the top 2 hits are
and in 2.1 they are:
I've also tried to do a snapshot on 1.7 and then restore the snapshot in 2.1. This results in 2.1 producing the same result as in 1.7 which is why I assume that something at indexing time is now handled differently.
I've also tried to see if the _analyze API returns a different result somewhere. But all descriptions are tokenized the same in 1.7 and 2.1 - the only difference is that in one version position is starting from 0 and in the other version it's starting from 1. Maybe I'm missing something obvious here,