elastic / elasticsearch-migration

This plugin will help you to check whether you can upgrade directly to the next major version of Elasticsearch, or whether you need to make changes to your data and cluster before doing so.
290 stars 32 forks source link

According to documention 'default' similarity can be overridden #103

Closed nomoa closed 7 years ago

nomoa commented 7 years ago

In https://github.com/elastic/elasticsearch-migration/blob/2.x/src/migration/IndexSettings.js#L96 the 'default' similarity is flagged as a built-in Similarity. According to the doc overriding 'default' is the way to configure field-agnostic features such as coord. What are the reasons to mark this kind of config as problematic?

clintongormley commented 7 years ago

See https://github.com/elastic/elasticsearch/issues/16594 and https://github.com/elastic/elasticsearch/commit/474fd2607346308674f65ce336af8fc3c125b53c

nomoa commented 7 years ago

@clintongormley thanks but I'm still confused...

My understanding of all of this is that

So I conclude that the migration script should not flag an index that defines a similarity named default as problematic.

Thanks for your help

clintongormley commented 7 years ago

heya @nomoa - let me find out more and get back to you

ebernhardson commented 7 years ago

My interpretation of the problem is that the following sequence will occur:

Imho this seems like a bug in the 2->5 index migration process. Tested by migrating between 2.3.5 and 5.1.2.

EDIT: This seems to be slightly more complicated than above. I can trigger the problem from my application with a fully configured index, but a trying to develop a toy case with 1 field isn't showing the same problem. Will develop repro case.

ebernhardson commented 7 years ago

I found reproduced the problem, it's actually more unexpected. It looks like the _mapping endpoint is reporting the field as using classic similarity, but grabbing an explanation of the query result shows the parameters used for bm25. Tested against 5.1.2 and 5.2.0.

Reproduction script:

curl -XDELETE localhost:9200/test_bm25_upgrade
curl -XPUT localhost:9200/test_bm25_upgrade -d '{
  "settings": {
    "index": {
      "similarity": {
        "default": {
          "type": "BM25"
        }
      }
    }
  },
  "mappings": {
    "page": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "default"
        }
      }
    }
  }
}'
echo

curl -XPOST localhost:9200/test_bm25_upgrade/page/1 -d '{"title": "testing bm25 upgrade"}'
echo
curl localhost:9200/test_bm25_upgrade/_refresh

# Note this contains k1 and b parameters, indicating usage of bm25
curl -s 'localhost:9200/test_bm25_upgrade/_search?q=bm25&explain=1' | jq '.hits.hits | map(._explanation.details[0].details[1].details)'

# Update to 5.x
sudo service elasticsearch stop
dpkg -i elasticsearch-5.1.2.deb

# Check the new mapping, title similarity is reported as classic
curl localhost:9200/test_bm25_upgrade/_mapping | jq .test_bm25_upgrade.mappings.page.properties.title.similarity

# Interestingly performing a search still appears to use bm25, by presence of k1 and b parameters instead of tf:
curl 'localhost:9200/test_bm25_upgrade/_search?q=bm25&explain=1' | jq '.hits.hits | map(._explanation.details[0].details[1].details)'
clintongormley commented 7 years ago

Interestingly performing a search still appears to use bm25, by presence of k1 and b parameters instead of tf:

That's because you're querying the _all field, not the title field.

clintongormley commented 7 years ago

Fixed in https://github.com/elastic/elasticsearch/issues/23163