Closed nomoa closed 7 years ago
@clintongormley thanks but I'm still confused...
default
is not a builtin similarity and apparently I can create an index that overrides 'index.similarity.default'base
and default
.s1monw see PR. Not sure if this is what you had in mind. The commit adds the ability to define a index.similarity.default.type and prevents users from redefining any built-in similarity types.
My understanding of all of this is that
default
similarity and it will affect all the fields with the name default
base
.So I conclude that the migration script should not flag an index that defines a similarity named default
as problematic.
Thanks for your help
heya @nomoa - let me find out more and get back to you
My interpretation of the problem is that the following sequence will occur:
Imho this seems like a bug in the 2->5 index migration process. Tested by migrating between 2.3.5 and 5.1.2.
EDIT: This seems to be slightly more complicated than above. I can trigger the problem from my application with a fully configured index, but a trying to develop a toy case with 1 field isn't showing the same problem. Will develop repro case.
I found reproduced the problem, it's actually more unexpected. It looks like the _mapping endpoint is reporting the field as using classic similarity, but grabbing an explanation of the query result shows the parameters used for bm25. Tested against 5.1.2 and 5.2.0.
Reproduction script:
curl -XDELETE localhost:9200/test_bm25_upgrade
curl -XPUT localhost:9200/test_bm25_upgrade -d '{
"settings": {
"index": {
"similarity": {
"default": {
"type": "BM25"
}
}
}
},
"mappings": {
"page": {
"properties": {
"title": {
"type": "string",
"similarity": "default"
}
}
}
}
}'
echo
curl -XPOST localhost:9200/test_bm25_upgrade/page/1 -d '{"title": "testing bm25 upgrade"}'
echo
curl localhost:9200/test_bm25_upgrade/_refresh
# Note this contains k1 and b parameters, indicating usage of bm25
curl -s 'localhost:9200/test_bm25_upgrade/_search?q=bm25&explain=1' | jq '.hits.hits | map(._explanation.details[0].details[1].details)'
# Update to 5.x
sudo service elasticsearch stop
dpkg -i elasticsearch-5.1.2.deb
# Check the new mapping, title similarity is reported as classic
curl localhost:9200/test_bm25_upgrade/_mapping | jq .test_bm25_upgrade.mappings.page.properties.title.similarity
# Interestingly performing a search still appears to use bm25, by presence of k1 and b parameters instead of tf:
curl 'localhost:9200/test_bm25_upgrade/_search?q=bm25&explain=1' | jq '.hits.hits | map(._explanation.details[0].details[1].details)'
Interestingly performing a search still appears to use bm25, by presence of k1 and b parameters instead of tf:
That's because you're querying the _all
field, not the title
field.
In https://github.com/elastic/elasticsearch-migration/blob/2.x/src/migration/IndexSettings.js#L96 the 'default' similarity is flagged as a built-in Similarity. According to the doc overriding 'default' is the way to configure field-agnostic features such as coord. What are the reasons to mark this kind of config as problematic?