elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Standard token filter removal causes exceptions after upgrade #50734

Closed matriv closed 4 years ago

matriv commented 4 years ago

The removal of the `standard` token filter, combined with the way the relevant factories are cached, causes exceptions to be thrown when querying or inserting documents into an index created before 7.0.0.

Reproduction steps:

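- On a pre-7.0.0 node, map a field to a custom analyzer whose filter chain includes the `standard` token filter (`my_custom_analyzer` here):

    POST /myindex/_mapping/_doc
    {
      "properties": {
        "title": { "type": "text", "analyzer": "my_custom_analyzer" }
      }
    }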


- Upgrade to 7.4.2 and then query the index or insert a doc:

    GET /myindex/_search
    {
      "query": {
        "match": { "title": "Lala la lalala as a developer adf" }
      }
    }

or

    POST /myindex/_doc
    { "title": "foo bar" }


and an exception is thrown:

    Caused by: java.lang.IllegalArgumentException: The [standard] token filter has been removed.
        at org.elasticsearch.indices.analysis.AnalysisModule.lambda$setupPreConfiguredTokenFilters$1(AnalysisModule.java:189) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.index.analysis.PreConfiguredTokenFilter.lambda$singletonWithVersion$2(PreConfiguredTokenFilter.java:66) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.index.analysis.PreConfiguredTokenFilter$1.create(PreConfiguredTokenFilter.java:132) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.index.analysis.CustomAnalyzer.createComponents(CustomAnalyzer.java:92) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:136) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
        at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:199) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
        at org.elasticsearch.index.search.MatchQuery$MatchQueryBuilder.createQuery(MatchQuery.java:497) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.index.search.MatchQuery$MatchQueryBuilder.createFieldQuery(MatchQuery.java:386) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.apache.lucene.util.QueryBuilder.createBooleanQuery(QueryBuilder.java:96) ~[lucene-core-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:05:56]
        at org.elasticsearch.index.search.MatchQuery.parseInternal(MatchQuery.java:289) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:281) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.index.query.MatchQueryBuilder.doToQuery(MatchQueryBuilder.java:426) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:99) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.index.query.QueryShardContext.lambda$toQuery$1(QueryShardContext.java:305) ~[elasticsearch-7.4.2.jar:7.4.2]
        at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:317) ~[elasticsearch-7.4.2.jar:7.4.2]
        ... 17 more


The exception goes away if the ES node is restarted once more after the upgrade to >= 7.
It's caused by the way `AnalysisModule#setupPreConfiguredTokenFilters` registers the filter in the cache using `PreConfiguredTokenFilter#singletonWithVersion`. The caching strategy used is `ONE`, so there is only one factory rather than one per version. When the node starts for the first time on >= 7, a number of new internal indices are created:

    [2020-01-07T18:43:52,363][INFO ][o.e.c.m.MetaDataIndexTemplateService] [matriv] adding template [.watch-history-10] for index patterns [.watcher-history-10]
    [2020-01-07T18:43:52,364][WARN ][o.e.c.s.MasterService ] [matriv] took [43.8s], which is over [10s], to compute cluster state update for [create-index-template [.watch-history-10], cause [api]]
    [2020-01-07T18:43:55,023][INFO ][o.e.c.m.MetaDataIndexTemplateService] [matriv] adding template [.slm-history] for index patterns [.slm-history-1]
    [2020-01-07T18:43:59,344][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [matriv] adding index lifecycle policy [watch-history-ilm-policy]
    [2020-01-07T18:43:59,467][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [matriv] adding index lifecycle policy [slm-history-ilm-policy]
    [2020-01-07T18:43:59,734][INFO ][o.e.c.r.a.AllocationService] [matriv] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[myindex][0]]]).


Those indices have index creation version 7.x.x, so the `TokenFilterFactory` is registered once, with version 7.x.x. When our data index `myindex` is processed, it is also handed version 7.x.x (because of the `ONE` caching strategy there is no separate cached instance for version 6.x.x), so the code below takes the `else` branch and throws:

    PreConfiguredTokenFilter.singletonWithVersion("standard", true, (reader, version) -> {
        if (version.before(Version.V_7_0_0)) {
            deprecationLogger.deprecatedAndMaybeLog("standard_deprecation",
                "The [standard] token filter is deprecated and will be removed in a future version.");
        } else {
            throw new IllegalArgumentException("The [standard] token filter has been removed.");
        }
        return reader;
    }));
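To make the interaction between the version check and the single-instance cache easier to follow, here is a minimal, self-contained sketch. It is not Elasticsearch code: `CachingStrategyDemo`, `Factory`, `Cache`, `simulate` and the `PER_VERSION` strategy are hypothetical names, versions are reduced to plain integers (6 and 7), and the assumption is that the first index to ask for the filter decides which version the cached factory is built with.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the behaviour described above; all names are hypothetical. */
final class CachingStrategyDemo {

    /** ONE keeps a single factory; PER_VERSION (assumed alternative) keeps one per version. */
    enum Strategy { ONE, PER_VERSION }

    /** Stand-in for a token filter factory that remembers the version it was built with. */
    static final class Factory {
        final int builtForVersion;

        Factory(int builtForVersion) {
            this.builtForVersion = builtForVersion;
        }

        /** Mirrors the version check in the snippet above. */
        String create() {
            if (builtForVersion < 7) {
                return "ok (deprecated)";
            }
            throw new IllegalArgumentException("The [standard] token filter has been removed.");
        }
    }

    /** Stand-in for the factory cache. */
    static final class Cache {
        private final Strategy strategy;
        private final Map<Integer, Factory> byVersion = new HashMap<>();
        private Factory single;

        Cache(Strategy strategy) {
            this.strategy = strategy;
        }

        Factory get(int indexCreatedVersion) {
            if (strategy == Strategy.ONE) {
                // The first caller's version wins; later callers get the same instance.
                if (single == null) {
                    single = new Factory(indexCreatedVersion);
                }
                return single;
            }
            return byVersion.computeIfAbsent(indexCreatedVersion, Factory::new);
        }
    }

    /**
     * Loads an index created on firstLoadedVersion, then uses the filter on behalf of
     * an index created on queriedVersion, and reports what happens.
     */
    static String simulate(Strategy strategy, int firstLoadedVersion, int queriedVersion) {
        Cache cache = new Cache(strategy);
        cache.get(firstLoadedVersion); // populates the cache
        try {
            return cache.get(queriedVersion).create();
        } catch (IllegalArgumentException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A 7.x internal index is loaded before the 6.x data index.
        System.out.println("ONE, 7 first:         " + simulate(Strategy.ONE, 7, 6));
        System.out.println("PER_VERSION, 7 first: " + simulate(Strategy.PER_VERSION, 7, 6));
        // The 6.x data index is loaded first (one possible reading of the restart case).
        System.out.println("ONE, 6 first:         " + simulate(Strategy.ONE, 6, 6));
    }
}
```

In this model, only the `ONE` strategy with a 7.x index loaded first reproduces the error for the 6.x index. Whether the second restart really changes the load order in this way is an assumption of the sketch, not something the report confirms.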

elasticmachine commented 4 years ago

Pinging @elastic/es-search (:Search/Analysis)

matriv commented 4 years ago

master : 24e1858a70bd255ebc210415acaac1bfb40340d3
7.x : fda25ed04a5510314a9c6a830475d27c32fa59e0
7.6 : b65e29337d0370ec1c51060c338238eea5a04d80
7.5 : 3055eeef8549fe0dcef80b8698e1aa79c1d5bb9f