Closed kingherc closed 9 months ago
Pinging @elastic/es-data-management (Team:Data Management)
This is an interesting one.
^ here's the code in question. We're calling GET _cat/indices?v and then parsing the output to check that the index we expect to exist actually does (i.e. the index called testrollover-000002).
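The parsing step amounts to something like the following minimal shell sketch (the sample _cat/indices output is hypothetical; the actual test does the equivalent in Java):

```shell
# Hypothetical sample of GET _cat/indices?v output; real values would come
# from the test cluster.
cat_output='health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   testrollover-000002 aBcDeFgHiJkLmNoPqRsTuV   1   1          0            0       283b           283b'

# The index name is the third whitespace-separated column; check that the
# expected rollover index appears somewhere in the listing.
if echo "$cat_output" | awk '{print $3}' | grep -qx 'testrollover-000002'; then
  echo "index exists"
else
  echo "index missing"
fi
```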
But we get an error response!
org.elasticsearch.xpack.restart.CoreFullClusterRestartIT > testRollover {cluster=OLD} FAILED
org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:40047], URI [/_cat/indices?v], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [.geoip_databases]","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":".geoip_databases","index":".geoip_databases"}],"type":"index_not_found_exception","reason":"no such index [.geoip_databases]","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":".geoip_databases","index":".geoip_databases"},"status":404}
at __randomizedtesting.SeedInfo.seed([4B35DA6793FFAAD3:C028D71F7D53C62C]:0)
[...]
And here's a snippet of logs:
[2023-07-19T06:07:57,831][INFO ][o.e.i.g.GeoIpDownloader ] [test-cluster-1] updating geoip databases
[2023-07-19T06:07:57,831][INFO ][o.e.i.g.GeoIpDownloader ] [test-cluster-1] fetching geoip databases overview from [https://geoip.elastic.co/v1/database?elastic_geoip_service_tos=agree]
[2023-07-19T06:07:58,228][INFO ][o.e.c.m.MetadataCreateIndexService] [test-cluster-0] [testrollover-000002] creating index, cause [rollover_index], templates [], shards [1]/[1]
[2023-07-19T06:07:58,388][INFO ][o.e.c.m.MetadataCreateIndexService] [test-cluster-0] [.geoip_databases] creating index, cause [auto(bulk api)], templates [], shards [1]/[0]
[2023-07-19T06:07:58,389][INFO ][o.e.c.r.a.AllocationService] [test-cluster-0] updating number_of_replicas to [1] for indices [.geoip_databases]
[2023-07-19T06:07:58,477][INFO ][o.e.x.r.CoreFullClusterRestartIT] [testRollover] [cluster=OLD] after test
At this stage of things, I don't think that the .geoip_databases index is notably special -- sure, it's a system index and it's behind an alias (IIRC), but the part that worries me here is that we're asking Elasticsearch for GET _cat/indices output and getting an error because of some kind of internal race condition around index creation.
Okay, I reproduced this on a 3-node 7.17.2 cluster. Here's the stack trace:
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [2cac048b854d86e9e7706fa7]","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"2cac048b854d86e9e7706fa7","index":"2cac048b854d86e9e7706fa7","stack_trace":"[2cac048b854d86e9e7706fa7] IndexNotFoundException[no such index [2cac048b854d86e9e7706fa7]]
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.indexNotFoundException(IndexNameExpressionResolver.java:1250)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.innerResolve(IndexNameExpressionResolver.java:1188)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:1144)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:292)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:270)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:92)
at org.elasticsearch.action.admin.indices.settings.get.TransportGetSettingsAction.checkBlock(TransportGetSettingsAction.java:64)
at org.elasticsearch.action.admin.indices.settings.get.TransportGetSettingsAction.checkBlock(TransportGetSettingsAction.java:31)
at org.elasticsearch.action.support.master.TransportMasterNodeAction.checkBlockIfStateRecovered(TransportMasterNodeAction.java:138)
at org.elasticsearch.action.support.master.TransportMasterNodeAction.access$000(TransportMasterNodeAction.java:52)
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:185)
at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:158)
at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:52)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:179)
at org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:53)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177)
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$3(SecurityActionFilter.java:190)
at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:219)
at org.elasticsearch.xpack.security.authz.AuthorizationService$1.onResponse(AuthorizationService.java:577)
at org.elasticsearch.xpack.security.authz.AuthorizationService$1.onResponse(AuthorizationService.java:571)
at org.elasticsearch.xpack.security.authz.interceptor.FieldAndDocumentLevelSecurityRequestInterceptor.intercept(FieldAndDocumentLevelSecurityRequestInterceptor.java:85)
at org.elasticsearch.xpack.security.authz.interceptor.SearchRequestInterceptor.intercept(SearchRequestInterceptor.java:26)
at org.elasticsearch.xpack.security.authz.AuthorizationService$1.onResponse(AuthorizationService.java:575)
at org.elasticsearch.xpack.security.authz.AuthorizationService$1.onResponse(AuthorizationService.java:571)
I caught it by spinning up a cluster then running these two commands in different terminals:
while true; do index=$(openssl rand -hex 12); curl $ES_CREDS -XPUT $ES_URL/$index; echo; sleep 1; curl $ES_CREDS -XDELETE $ES_URL/$index; echo; done
while true; do curl $ES_CREDS "$ES_URL/_cat/indices?v&error_trace"; echo; done
Here's an 8.11.3 stack trace -- I was hoping https://github.com/elastic/elasticsearch/pull/93437 might have invalidated the circumstances under which this bug happens (🤷), but no such luck.
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [77af693ebd14b44e71d1ddaf]","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"77af693ebd14b44e71d1ddaf","index":"77af693ebd14b44e71d1ddaf","stack_trace":"[77af693ebd14b44e71d1ddaf] org.elasticsearch.index.IndexNotFoundException: no such index [77af693ebd14b44e71d1ddaf]
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.notFoundException(IndexNameExpressionResolver.java:474)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.ensureAliasOrIndexExists(IndexNameExpressionResolver.java:1604)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.filterUnavailable(IndexNameExpressionResolver.java:1584)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.resolveExpressions(IndexNameExpressionResolver.java:266)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:341)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:332)
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:91)
at org.elasticsearch.action.admin.indices.settings.get.TransportGetSettingsAction.checkBlock(TransportGetSettingsAction.java:70)
at org.elasticsearch.action.admin.indices.settings.get.TransportGetSettingsAction.checkBlock(TransportGetSettingsAction.java:37)
at org.elasticsearch.action.support.master.TransportMasterNodeAction.checkBlockIfStateRecovered(TransportMasterNodeAction.java:136)
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:201)
at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:173)
at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:55)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:87)
at org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:53)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:85)
At least on 8.11.3 it affects _cat/shards, too (I assume the same applies to previous versions).
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [47489aeb9ba1353816a4d4ab]","resource.type":"index_or_alias","resource.id":"47489aeb9ba1353816a4d4ab","index_uuid":"_na_","index":"47489aeb9ba1353816a4d4ab","stack_trace":"[47489aeb9ba1353816a4d4ab] org.elasticsearch.index.IndexNotFoundException: no such index [47489aeb9ba1353816a4d4ab]
at org.elasticsearch.server@8.11.3/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.notFoundException(IndexNameExpressionResolver.java:474)
at org.elasticsearch.server@8.11.3/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.ensureAliasOrIndexExists(IndexNameExpressionResolver.java:1604)
at org.elasticsearch.server@8.11.3/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$ExplicitResourceNameFilter.filterUnavailable(IndexNameExpressionResolver.java:1584)
at org.elasticsearch.server@8.11.3/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.resolveExpressions(IndexNameExpressionResolver.java:266)
at org.elasticsearch.server@8.11.3/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:341)
at org.elasticsearch.server@8.11.3/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:332)
at org.elasticsearch.server@8.11.3/org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:91)
at org.elasticsearch.server@8.11.3/org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.resolveConcreteIndexNames(TransportBroadcastByNodeAction.java:225)
at org.elasticsearch.server@8.11.3/org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.doExecuteForked(TransportBroadcastByNodeAction.java:244)
at org.elasticsearch.server@8.11.3/org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.lambda$doExecute$1(TransportBroadcastByNodeAction.java:232)
at org.elasticsearch.server@8.11.3/org.elasticsearch.action.ActionListener.run(ActionListener.java:368)
at org.elasticsearch.server@8.11.3/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:96)
at org.elasticsearch.server@8.11.3/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
at org.elasticsearch.server@8.11.3/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
I opened https://github.com/elastic/elasticsearch/issues/104371 with the relevant details from this investigation, so now I don't feel bad working around that bug in this one test. Fix for this ticket coming right up.
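For reference, a test-side workaround amounts to retrying the call when it fails with this spurious 404. A rough shell equivalent (a hypothetical helper, not the actual Java fix) looks like:

```shell
# Hypothetical retry wrapper around the flaky call: if _cat/indices fails
# (e.g. with the spurious index_not_found_exception from the
# concurrent-delete race), wait briefly and try again.
cat_indices_with_retry() {
  attempts=0
  while [ "$attempts" -lt 5 ]; do
    # -f makes curl exit non-zero on HTTP error statuses such as the 404
    if resp=$(curl -sf "$ES_URL/_cat/indices?v"); then
      echo "$resp"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 1
  done
  echo "giving up after $attempts attempts" >&2
  return 1
}
```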
CI Link
https://gradle-enterprise.elastic.co/s/3cqzv26af7vyq
Repro line
./gradlew ':x-pack:qa:full-cluster-restart:v7.17.12#bwcTest' -Dtests.class="org.elasticsearch.xpack.restart.CoreFullClusterRestartIT" -Dtests.method="testRollover {cluster=OLD}" -Dtests.seed=4B35DA6793FFAAD3 -Dtests.bwc=true -Dtests.locale=ca-ES -Dtests.timezone=America/Rankin_Inlet -Druntime.java=20
Does it reproduce?
Didn't try
Applicable branches
8.8
Failure history
No response
Failure excerpt