Open mushao999 opened 2 years ago
By the way, this issue may answer a question marked as a TODO in PR #25728: the wildcard resolver throws exceptions about non-wildcard expressions because SecurityActionFilter rewrites the wildcard to the concreteIndices. @javanna @martijnvg
Pinging @elastic/es-data-management (Team:Data Management)
I think this is the same problem that was investigated from https://github.com/elastic/elasticsearch/issues/45652#issuecomment-534474143 onwards in that issue.
https://github.com/elastic/elasticsearch/issues/47159 was opened as a result of that, but never worked on. It's great that this issue provides an up-to-date way to recreate the problem, as the old way I found to reproduce it is over 2 years old now and might not work any more.
So is there any plan to fix it? I would like to work on it, if needed. @droberts195
Thanks @mushao999 for sharing this bug and the reproduction.
After reading through this reproduction, I think the error comes from the fact that the wildcard expression is expanded/resolved on the coordinating node (the node that first accepts a request), while the resolved indices are actually used on the elected master node. Essentially the request contains stale information, in this case resolved indices (which were valid on the coordinating node at the time the request was accepted), and one of these indices no longer exists when the request is handled on the elected master node (due to the addition and removal of `index2` in the background).
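
To make the race concrete, here is a toy model of what I think happens. It is not Elasticsearch code: `StaleWildcardDemo`, `expandWildcard` and `firstMissing` are hypothetical names, and the two sets simply stand in for the indices each node can see at that moment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: these names are hypothetical and are not Elasticsearch APIs.
class StaleWildcardDemo {

    // Expand "*" into a sorted list of every index visible in the given view.
    static List<String> expandWildcard(Set<String> visibleIndices) {
        return new ArrayList<>(new TreeSet<>(visibleIndices));
    }

    // Strict validation: return the first requested name that is absent
    // from the view, or null when every requested index exists.
    static String firstMissing(List<String> concreteIndices, Set<String> visibleIndices) {
        for (String index : concreteIndices) {
            if (!visibleIndices.contains(index)) {
                return index;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The coordinating node sees index2; the master, at handling time, does not.
        Set<String> coordinatingView = Set.of("index1", "index2");
        Set<String> masterView = Set.of("index1");

        // The wildcard is replaced by concrete names on the coordinating node...
        List<String> rewritten = expandWildcard(coordinatingView);
        // ...and those names are validated against a different view on the master.
        String missing = firstMissing(rewritten, masterView);

        System.out.println("rewritten request indices: " + rewritten);
        System.out.println("missing on master: " + missing);
    }
}
```

The request never named `index2` explicitly, yet the strict check on the master reports it as missing because the expansion and the validation ran against different views.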
Today, for historical reasons, index expression resolution happens in two places: in security (`IndicesAndAliasesResolver` via the `SecurityActionFilter`) and in `IndexNameExpressionResolver`. IMO this is why the bug you've reported exists. If index expression resolution happened in a single place, on the node where the action is performed, then I don't think this bug would occur. I also think this bug isn't tied to the get alias API; other APIs are prone to it too.
I don't know whether there is a quick fix for this bug. Maybe the get alias API (and other index-based APIs), instead of always throwing an `IndexNotFoundException` when an index is missing, could ignore the missing index if it originated from a `*` expression. Not sure whether such a solution is desirable.
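
A rough sketch of that idea, under a big assumption: that the request could carry, for each concrete index, a flag saying whether it came from a wildcard. Nothing like `ResolvedIndex` or `resolveLeniently` exists in Elasticsearch today; both names are made up for illustration, and the rewritten request currently does not preserve this information.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; ResolvedIndex and resolveLeniently are not Elasticsearch APIs.
class LenientResolutionSketch {

    // A concrete index name plus a flag recording whether it was produced
    // by expanding a wildcard (assumed to be tracked by the caller).
    static final class ResolvedIndex {
        final String name;
        final boolean fromWildcard;

        ResolvedIndex(String name, boolean fromWildcard) {
            this.name = name;
            this.fromWildcard = fromWildcard;
        }
    }

    static List<String> resolveLeniently(List<ResolvedIndex> requested, Set<String> existing) {
        List<String> result = new ArrayList<>();
        for (ResolvedIndex index : requested) {
            if (existing.contains(index.name)) {
                result.add(index.name);
            } else if (!index.fromWildcard) {
                // An explicitly named index that is missing is still an error.
                throw new IllegalArgumentException("no such index [" + index.name + "]");
            }
            // A missing index that came from a wildcard is skipped silently.
        }
        return result;
    }

    public static void main(String[] args) {
        List<ResolvedIndex> requested = List.of(
            new ResolvedIndex("index1", true),
            new ResolvedIndex("index2", true));
        // index2 has disappeared by the time the request is handled.
        System.out.println(resolveLeniently(requested, Set.of("index1")));
    }
}
```

The trade-off is that an index the user named explicitly still fails fast, while indices the user never named cannot make the request fail.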
Elasticsearch version (`bin/elasticsearch --version`): 6.3.3, 6.8.0, 7.7.0, 7.10.0, 7.16.1, 8.1.0 (built from the latest master branch)

Plugins installed: [] (X-Pack security enabled, no other plugins)

JVM version (`java -version`): the minimum runtime Java version required by each Elasticsearch version

OS version (`uname -a` if on a Unix-like system): Linux node2 3.10.0-327.ali2019.alios7.x86_64 #1 SMP Sun Jan 19 18:21:42 CST 2020 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior: The cluster has an index named `index1` and an alias named `alias1` which points to `index1`. There is some chance of getting an `IndexNotFoundException` for an irrelevant `index2` when we call the `GetAlias API` for `alias1` while `index2` is being created at the same time.

Steps to reproduce:
Please include a minimal but complete recreation of the problem, including (e.g.) index creation, mappings, settings, query etc. The easier you make for us to reproduce it, the more likely that somebody will take the time to look at it.
script2(get alias1 continuously)
Provide logs (if relevant): the stack trace from 8.1.0 is as follows:
Most likely cause: we have analyzed and debugged the code and reached the following conclusions:

1. `TransportGetAliasAction` on the master node, where the cluster state is read to validate that each index exists. (java/org/elasticsearch/cluster/metadata/IndexNameExpressionResolver.java:1227)
2. The `indices` param in the request is rewritten to a list of all indices from the ClusterState in the `SecurityActionFilter` on the `coordinating node`. (org/elasticsearch/xpack/security/authz/IndicesAndAliasesResolver.java:257)
3. `two phase cluster state publication` can make the coordinating node's cluster state newer than the master node's.

Need help: Please help review our issue and analysis, and give us some suggestions on how to fix it.
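
To illustrate the point about two phase publication, here is a toy timeline. It only models the ordering we believe we observed (the coordinating node applies the committed state before the master applies it locally); that ordering is our assumption, and `PublicationOrderSketch` and its methods are hypothetical names, not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; not Elasticsearch code.
class PublicationOrderSketch {

    // Returns a snapshot of the applied cluster state versions after each step.
    static List<String> publish(long currentVersion, long newVersion) {
        long masterApplied = currentVersion;
        long coordinatingApplied = currentVersion;
        List<String> timeline = new ArrayList<>();

        // Phase 1: the new state is sent out and acknowledged, but not applied anywhere.
        timeline.add(snapshot(masterApplied, coordinatingApplied));

        // Phase 2: commit. Assumption: the coordinating node applies the new state first.
        coordinatingApplied = newVersion;
        timeline.add(snapshot(masterApplied, coordinatingApplied));

        // Only afterwards does the master apply the state locally.
        masterApplied = newVersion;
        timeline.add(snapshot(masterApplied, coordinatingApplied));
        return timeline;
    }

    static String snapshot(long masterApplied, long coordinatingApplied) {
        return "master=" + masterApplied + " coordinating=" + coordinatingApplied;
    }

    public static void main(String[] args) {
        for (String step : publish(1, 2)) {
            System.out.println(step);
        }
    }
}
```

In the middle step the coordinating node is ahead of the master. A request whose indices are rewritten on the coordinating node in that window, and then validated on the master, would hit the exception described above.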