Closed psanz-estc closed 2 weeks ago
Pinging @elastic/es-distributed (Team:Distributed)
@psanz-estc may I know how you are getting this?
Create/configure a FS repo with an empty path.repo defined
According to the code, fs path resolution uses the path.repo setting regardless of whether an absolute or a relative path is specified, meaning that to define an fs repo you have to set the correct path.repo setting in the first place.
Yeah, we need a non-empty path.repo to create a repository. However, the error can happen when a node restarts with the path.repo setting removed, or when a new node without the path.repo setting joins the cluster. I wonder whether the original report is one of these cases?
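For illustration only, the resolution rule described in the two comments above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual Elasticsearch code: `resolve_repo_location` is a hypothetical helper, POSIX-style paths are assumed, and the only behaviour taken from this thread is that a location must fall under one of the path.repo entries and that an empty path.repo matches nothing.

```python
import posixpath
from typing import Optional, Sequence


def resolve_repo_location(location: str, path_repo: Sequence[str]) -> Optional[str]:
    """Hypothetical sketch: return the resolved location if it lies under
    one of the path.repo roots, otherwise None."""
    for root in path_repo:
        root = posixpath.normpath(root)
        # join() appends a relative location to the root and keeps an
        # absolute location unchanged; normpath() collapses any ".." parts.
        candidate = posixpath.normpath(posixpath.join(root, location))
        # Accept the candidate only if it is the root itself or sits below it.
        if candidate == root or candidate.startswith(root.rstrip("/") + "/"):
            return candidate
    # An empty path.repo never enters the loop, so nothing can match.
    return None


print(resolve_repo_location("fs-repository", []))
print(resolve_repo_location("fs-repository", ["/mnt/backups"]))
```

Under this model, a repository that was valid when it was registered stops resolving as soon as a node starts with path.repo removed, which corresponds to the two scenarios mentioned above.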
@psanz-estc I took a look at this issue again. I think this is more of a problem for the "Elastic Stack monitoring page" than for Elasticsearch itself. Assuming the "Elastic Stack monitoring page" uses the NodesStats API for its UI, it should reflect the fact that there are node-level failures in the API's response instead of silently ignoring them. The response contains information about these failures.
An example is as follows:
{
  "_nodes": {
    "total": 1,
    "successful": 0,
    "failed": 1,
    "failures": [
      {
        "type": "failed_node_exception",
        "reason": "Failed node [GWR8kxDlSqy2D-SyuKKXPA]",
        "node_id": "GWR8kxDlSqy2D-SyuKKXPA",
        "caused_by": {
          "type": "repository_exception",
          "reason": "[my_fs_repository] repository type [fs] failed to create on current node",
          "caused_by": {
            "type": "repository_exception",
            "reason": "[my_fs_repository] failed to create repository",
            "caused_by": {
              "type": "repository_exception",
              "reason": "[my_fs_repository] location [fs-repository] doesn't match any of the locations specified by path.repo because this setting is empty"
            }
          }
        }
      }
    ]
  },
  "cluster_name": "runTask",
  "nodes": {}
}
It contains enough information for the "Elastic Stack monitoring page" to indicate that a node is failing to respond. Alternatively, the monitoring page can specify only the metrics it is actually interested in when calling the NodesStats API, to avoid checking repositories when that is not necessary. Hence I think the Elasticsearch side works as intended. I suggest that you follow this up with the team that owns the monitoring page. I plan to close this issue if you are OK with it.
PS: We can have a separate discussion on whether one metric failure should fail the entire NodesStats response. But that is a very different topic, which is about how we report the error rather than whether the error is reported. It also does not really help the monitoring page by itself. I'd argue it could make things worse, because everything would appear to be fine while the underlying exception goes unnoticed. At least the current situation makes you notice that some nodes are missing.
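As a rough sketch of what such a check could look like on the client side, the snippet below reads the `_nodes` section of a NodesStats response (already parsed into a dict) and reports each failed node with its innermost cause. The helper names `node_failures` and `nodes_stats_path` are hypothetical and not part of any Elastic client; the second helper only builds a request path that limits the call to selected metrics, with `jvm` and `os` used as example metric names.

```python
from typing import Any, Dict, Iterable, List


def node_failures(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Hypothetical helper: list failed nodes with their innermost cause."""
    summary = response.get("_nodes", {})
    results = []
    for failure in summary.get("failures", []):
        # Follow the nested "caused_by" chain down to the root cause.
        cause = failure
        while "caused_by" in cause:
            cause = cause["caused_by"]
        results.append({
            "node_id": failure.get("node_id", "unknown"),
            "root_cause": cause.get("reason", ""),
        })
    return results


def nodes_stats_path(metrics: Iterable[str] = ()) -> str:
    """Hypothetical helper: request path restricted to the given metrics."""
    selected = ",".join(metrics)
    return "/_nodes/stats/" + selected if selected else "/_nodes/stats"


# Trimmed-down version of the example response shown earlier in this thread.
sample = {
    "_nodes": {
        "total": 1, "successful": 0, "failed": 1,
        "failures": [{
            "type": "failed_node_exception",
            "node_id": "GWR8kxDlSqy2D-SyuKKXPA",
            "caused_by": {
                "type": "repository_exception",
                "reason": "[my_fs_repository] failed to create repository",
            },
        }],
    },
    "nodes": {},
}

for item in node_failures(sample):
    print(item["node_id"], "->", item["root_cause"])
print(nodes_stats_path(["jvm", "os"]))
```

A UI could surface the returned list as a warning instead of showing an empty node table when `nodes` is empty and `failed` is non-zero.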
Closing this issue as detailed in the above message. Thanks for reporting.
Elasticsearch Version
8.11
Installed Plugins
No response
Java Version
bundled
OS Version
Rocky Linux 8.8
Problem Description
ES nodes were not listed in the Nodes tab in Elasticsearch monitoring (using metricbeat)
There wasn't any evident error until we checked the node_stats API call, which returned:
The ESRepo repository shouldn't be there, and it seems it was causing the node_stats call to "fail" and, inadvertently, the Elastic Stack monitoring page to return an empty list.
As soon as we removed this repository from "Stack Management \ Snapshot and Restore \ Repositories", the nodes under the "Nodes" tab showed up immediately.
Steps to Reproduce
1. Create/configure a FS repo with an empty path.repo defined
2. Check node stats API
3. The API call will return a failed_node_exception due to "doesn't match any of the locations specified by path.repo because this setting is empty"
4. Elastic Stack monitoring won't show any of the nodes in the Node tab
Logs (if relevant)
No response