elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Cluster turning **RED** due to Master Nodes in 7.5.2 #56260

Closed alogishetty closed 4 years ago

alogishetty commented 4 years ago

Describe the feature:

Elasticsearch version (bin/elasticsearch --version): 7.5.2

Plugins installed: []

JVM version (java -version):

OS version (uname -a if on a Unix-like system): CentOS

Description of the problem including expected versus actual behavior: The elected master node doesn't communicate with the other master nodes within the expected time. This triggers a re-election, and the cluster turns RED.

This issue keeps happening at random times throughout the day, and we are not sure what is actually causing the "Master Node Discover Exception".

Below is the current setup

| no. of nodes | type | allocated heap | vm ram | cores |
| --- | --- | --- | --- | --- |
| 3 | master nodes | 8gb | 16gb | 4 |
| 12 | data nodes | 16gb | 32gb | 10 |
| 1 | ingest node | 8gb | 18gb | 4 |
| 2 | coordination nodes | 16gb | 32gb | 10 |

current storage: 10tb; number of shards: 5000
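
For reference, the figures above can be turned into per-node numbers. This is only a rough sketch of the arithmetic: it assumes the 5000 shards are spread evenly across the 12 data nodes and that "10tb" means 10,000 GB, neither of which is stated in the report. The threshold of about 20 shards per GB of heap is a rule of thumb from Elastic's sizing guidance of that period, used here as an assumption and not as a hard limit. The function and constant names are illustrative.

```python
# Figures taken from the setup described above.
TOTAL_SHARDS = 5000
DATA_NODES = 12
DATA_NODE_HEAP_GB = 16
TOTAL_STORAGE_GB = 10 * 1000  # "10tb", read as decimal terabytes (assumption)

# Rule-of-thumb ceiling, assumed here for illustration only.
ASSUMED_MAX_SHARDS_PER_GB_HEAP = 20


def shards_per_data_node(total_shards: int, data_nodes: int) -> float:
    """Average shard count per data node, assuming an even spread."""
    return total_shards / data_nodes


def shards_per_gb_heap(total_shards: int, data_nodes: int, heap_gb: int) -> float:
    """Average shard count per GB of heap on each data node."""
    return shards_per_data_node(total_shards, data_nodes) / heap_gb


def average_shard_size_gb(total_storage_gb: float, total_shards: int) -> float:
    """Average on-disk size of one shard."""
    return total_storage_gb / total_shards


if __name__ == "__main__":
    per_node = shards_per_data_node(TOTAL_SHARDS, DATA_NODES)
    per_gb = shards_per_gb_heap(TOTAL_SHARDS, DATA_NODES, DATA_NODE_HEAP_GB)
    avg_size = average_shard_size_gb(TOTAL_STORAGE_GB, TOTAL_SHARDS)
    print(f"shards per data node: {per_node:.1f}")
    print(f"shards per GB heap:   {per_gb:.1f}")
    print(f"average shard size:   {avg_size:.1f} GB")
    print("above assumed rule of thumb:", per_gb > ASSUMED_MAX_SHARDS_PER_GB_HEAP)
```

Under these assumptions the cluster carries roughly 417 shards per data node with an average shard size of about 2 GB, which may be worth raising when asking about the setup.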

Is this an issue with 7.5.2 or something to do with our setup? Please let us know; this is highly critical for us!

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including (e.g.) index creation, mappings, settings, query etc. The easier you make for us to reproduce it, the more likely that somebody will take the time to look at it.

1. 2. 3.

Provide logs (if relevant): image

tvernum commented 4 years ago

Thanks very much for your interest in Elasticsearch.

This appears to be a user question, and we'd like to direct these kinds of things to the Elasticsearch forum. If you can stop by there, we'd appreciate it. This allows us to use GitHub for verified bug reports, feature requests, and pull requests. The discussion forums are better suited to these sorts of troubleshooting and problem diagnosis conversations.

There's an active community in the forums that should be able to help get an answer to your question. As such, I hope you don't mind that I close this.

Thank you.

alogishetty commented 4 years ago

But there is an actual issue here: the elected leader is not responding to calls within the expected time, and the quorum is forcing a leader election.

Received response for a request that has timed out, sent [44253ms] ago, timed out [34245ms] ago, action [internal:coordination/fault_detection/leader_check]
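
The two durations in that log line can be subtracted to recover the timeout that was in effect. The sketch below parses only the message shown above; the regular expression and the function name are illustrative, not part of Elasticsearch. The result is about 10 seconds, which appears consistent with the default `cluster.fault_detection.leader_check.timeout` of 10s, so the leader check was most likely running with default settings and the response arrived roughly 34 seconds too late.

```python
import re

# The log message quoted above, copied verbatim.
LOG_LINE = (
    "Received response for a request that has timed out, sent [44253ms] ago, "
    "timed out [34245ms] ago, action [internal:coordination/fault_detection/leader_check]"
)

# Illustrative pattern for this one message shape; other log formats are not handled.
PATTERN = re.compile(
    r"sent \[(\d+)ms\] ago, timed out \[(\d+)ms\] ago, action \[([^\]]+)\]"
)


def parse_timed_out_response(line: str):
    """Extract timing details from a 'response for a request that has timed out' line.

    Returns None when the line does not match the expected shape.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    sent_ms = int(match.group(1))
    timed_out_ms = int(match.group(2))
    return {
        "action": match.group(3),
        # Time between sending the request and receiving the response.
        "round_trip_ms": sent_ms,
        # Time between sending the request and the timeout firing.
        "timeout_ms": sent_ms - timed_out_ms,
        # How long after the timeout the response finally arrived.
        "late_by_ms": timed_out_ms,
    }


if __name__ == "__main__":
    info = parse_timed_out_response(LOG_LINE)
    print(info)
```

Running the same function over every such line in the master and data node logs would show whether the slow responses cluster around particular times or particular nodes.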