Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Confusing message when server start fails because of no ES connection #1023

Closed lennartkoopmann closed 9 years ago

lennartkoopmann commented 9 years ago

When the server cannot connect to Elasticsearch, the help message used to say that no connection to ES could be established and gave the user some guidance on what to do (IIRC).

Now it says that all shards are red and the cluster might be broken:

2015-03-03 18:35:42,872 INFO : org.elasticsearch.transport - [graylog2-server] bound_address {inet[/0:0:0:0:0:0:0:0:9350]}, publish_address {inet[/172.20.13.186:9350]}
2015-03-03 18:35:42,890 INFO : org.elasticsearch.discovery - [graylog2-server] graylog2-dev/3AeaV3_BSxmcPXjFHV8OJw
2015-03-03 18:35:42,910 WARN : org.graylog2.indexer.esplugin.ClusterStateMonitor - No Elasticsearch data nodes in cluster, cluster is completely offline.
2015-03-03 18:35:44,219 INFO : org.reflections.Reflections - Reflections took 1576 ms to scan 1 urls, producing 2 keys and 2 values 
2015-03-03 18:35:45,910 WARN : org.elasticsearch.discovery - [graylog2-server] waited for 3s and no initial state was set by the discovery
2015-03-03 18:35:45,911 INFO : org.elasticsearch.node - [graylog2-server] started
2015-03-03 18:35:46,185 INFO : org.elasticsearch.cluster.service - [graylog2-server] detected_master [James Proudstar][y7S9Q4X-R6mWrgmdEFnu-Q][sundaysister.local][inet[/172.20.13.186:9300]], added {[James Proudstar][y7S9Q4X-R6mWrgmdEFnu-Q][sundaysister.local][inet[/172.20.13.186:9300]],}, reason: zen-disco-receive(from master [[James Proudstar][y7S9Q4X-R6mWrgmdEFnu-Q][sundaysister.local][inet[/172.20.13.186:9300]]])
2015-03-03 18:35:46,236 ERROR: org.graylog2.UI - 

################################################################################

ERROR: The Elasticsearch cluster state is RED which means shards are unassigned. This usually indicates a crashed and corrupt cluster and needs to be investigated. Graylog will shut down.

Need help?

* Official documentation: https://www.graylog.org/documentation/intro/
* Community support: https://www.graylog.org/community-support/
* Commercial support: https://www.graylog.com/support/

But we also got some specific help pages that might help you in this case:

* https://www.graylog.org/documentation/setup/elasticsearch/

Terminating. :(

################################################################################

This is confusing because now users might think that the connection was made but the cluster is broken in some way.

lennartkoopmann commented 9 years ago

This is the message that should be shown: https://github.com/Graylog2/graylog2-server/blob/master/graylog2-server/src/main/java/org/graylog2/initializers/IndexerSetupService.java#L172-L174

kroepke commented 9 years ago

That message in your log is only shown if the cluster state is in fact red. Please double-check your cluster status. If the cluster join had not worked, the second part of the code would have been executed instead, because the state check would throw an exception.
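The control flow described here can be sketched roughly as follows. This is an illustration only, not the actual IndexerSetupService code: the class, enum, method, and message constants are all hypothetical names chosen for the example, and the health check is passed in as a plain Supplier so the sketch stays self-contained.

```java
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are taken from the Graylog code base.
final class StartupCheck {
    enum Health { GREEN, YELLOW, RED }

    static final String RED_MESSAGE = "The Elasticsearch cluster state is RED.";
    static final String NO_CONNECTION_MESSAGE = "Could not connect to Elasticsearch.";

    // Returns the message to show before shutting down, or null if startup may continue.
    static String failureMessage(Supplier<Health> healthCheck) {
        try {
            // First branch: the state check succeeded, so the node did join the cluster.
            Health health = healthCheck.get();
            return health == Health.RED ? RED_MESSAGE : null;
        } catch (RuntimeException e) {
            // Second branch: the state check itself failed, e.g. because the join did not work.
            return NO_CONNECTION_MESSAGE;
        }
    }
}
```

Under these assumptions, the RED message can only appear when the health check returned a value, which is the point being made above: seeing it implies the connection itself succeeded.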

joschi commented 9 years ago

2015-03-03 18:35:46,185 INFO : org.elasticsearch.cluster.service - [graylog2-server] detected_master [James Proudstar][y7S9Q4X-R6mWrgmdEFnu-Q][sundaysister.local][inet[/172.20.13.186:9300]], added {[James Proudstar][y7S9Q4X-R6mWrgmdEFnu-Q][sundaysister.local][inet[/172.20.13.186:9300]],}, reason: zen-disco-receive(from master [[James Proudstar][y7S9Q4X-R6mWrgmdEFnu-Q][sundaysister.local][inet[/172.20.13.186:9300]]])

Graylog successfully connected to the Elasticsearch instance (named "James Proudstar" in this case) and could therefore detect the cluster status correctly.

The actual misleading message is the one from our cluster state monitor plugin, which seems to flap status on startup:

2015-03-03 18:35:42,910 WARN : org.graylog2.indexer.esplugin.ClusterStateMonitor - No Elasticsearch data nodes in cluster, cluster is completely offline.

kroepke commented 9 years ago

Maybe that line should be suppressed the first time it happens, because it will always say that during restart, for timing reasons. One simple boolean should be enough to make it go away.
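
A minimal sketch of that idea, assuming the monitor is notified with the current number of data nodes. The class and method names are hypothetical and not taken from ClusterStateMonitor, and the warnings are collected in a list instead of being sent to a logger so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual ClusterStateMonitor implementation.
final class OfflineWarningFilter {
    // The "one simple boolean": set once the first offline report has been swallowed.
    private boolean firstOfflineSeen = false;
    private final List<String> warnings = new ArrayList<>();

    void onClusterState(int dataNodes) {
        if (dataNodes > 0) {
            return; // cluster has data nodes, nothing to warn about
        }
        if (!firstOfflineSeen) {
            firstOfflineSeen = true; // expected during startup, so stay quiet once
            return;
        }
        warnings.add("No Elasticsearch data nodes in cluster, cluster is completely offline.");
    }

    List<String> warnings() {
        return warnings;
    }
}
```

With this shape, the first "no data nodes" report after startup is dropped and any later one is still surfaced. Whether the flag should be reset after the cluster has been seen online is a design choice this sketch leaves open.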