Could this be related to https://github.com/elasticsearch/elasticsearch/issues/9126? If the index creation API starts waiting for yellow by default, then maybe the health status could take the newly created index into account only once the index creation request terminates (including timeouts)?
+1
Marvel is causing us a sub-second red status every day at midnight, and it's quite annoying to constantly see it in the Shard Allocation section. We also have the above alert in place; if it happens to query the cluster at that exact moment, people will get a Twilio call in the middle of the night, which is less than ideal.
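For what it's worth, an alert probe that hits `_cluster/health` with `wait_for_status=yellow` and a short timeout will ride out a sub-second red instead of paging on it. A minimal sketch, assuming a node at localhost:9200; the 10s window and the `should_page` name are made up for illustration:

```python
import requests  # assumed HTTP client

ES = "http://localhost:9200"  # assumed endpoint

def should_page() -> bool:
    """Return True only if the cluster stayed below yellow for the whole window."""
    health = requests.get(
        f"{ES}/_cluster/health",
        params={"wait_for_status": "yellow", "timeout": "10s"},
    ).json()
    # wait_for_status makes Elasticsearch hold the request until the cluster
    # reaches at least yellow or the timeout expires; a momentary red caused
    # by index creation resolves within the window and returns timed_out=false.
    return health.get("timed_out", False)

if __name__ == "__main__":
    print("ALERT" if should_page() else "OK")
```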
+1 What can I do to help fix this?
+1 Use case: indexing a ton of data via Logstash into hourly indices and seeing red every hour...
+1, the cluster should never go red unless data loss has occurred ... this is a nasty bug in our cluster health.
It's like the smoke alarms that go off in my house when it's too dusty or we are cooking something "unusual".
@mikemccand it is related, though slightly different. Even if we hold back the create index response until the index is green/yellow etc., independent monitoring of cluster health will still report its status.
I'm +100 on solving this, but I couldn't come up with a proper solution to date. When we create an index we add unassigned primaries + replicas to the routing table. We try to assign the primaries immediately (which may fail because of throttling) and publish the cluster state to the nodes for the primaries to initialize. Here lies the problem: a cluster state with initializing primaries is technically red. Only once the shards are started do we move to yellow. We could say that cluster health should ignore initializing/unassigned shards which are guaranteed to not contain data, but then what happens when those primaries cannot be assigned (because of allocation filtering or whatever)? We should still communicate that somehow, as the situation is wrong. I'd love to hear an elegant suggestion here...
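An external monitor can already approximate that distinction today by looking at why primaries are not started. A partial sketch, assuming the `unassigned.reason` column of `_cat/shards` and a node at localhost:9200; it only inspects primaries that are still unassigned, so primaries stuck in INITIALIZING (which also keep the cluster red) would need e.g. a grace period on top of this:

```python
import requests  # assumed HTTP client

ES = "http://localhost:9200"  # assumed endpoint

def red_is_benign() -> bool:
    """True when every unassigned primary is a just-created shard that never held data."""
    shards = requests.get(
        f"{ES}/_cat/shards",
        params={"format": "json", "h": "index,prirep,state,unassigned.reason"},
    ).json()
    unassigned_primaries = [
        s for s in shards if s["prirep"] == "p" and s["state"] == "UNASSIGNED"
    ]
    # INDEX_CREATED means the shard has never been allocated anywhere, so no
    # data can have been lost; any other reason (NODE_LEFT, allocation
    # filtering that can never be satisfied, ...) is the "situation is wrong"
    # case and should still surface as red.
    return all(
        s.get("unassigned.reason") == "INDEX_CREATED" for s in unassigned_primaries
    )
```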
I had this trouble too: every time I did an online mapping change I had to rebuild the index and stream one index into another, and I had 1600 indexes to do. Icinga generally thought Elasticsearch was flapping at that time, because it was.
Maybe ignore indexes less than 60 seconds old in overall cluster state. The index itself should be red, but maybe not the whole cluster.
Any solution to this is going to break a whole lot of tests somewhere but is probably worth it.
+1
+1 Very much needed. Our alarms are going off every couple of days. I worry that continuing the practice of waiting it out will one day cost us dearly when there is a real problem.
+1 I'm experiencing this issue daily as well, coinciding with Marvel index reloads. Elasticsearch 2.2.0. I've temporarily disabled Marvel refreshes to compensate, but obviously that's not a great long-term solution.
@nik9000
Maybe ignore indexes less than 60 seconds old in overall cluster state. The index itself should be red, but maybe not the whole cluster.
This makes sense to me
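As a rough sketch of how that suggestion could look from an external check's point of view (until something like it lands in the cluster health API itself): combine `_cluster/health?level=indices` with each index's `creation_date` setting and skip indices younger than the grace period when deriving an overall status. The 60-second threshold, host, and `effective_status` name below are illustrative:

```python
import time
import requests  # assumed HTTP client

ES = "http://localhost:9200"   # assumed endpoint
GRACE_SECONDS = 60             # the threshold suggested above

def effective_status() -> str:
    """Worst per-index status, ignoring indices created within the grace period."""
    health = requests.get(
        f"{ES}/_cluster/health", params={"level": "indices"}
    ).json()
    settings = requests.get(
        f"{ES}/_settings",
        params={"filter_path": "*.settings.index.creation_date"},
    ).json()
    now_ms = time.time() * 1000

    rank = {"green": 0, "yellow": 1, "red": 2}
    worst = "green"
    for name, info in health.get("indices", {}).items():
        created_ms = int(
            settings.get(name, {})
            .get("settings", {})
            .get("index", {})
            .get("creation_date", 0)
        )
        if now_ms - created_ms < GRACE_SECONDS * 1000:
            continue  # brand-new index: still allowed to be red/yellow
        if rank[info["status"]] > rank[worst]:
            worst = info["status"]
    return worst

if __name__ == "__main__":
    print(effective_status())
```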
It is not uncommon for admins in the field to set up alerts against the cluster health (red/yellow/green). Currently, index creation can cause the cluster health to go red momentarily until its primary shards are allocated (expected). It would be a nice enhancement to have a way to create an index without causing the cluster health to go red (even for short sub-second durations).
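Until the server-side behaviour changes, the closest client-side workaround is roughly what https://github.com/elasticsearch/elasticsearch/issues/9126 proposes, done by hand: create the index and immediately block on its own health reaching yellow, so the window in which monitoring can observe the dip is as short as possible. It does not eliminate the momentary red. A sketch, with the index name, settings, and host purely illustrative:

```python
import requests  # assumed HTTP client

ES = "http://localhost:9200"    # assumed endpoint
INDEX = "logstash-2016.03.09"   # hypothetical hourly/daily index name

def create_and_wait(index: str) -> None:
    """Create an index, then block until its primaries are started (yellow)."""
    requests.put(f"{ES}/{index}", json={"settings": {"number_of_replicas": 1}})
    health = requests.get(
        f"{ES}/_cluster/health/{index}",
        params={"wait_for_status": "yellow", "timeout": "30s"},
    ).json()
    if health.get("timed_out"):
        raise RuntimeError(f"{index} did not reach yellow within 30s")

if __name__ == "__main__":
    create_and_wait(INDEX)
```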