elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Index creation causes cluster health to turn red momentarily #9106

Closed: ppf2 closed this issue 8 years ago

ppf2 commented 9 years ago

It is not uncommon for admins in the field to set up alerts against the cluster health status (red/yellow/green). Currently, index creation can cause the cluster health to go red momentarily until the new index's primary shards are allocated (expected). It would be a nice enhancement to have a way to create an index without causing the cluster health to go red, even for short, subsecond durations.
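
For context, here is a minimal sketch of the kind of health-based alert admins set up, assuming a node on localhost:9200; the GET /_cluster/health endpoint and its status field are real Elasticsearch features, while the host and the alerting hook are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical alert check: polls GET /_cluster/health and flags a red status.
// The host/port and the "alert" action are placeholders, not a real integration.
public class ClusterHealthCheck {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_cluster/health"))
                .GET()
                .build();
        String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        // Naive status extraction; a real check would parse the JSON properly.
        if (body.contains("\"status\":\"red\"")) {
            // This is where a pager/alert hook would fire. A momentary red during
            // index creation trips this check even though no data is at risk.
            System.out.println("ALERT: cluster health is red");
        } else {
            System.out.println("Cluster health OK: " + body);
        }
    }
}
```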

jpountz commented 9 years ago

Could this be related to https://github.com/elasticsearch/elasticsearch/issues/9126? If the index creation API starts waiting for yellow by default, then maybe the health status could take the newly created index into account only once the index creation request terminates (including on timeout)?
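
A rough sketch of the equivalent client-side workaround, assuming a node on localhost:9200 and a hypothetical index name my-index; the PUT create-index call and the wait_for_status/timeout parameters of the cluster health API are existing Elasticsearch features, the rest is illustrative:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: after creating an index, block until it reaches yellow before
// treating the create call as complete. Index name and timeout are illustrative.
public class WaitForYellow {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Create the index (empty settings body for brevity).
        HttpRequest create = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/my-index"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString("{}"))
                .build();
        client.send(create, HttpResponse.BodyHandlers.ofString());

        // Block until the new index's primaries are started (yellow or better).
        HttpRequest waitForYellow = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_cluster/health/my-index"
                        + "?wait_for_status=yellow&timeout=30s"))
                .GET()
                .build();
        String body = client.send(waitForYellow, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(body);
    }
}
```

Note that this only delays the caller's own view of the new index; an independent monitor polling _cluster/health can still observe the transient red, which is the caveat bleskes raises below.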

barakcoh commented 9 years ago

+1

Marvel is causing us a sub-second red status every day at midnight, and it's quite annoying to constantly see it in the Shard Allocation section. We also have the alert described above in place. If it happens to query the cluster at that exact moment, people get a Twilio call in the middle of the night, which is less than ideal.

jhansen-tt commented 9 years ago

+1 What can I do to help fix this?

ppf2 commented 9 years ago

+1 Use case: indexing a ton of data into hourly Logstash indices and seeing red every hour...

mikemccand commented 9 years ago

+1, the cluster should never go red unless data loss has occurred ... this is a nasty bug in our cluster health.

It's like the smoke alarms that go off in my house when it's too dusty or we are cooking something "unusual".

#9126 seems very much related.

bleskes commented 9 years ago

@mikemccand it is related, though slightly different. Even if we hold back the create-index response until the index is green/yellow etc., independent monitoring of cluster health will still report the transient red status.

I'm +100 on solving this, but I haven't come up with a proper solution to date. When we create an index, we add unassigned primaries + replicas to the routing table. We try to assign the primaries immediately (which may fail because of throttling) and publish the cluster state to the nodes so the primaries can initialize. Here lies the problem: a cluster state with initializing primaries is technically red. Only once the shards are started do we move to yellow. We could say that cluster health should ignore initializing/unassigned shards which are guaranteed not to contain data, but then what happens when those primaries cannot be assigned (because of allocation filtering or whatever)? We should still communicate that somehow, as the situation is wrong. I'd love to hear an elegant suggestion here...
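
To make the ordering concrete, here is a heavily simplified sketch (not the actual Elasticsearch implementation) of how a health value falls out of per-shard routing states; at this level an initializing primary of a brand-new index is indistinguishable from a primary whose data is unavailable, which is exactly the ambiguity described above:

```java
import java.util.List;

// Illustrative sketch only: a simplified health computation over per-shard
// routing states, showing why a freshly created index with initializing
// primaries evaluates to red.
public class HealthSketch {

    enum ShardState { UNASSIGNED, INITIALIZING, STARTED, RELOCATING }
    enum Health { GREEN, YELLOW, RED }

    record Shard(boolean primary, ShardState state) {}

    static Health clusterHealth(List<Shard> shards) {
        Health worst = Health.GREEN;
        for (Shard shard : shards) {
            if (shard.state() == ShardState.STARTED || shard.state() == ShardState.RELOCATING) {
                continue; // an active copy does not degrade health
            }
            // An inactive primary means red; an inactive replica means yellow.
            // This is where the issue bites: right after index creation the new
            // primaries are UNASSIGNED/INITIALIZING, so the whole cluster reports
            // red even though they never held any data and nothing has been lost.
            Health h = shard.primary() ? Health.RED : Health.YELLOW;
            if (h.ordinal() > worst.ordinal()) {
                worst = h;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // One primary still initializing (just created) and one started replica.
        System.out.println(clusterHealth(List.of(
                new Shard(true, ShardState.INITIALIZING),
                new Shard(false, ShardState.STARTED)))); // prints RED
    }
}
```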

nik9000 commented 9 years ago

I had this trouble too - every time I did an online mapping change I had to rebuild the index and stream one index into another - and I had 1600 indexes to do. Icinga generally thought Elasticsearch was flapping at that time, because it was.

Maybe ignore indexes less than 60 seconds old in the overall cluster state. The index itself should be red, but maybe not the whole cluster.

Any solution to this is going to break a whole lot of tests somewhere, but it is probably worth it.
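
nik9000's grace-period idea could look roughly like the following sketch; this is not a real Elasticsearch change, and the index names, timestamps, and 60-second window are illustrative:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

// Sketch of the grace-period heuristic: when rolling per-index health up into
// the cluster-wide status, skip indices created less than a grace period ago.
public class GracePeriodRollup {

    enum Health { GREEN, YELLOW, RED }

    record IndexHealth(Health health, Instant createdAt) {}

    static Health rollUp(Map<String, IndexHealth> indices, Instant now, Duration grace) {
        Health worst = Health.GREEN;
        for (IndexHealth index : indices.values()) {
            boolean brandNew = Duration.between(index.createdAt(), now).compareTo(grace) < 0;
            if (brandNew && index.health() == Health.RED) {
                continue; // the index itself stays red, but it is ignored cluster-wide
            }
            if (index.health().ordinal() > worst.ordinal()) {
                worst = index.health();
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(rollUp(Map.of(
                "logs-old", new IndexHealth(Health.GREEN, now.minusSeconds(3600)),
                "logs-new", new IndexHealth(Health.RED, now.minusSeconds(5))),
                now, Duration.ofSeconds(60))); // prints GREEN
    }
}
```

A primary that genuinely cannot be assigned (because of allocation filtering, for example) would still surface as red once the grace period expires, which would partly address the concern bleskes raises above.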

felipegs commented 9 years ago

+1

bashok001 commented 8 years ago

+1 Very much needed. Our alarms are going off every couple of days. I worry that continuing the practice of waiting it out will cost us dearly one day when there is a real problem.

jeffkirk1 commented 8 years ago

+1 I'm experiencing this issue daily as well, coinciding with Marvel index reloads, on Elasticsearch 2.2.0. I temporarily disabled Marvel refreshes to compensate, but obviously that's not a great long-term solution.

majormoses commented 8 years ago

@nik9000

Maybe ignore indexes less than 60 seconds old in the overall cluster state. The index itself should be red, but maybe not the whole cluster.

This makes sense to me.

clintongormley commented 8 years ago

Fixed by https://github.com/elastic/elasticsearch/pull/18737