cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.
https://cortexmetrics.io/
Apache License 2.0

Improvement on ExtendWrites #5977

Open alanprot opened 4 months ago

alanprot commented 4 months ago

Additional context

Currently, if we set -distributor.extend-writes to true, distributors will send the series to one extra ingester if some ingester is NOT on ACTIVE. See the doc:

# Try writing to an additional ingester in the presence of an ingester not in
# the ACTIVE state. It is useful to disable this along with
# -ingester.unregister-on-shutdown=false in order to not spread samples to extra
# ingesters during rolling restarts with consistent naming.
# CLI flag: -distributor.extend-writes
[extend_writes: <boolean> | default = true]

This behavior is somewhat odd: we will basically double the active series during deployments, as all ingesters will eventually not be ACTIVE (during restarts they flip through `LEAVING`, `PENDING` and `JOINING`).

I see this feature as a way to prevent errors in case of impaired ingesters, and as such, I think it makes much more sense to only extend the writes if the ingester is marked as UNHEALTHY on the ring. In other words, in cases such as deployments, we should not extend the writes.

@yeya24 @friedrich-at-adobe Thoughts?

friedrichg commented 4 months ago

> double the active series

This is only valid if the deployment takes less than 2 hours, because of the block shipping every 2 hours.

But I do agree with your suggested change; this would reduce memory usage during deployments, for sure.

friedrichg commented 4 months ago

fyi @CharlieTLe

alanprot commented 3 months ago

> this is only valid if the deployment takes less than 2 hours, because of the shipping every 2 hours.

Yeah.. we can achieve this easily using https://github.com/aws/zone-aware-controllers-for-k8s

The question is also: is this a breaking change? I want to avoid creating a very similar feature and putting it behind yet another feature flag.

CharlieTLe commented 3 months ago

Seems like a good idea. But if people have the heartbeat disabled for ingesters, then it's possible they will never become unhealthy. I think the better trick here is to configure the ingester to stay in the ring when it restarts so that the hash ring is consistent.

alanprot commented 3 months ago

> I think the better trick here is to configure the ingester to stay in the ring when it restarts so that the hash ring is consistent.

They already stay in the ring if you configure `unregister_on_shutdown=false` (which seems to be a good practice to avoid reshards).

The thing is, when shutting down, the ingester status flips to `LEAVING` or `JOINING`, and that triggers an extend-write (which then also causes reshards).

  # Unregister from the ring upon clean shutdown. It can be useful to disable
  # for rolling restarts with consistent naming in conjunction with
  # -distributor.extend-writes=false.
  # CLI flag: -ingester.unregister-on-shutdown
  [unregister_on_shutdown: <boolean> | default = true]

TBH, even for me it's very hard to get my head around all these configs and how I should configure them to get what I want. The documentation basically says: if you configure `unregister_on_shutdown=false` (which is a good practice), do not enable extend-writes, OR you will have reshards during deployments anyway.
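Putting the two settings from the docs side by side, a rolling-restart-friendly configuration would look roughly like this. The YAML paths are assumed for illustration and may differ per Cortex version, so check your version's configuration reference:

```yaml
# Assumed layout: keep ring ownership stable across rolling restarts
# with consistent naming (e.g. a StatefulSet).
distributor:
  extend_writes: false        # -distributor.extend-writes=false
ingester:
  lifecycler:
    unregister_on_shutdown: false   # -ingester.unregister-on-shutdown=false
```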

My whole goal is to not reshard during deployments, BUT I can accept some resharding if I have 1 bad ingester because of a hardware impairment. This option is just not possible in the current state of the world.

This feature as-is is so weird that if I have one ingester that just dies abruptly (and so never flips its status on the ring), we will never extend the writes to avoid availability drops -> but we would do it on a normal deployment 🤯

If we add yet another config on top of the 2 already-confusing ones that exist, I think we would just make everything even more confusing. I would vote to just "fix" this behavior, which I think for now is not very useful.

friedrichg commented 3 months ago

@alanprot I like your suggested change to make extended writes usable for large deployments and fast rollouts. I don't think it is a breaking change. It's an improvement.

One question I have is about reads: it used to be possible to read from LEAVING ingesters. Is that still the case? If so, this change would also reduce the amount of reading on ingesters during rollouts for extended writes.


Also, fewer flags is better. We should only add flags when we are not sure what the behavior should be, and I don't think that is the case here. Both flags were added when chunks storage was around and ingesters could have inconsistent naming because a k8s Deployment was used for ingesters. That is no longer the case. This began in https://github.com/cortexproject/cortex/pull/90, https://github.com/cortexproject/cortex/pull/2443 and https://github.com/cortexproject/cortex/pull/3305.

So if we are feeling bold, we probably should remove both flags to simplify things for everybody. That would mean behavior should be the same as: