Open jtlisi opened 3 years ago
In PR https://github.com/cortexproject/cortex/pull/3414 we handled a similar case for `GetReplicationSetForOperation()`.
I'm wondering which service and setup you're seeing the issue with. Assuming it's related to the ingesters ring (read/write samples), I guess you're fixing it for the case where "shard by all labels" is disabled, because when it is enabled we use `GetReplicationSetForOperation()` instead of `Get()`, and `GetReplicationSetForOperation()` should handle it.
Could you share more details about the setup?
This is for remote-write operations to the Distributor with shard by all labels enabled. I may be a bit confused: it looks to me like `Push` is called for all writes, and that function uses `ring.DoBatch`, which calls `Get`. Do we swap out implementations if `ShardByAllLabels` is enabled?
Describe the bug
If all the instances in a zone are in the `LEAVING` state but all of the other zones are fully available, ring checks will return errors.
To Reproduce
This can be replicated by updating the following test:
Expected behavior
Cortex should be resilient to unavailable zones as long as `> RF / 2` zones are available.
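
To make the expected condition concrete, here is a minimal Go sketch of the zone-quorum rule described above. It is not Cortex code: the `instance` type and the `availableZones` and `hasZoneQuorum` helpers are hypothetical names, and it assumes a zone counts as available when at least one of its instances is `ACTIVE`.

```go
package main

import "fmt"

// instance is a hypothetical, simplified stand-in for a ring member.
// It is NOT the Cortex ring type.
type instance struct {
	Zone  string
	State string // e.g. "ACTIVE" or "LEAVING"
}

// availableZones counts zones that have at least one ACTIVE instance.
// Treating such a zone as "available" is an assumption of this sketch.
func availableZones(instances []instance) int {
	seen := map[string]bool{}
	for _, in := range instances {
		if in.State == "ACTIVE" {
			seen[in.Zone] = true
		}
	}
	return len(seen)
}

// hasZoneQuorum reports whether more than RF / 2 zones are available.
// Integer division is safe here because the zone count is an integer.
func hasZoneQuorum(instances []instance, rf int) bool {
	return availableZones(instances) > rf/2
}

func main() {
	// Every instance in zone-c is LEAVING; zone-a and zone-b are healthy.
	ring := []instance{
		{Zone: "zone-a", State: "ACTIVE"},
		{Zone: "zone-b", State: "ACTIVE"},
		{Zone: "zone-c", State: "LEAVING"},
		{Zone: "zone-c", State: "LEAVING"},
	}
	// With RF = 3, two available zones satisfy 2 > 3/2.
	fmt.Println(hasZoneQuorum(ring, 3)) // prints: true
}
```

Under this rule, the scenario from the bug report (one zone entirely `LEAVING`, the other two healthy, RF of 3) would pass the check rather than return an error.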