alibaba / Sentinel

A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A high-availability flow control and protection component for cloud-native microservices)
https://sentinelguard.io/
Apache License 2.0

[DISCUSS] Make flow control and circuit break separately in dubbo #281

Open ro9er opened 5 years ago

ro9er commented 5 years ago

Issue Description

In Sentinel, the resource is the base unit for both flow control and circuit breaking. That is fine in most cases. But imagine one situation in Dubbo: for one service, we have one consumer and 5 providers. One provider has an IP policy error and cannot connect to its downstream service, so it always throws exceptions. Our degrade rule sets the exception ratio threshold to 0.4. In this scenario the degrade is never triggered, because the exception ratio stays at 0.2: the client invoking the Dubbo service gets one exception per 5 calls (with round-robin load balancing). We could lower the threshold to 0.1 to trigger the degrade and circuit breaking, but that is wasteful because 4 of the 5 providers are healthy.
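The arithmetic behind this scenario can be checked with a small simulation. This is only an illustrative sketch in plain Java, not Sentinel code: the class and method names are hypothetical, and comparing the ratio with `>` against the 0.4 threshold is a simplifying assumption about how the rule is evaluated.

```java
/** Illustrative only: resource-level vs. per-provider exception ratio. */
class ExceptionRatioDemo {
    // Exception ratio threshold of the degrade rule described in the issue.
    static final double THRESHOLD = 0.4;

    /** Dispatches calls round-robin; counts[p][0] = total calls, counts[p][1] = failed calls. */
    static int[][] roundRobin(int providers, int brokenIndex, int calls) {
        int[][] counts = new int[providers][2];
        for (int i = 0; i < calls; i++) {
            int p = i % providers;           // round-robin load balancing
            counts[p][0]++;
            if (p == brokenIndex) {
                counts[p][1]++;              // the broken provider always fails
            }
        }
        return counts;
    }

    /** Exception ratio as seen on the shared resource (all providers together). */
    static double aggregateRatio(int[][] counts) {
        int total = 0;
        int failed = 0;
        for (int[] c : counts) {
            total += c[0];
            failed += c[1];
        }
        return (double) failed / total;
    }

    /** Exception ratio of a single provider. */
    static double providerRatio(int[][] counts, int p) {
        return (double) counts[p][1] / counts[p][0];
    }

    public static void main(String[] args) {
        int[][] counts = roundRobin(5, 2, 1000);
        System.out.println("resource-level ratio:  " + aggregateRatio(counts));
        System.out.println("broken provider ratio: " + providerRatio(counts, 2));
        System.out.println("resource-level rule trips: " + (aggregateRatio(counts) > THRESHOLD));
        System.out.println("per-provider rule trips:   " + (providerRatio(counts, 2) > THRESHOLD));
    }
}
```

With 5 providers and one always failing, the resource-level ratio settles at 1/5 and never crosses 0.4, while the failing provider's own ratio is 1.0.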

Describe what happened (or what feature you want)

I want to separate the statistics for flow control and circuit breaking. Resource-level statistics are a good basis for flow control; circuit breaking decisions, on the other hand, should be based on the statistics of each individual provider. What we need to do is as follows:

  1. Separate the statistics for flow control and circuit breaking.
  2. If one provider trips the circuit breaker, this should be reflected in the provider list in the client's Dubbo cache, marking the failing provider 'unreachable'.
  3. When the break time window has elapsed, the provider's status in the client's Dubbo cache should change to 'half open' or 'ready'.
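The three steps above could be sketched roughly as follows. This is a minimal, self-contained illustration and not Sentinel's or Dubbo's actual API: `ProviderBreakers`, its methods, the `minCalls` parameter and the single-probe half-open behaviour are all assumptions made for the example. Timestamps are passed in explicitly to keep the sketch deterministic, and it is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-provider circuit breaker registry kept on the consumer side. */
class ProviderBreakers {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private static final class Breaker {
        State state = State.CLOSED;
        int total;
        int failed;
        long openedAtMillis;
    }

    private final Map<String, Breaker> breakers = new HashMap<>();
    private final double ratioThreshold;   // e.g. 0.4
    private final int minCalls;            // assumed minimum sample size before tripping
    private final long breakWindowMillis;  // the "break window time"

    ProviderBreakers(double ratioThreshold, int minCalls, long breakWindowMillis) {
        this.ratioThreshold = ratioThreshold;
        this.minCalls = minCalls;
        this.breakWindowMillis = breakWindowMillis;
    }

    /** Step 1: statistics are kept per provider address, not per resource. */
    void record(String provider, boolean success, long nowMillis) {
        Breaker b = breakers.computeIfAbsent(provider, k -> new Breaker());
        if (b.state == State.OPEN) {
            return; // an open provider should not be receiving calls
        }
        if (b.state == State.HALF_OPEN) {
            // Assumption: a single probe call decides whether the provider recovers.
            if (success) {
                b.state = State.CLOSED;
                b.total = 0;
                b.failed = 0;
            } else {
                b.state = State.OPEN;
                b.openedAtMillis = nowMillis;
            }
            return;
        }
        b.total++;
        if (!success) {
            b.failed++;
        }
        if (b.total >= minCalls && (double) b.failed / b.total > ratioThreshold) {
            b.state = State.OPEN; // step 2: this provider becomes "unreachable"
            b.openedAtMillis = nowMillis;
        }
    }

    /** Step 3: once the break window has elapsed, an open provider becomes half open. */
    State stateOf(String provider, long nowMillis) {
        Breaker b = breakers.get(provider);
        if (b == null) {
            return State.CLOSED;
        }
        if (b.state == State.OPEN && nowMillis - b.openedAtMillis >= breakWindowMillis) {
            b.state = State.HALF_OPEN;
        }
        return b.state;
    }

    /** A load balancer would skip providers for which this returns false. */
    boolean isSelectable(String provider, long nowMillis) {
        return stateOf(provider, nowMillis) != State.OPEN;
    }
}
```

In a real integration, `isSelectable` would have to be consulted wherever the consumer filters its provider list, which is the part that depends on Dubbo internals and is left open here.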

cdfive commented 5 years ago

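Does the 0.2 refer to a cluster flow control scenario? With round-robin scheduling across 5 providers, only 1 call in 5 produces an exception. But if you look at a single node, since that node has connection problems caused by the IP policy, its exception rate should be 100%. Is that understanding correct?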

ro9er commented 5 years ago

The description above means that one consumer invokes a Dubbo service backed by 5 providers. If one provider always throws exceptions, the error rate stays at 0.2 from the client's point of view, while for the failing provider itself the error rate is always 100%. This is not about cluster mode.

cdfive commented 5 years ago

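Got it, you mean the exception rate is 0.2 from the client's point of view. Sorry, I misunderstood at first. Could this requirement be moved to the provider side instead? For example, configure the provider's circuit breaking rule to trip when the exception ratio exceeds 0.4. Then for item 1 there is no need to keep separate per-node statistics. For items 2 and 3, we could see whether they can be done together with Dubbo: after the degrade, the service on that node goes offline so clients no longer call it, and it comes back once the time window has passed.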

ro9er commented 5 years ago

It is a good idea to move the logic to the provider side.

jasonjoo2010 commented 5 years ago

Maybe we could use the weight parameter in the services' registry information. And I suggest the unit be the service (neither the method nor the whole provider), because:

  1. One call (Service.method) failing frequently does not necessarily mean the whole provider is unavailable.
  2. Weight cannot be set per method, only per service.

For example, the default weight of a service might always be 100 and drop to 1 when the breaker trips. That way only a few invocations fail, and the consumer never ends up with no provider to invoke at all (which could otherwise happen in scenarios where the service fails on every provider).
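To put rough numbers on the weight idea, here is a small sketch that computes the expected traffic share per provider. It assumes weight-proportional selection (as in a weighted random load balancer); the class name, the constants and the addresses are hypothetical, and no Dubbo API is used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: effect of demoting a provider's weight from 100 to 1. */
class WeightDemotion {
    static final int NORMAL_WEIGHT = 100;
    static final int DEMOTED_WEIGHT = 1;

    /** Expected share of calls for one provider under weight-proportional selection. */
    static double trafficShare(Map<String, Integer> weights, String provider) {
        int sum = 0;
        for (int w : weights.values()) {
            sum += w;
        }
        return (double) weights.get(provider) / sum;
    }

    /** Builds a weight table for n providers, demoting the first `demoted` of them. */
    static Map<String, Integer> weights(int n, int demoted) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            weights.put("provider-" + i, i < demoted ? DEMOTED_WEIGHT : NORMAL_WEIGHT);
        }
        return weights;
    }

    public static void main(String[] args) {
        // One of five providers demoted: it still gets 1 / 401 of the calls.
        System.out.println(trafficShare(weights(5, 1), "provider-0"));
        // All five demoted: each still gets 1 / 5, so the service stays invocable.
        System.out.println(trafficShare(weights(5, 5), "provider-0"));
    }
}
```

With one provider demoted, it receives about 0.25% of the calls instead of 20%, so failures become rare but the provider can still be probed. If every provider is demoted, the weights are equal again and calls keep flowing, which is the "never no provider" property described above.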

joooohnli commented 4 years ago

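@cdfive @ro9er If the network between the consumer and the provider is down, only the consumer sees invocation exceptions; the provider is not aware of them. So this requirement cannot be moved to the provider side.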