alibaba / Sentinel

A powerful flow control component enabling reliability, resilience and monitoring for microservices. (面向云原生微服务的高可用流控防护组件)
https://sentinelguard.io/
Apache License 2.0

After a gateway flow rule reload, the statistics are incorrect #873

Open cui1100 opened 5 years ago

cui1100 commented 5 years ago

Issue description

I am using sentinel-gateway-adapter and configured the flow rules' resource as the route ID. I then added two or more rules for the same route ID with different time windows, e.g. setIntervalSec(60).setCount(2), setIntervalSec(3600).setCount(5), and so on.

The first two requests to the demo succeeded and the third threw a BlockException, as expected.

Then I called GatewayRuleManager.loadRules(...), which is scheduled to run every 30 seconds.

After the reload I requested the demo again, expecting it to be blocked as well, but the request succeeded.

I repeated these steps several times until I reached the limit of the setIntervalSec(3600).setCount(5) rule, then kept repeating them. Even though I had requested the demo more than 5 times within the hour, requests made after loadRules(...) were sometimes not blocked.

// Runs on a schedule, e.g. @Scheduled(fixedDelay = 30 * 1000)
Set<GatewayFlowRule> gatewayFlowRules = new HashSet<GatewayFlowRule>();
// At most 1 request per second, keyed by the "<fluidId>-SEC" header
gatewayFlowRules.add(new GatewayFlowRule(config.getRouteId())
        .setCount(1)
        .setIntervalSec(1)
        .setParamItem(new GatewayParamFlowItem()
                .setParseStrategy(SentinelGatewayConstants.PARAM_PARSE_STRATEGY_HEADER)
                .setFieldName(fopFluidControlPo.getFluidId() + "-SEC")));
// At most 2 requests per 60 seconds, keyed by the "<fluidId>-MIN" header
gatewayFlowRules.add(new GatewayFlowRule(config.getRouteId())
        .setCount(2)
        .setIntervalSec(60)
        .setParamItem(new GatewayParamFlowItem()
                .setParseStrategy(SentinelGatewayConstants.PARAM_PARSE_STRATEGY_HEADER)
                .setFieldName(fopFluidControlPo.getFluidId() + "-MIN")));
// At most 3 requests per 3600 seconds, keyed by the "<fluidId>-HOUR" header
gatewayFlowRules.add(new GatewayFlowRule(config.getRouteId())
        .setCount(3)
        .setIntervalSec(3600)
        .setParamItem(new GatewayParamFlowItem()
                .setParseStrategy(SentinelGatewayConstants.PARAM_PARSE_STRATEGY_HEADER)
                .setFieldName(fopFluidControlPo.getFluidId() + "-HOUR")));
// At most 4 requests per 43200 seconds (12 hours), keyed by the "<fluidId>-DAY" header
gatewayFlowRules.add(new GatewayFlowRule(config.getRouteId())
        .setCount(4)
        .setIntervalSec(43200)
        .setParamItem(new GatewayParamFlowItem()
                .setParseStrategy(SentinelGatewayConstants.PARAM_PARSE_STRATEGY_HEADER)
                .setFieldName(fopFluidControlPo.getFluidId() + "-DAY")));
// Gateway flow rules are loaded through GatewayRuleManager
GatewayRuleManager.loadRules(gatewayFlowRules);

Environment

SpringCloudGateway Greenwich.SR1
SpringBoot 2.1.3.RELEASE
sentinel-spring-cloud-gateway-adapter 1.6.2

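Chinese description (translated):

My English is not good, so I will describe the problem again in Chinese. As described above, I configured different flow-control counts for different time windows. During testing, if the rules are not reloaded, basically everything works (except that the exception thrown when a limit is exceeded is often confusing: for example, when both the per-minute and per-hour limits are exceeded, the rule carried by the exception is random; this is not important). However, if the rules are reloaded while requests keep coming in, the per-minute limit stops taking effect once the reload completes: a request made after the reload, less than one minute after the previous request, still passes flow control. More importantly, after several such attempts the per-hour limit has already been reached, yet after a reload requests can often still go through without an exception (sometimes they are blocked). This happens on every reload: with scheduled rule loading in place, the outcome of a request made after each load is unpredictable.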

cdfive commented 5 years ago

When you called GatewayRuleManager.loadRules(...) to update the rules, did you change the count values in them? Or did you just call loadRules with unchanged arguments?

sczyh30 commented 5 years ago

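For gateway flow control, rules with different parameters under the same route ID are counted separately. Each parameter rule is eventually converted into a hot-spot parameter flow rule; internally, each rule is assigned a parameter index (paramIdx), and all parameters are gathered and passed to the Sentinel API. My guess is that paramIdx differs on each conversion, so the corresponding metric objects no longer match (possibly because the Set is unordered). I will review the logic tomorrow.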

cui1100 commented 5 years ago

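When you called GatewayRuleManager.loadRules(...) to update the rules, did you change the count values in them? Or did you just call loadRules with unchanged arguments?

I only called loadRules; I did not modify anything in the rules.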

cdfive commented 5 years ago

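If the arguments to loadRules are unchanged, the old rules equal the new rules, so Sentinel's dynamic property update should not process them.

I tested this locally as follows:
case1: one route ID, count=2. The third request is blocked. Call loadRules without changing count, then request again: still blocked.
case2: one route ID, count=2. The third request is blocked. Change count to 1 or 3 and call loadRules, then request again: the statistics start over.
case3: two rules with the same route ID, one with count=2 and one with count=50. The third request is blocked. Call loadRules without any change, then request again: still blocked.

All three cases behave as expected.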
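The "unchanged rules are not processed" behavior described above can be illustrated with a minimal sketch. `SketchProperty` is a hypothetical class written for this illustration, not Sentinel's actual property implementation; it only assumes that an update is skipped when the new value equals the old one.

```java
import java.util.Objects;

// Hypothetical stand-in for a dynamic property holder; not Sentinel's class.
final class SketchProperty<T> {
    private T value;
    private int notifications;

    // Stores the new value and counts a notification only when it differs
    // from the current value; equal values are ignored.
    boolean updateValue(T newValue) {
        if (Objects.equals(value, newValue)) {
            return false; // nothing changed, listeners would not be called
        }
        value = newValue;
        notifications++;
        return true;
    }

    int notifications() {
        return notifications;
    }
}
```

Under this assumption, loading a set of rules that is equal (by equals) to the current set would leave the existing statistics untouched, which matches case1 and case3.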

sczyh30 commented 5 years ago

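The cause may be this: before Sentinel 1.6.0, the hot-spot parameter statistics of each resource were keyed by paramIdx (Map<Integer, CacheMap> in ParameterMetric). Starting with 1.6.0, arbitrary statistic time windows are supported, so for simplicity the key became the rule (Map<ParamFlowRule, CacheMap>). Different rules therefore map to different statistics, and whenever a rule changes (even if only the threshold changes) the existing data no longer takes effect.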

This could be a problem when the durationInSec is large (e.g. 1h). Perhaps we could use the (idx, duration) pair as the key to resolve this issue?
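A minimal sketch of the difference between the two keying schemes follows. `SketchRule` and `IdxDurationKey` are hypothetical classes made up for this illustration (they are not Sentinel's ParamFlowRule or ParameterMetric), and the sketch assumes the rule's equals/hashCode cover the threshold.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical rule whose equality includes the threshold (count).
final class SketchRule {
    final int paramIdx;
    final long durationInSec;
    final double count;

    SketchRule(int paramIdx, long durationInSec, double count) {
        this.paramIdx = paramIdx;
        this.durationInSec = durationInSec;
        this.count = count;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchRule)) return false;
        SketchRule r = (SketchRule) o;
        return paramIdx == r.paramIdx
                && durationInSec == r.durationInSec
                && Double.compare(count, r.count) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(paramIdx, durationInSec, count);
    }
}

// Hypothetical composite key built only from (idx, duration).
final class IdxDurationKey {
    final int paramIdx;
    final long durationInSec;

    IdxDurationKey(int paramIdx, long durationInSec) {
        this.paramIdx = paramIdx;
        this.durationInSec = durationInSec;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdxDurationKey)) return false;
        IdxDurationKey k = (IdxDurationKey) o;
        return paramIdx == k.paramIdx && durationInSec == k.durationInSec;
    }

    @Override
    public int hashCode() {
        return Objects.hash(paramIdx, durationInSec);
    }
}

final class MetricKeySketch {
    // Keyed by rule: a threshold-only change produces a different key,
    // so the previously recorded counter is no longer found.
    static boolean survivesWhenKeyedByRule() {
        Map<SketchRule, Integer> stats = new HashMap<>();
        stats.put(new SketchRule(0, 3600, 5), 5);
        return stats.containsKey(new SketchRule(0, 3600, 6));
    }

    // Keyed by (idx, duration): the threshold is not part of the key,
    // so the counter survives the same change.
    static boolean survivesWhenKeyedByIdxAndDuration() {
        Map<IdxDurationKey, Integer> stats = new HashMap<>();
        stats.put(new IdxDurationKey(0, 3600), 5);
        return stats.containsKey(new IdxDurationKey(0, 3600));
    }
}
```

The composite key only helps while paramIdx stays the same across reloads; if the index itself shifts, the lookup misses in the same way.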

linlinisme commented 5 years ago

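The cause may be this: before Sentinel 1.6.0, the hot-spot parameter statistics of each resource were keyed by paramIdx (Map<Integer, CacheMap> in ParameterMetric). Starting with 1.6.0, arbitrary statistic time windows are supported, so for simplicity the key became the rule (Map<ParamFlowRule, CacheMap>). Different rules therefore map to different statistics, and whenever a rule changes (even if only the threshold changes) the existing data no longer takes effect.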

This could be a problem when the durationInSec is large (e.g. 1h). Perhaps we could use the (idx, duration) pair as the key to resolve this issue?

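Indeed, when a rule changes the existing data becomes invalid: the key has changed, so the Map can no longer find the corresponding value. But a rule can change for two reasons. First, the Set is unordered: if the rules are re-added to the set on every load, the idx can change. Second, the rule's properties may change after the reload. So if (idx, duration) is to be used as the key, the idx instability has to be solved first.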
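One possible way to keep the index stable, sketched under assumptions: sort the rules by a deterministic key before assigning indices, so the result does not depend on Set iteration order. `StableIndexSketch` and the string keys (for example "60:MIN", meaning interval plus field-name suffix) are hypothetical and only stand in for whatever identifies a gateway rule.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class StableIndexSketch {
    // Assigns 0-based indices in sorted key order. The TreeSet removes
    // duplicates and sorts, so any input order yields the same mapping.
    static Map<String, Integer> assignIndices(Collection<String> ruleKeys) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(ruleKeys));
        Map<String, Integer> indices = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            indices.put(sorted.get(i), i);
        }
        return indices;
    }
}
```

This only stabilizes indices while the set of rules is unchanged; adding or removing a rule can still shift the indices of the rules sorted after it.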

sczyh30 commented 4 years ago

Any ideas on this issue? @wavesZh

wavesZh commented 4 years ago

@sczyh30 If only the change of paramIdx makes it impossible to find the corresponding value in the map, and paramIdx does not really affect the flow control rules, why not consider excluding paramIdx from the hashCode and equals methods?
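The suggestion above can be sketched as follows. `IdxAgnosticRule` is a hypothetical class for illustration, not Sentinel's ParamFlowRule; the fields chosen for equality here are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical rule whose equals/hashCode deliberately ignore paramIdx.
final class IdxAgnosticRule {
    final String resource;
    final long durationInSec;
    final double count;
    final int paramIdx; // carried along, but not part of the identity

    IdxAgnosticRule(String resource, long durationInSec, double count, int paramIdx) {
        this.resource = resource;
        this.durationInSec = durationInSec;
        this.count = count;
        this.paramIdx = paramIdx;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdxAgnosticRule)) return false;
        IdxAgnosticRule r = (IdxAgnosticRule) o;
        return resource.equals(r.resource)
                && durationInSec == r.durationInSec
                && Double.compare(count, r.count) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(resource, durationInSec, count);
    }
}

final class IdxAgnosticSketch {
    // The same rule reloaded with a different paramIdx still finds its counter.
    static boolean sameKeyAcrossReload() {
        Map<IdxAgnosticRule, Integer> stats = new HashMap<>();
        stats.put(new IdxAgnosticRule("route-a", 3600, 5, 0), 5);
        return stats.containsKey(new IdxAgnosticRule("route-a", 3600, 5, 2));
    }
}
```

A caveat of this sketch: two rules on the same resource with identical duration and count but different parameters would become equal, so a real change would probably need some other stable parameter identity in the key.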