Status: Open. srctar opened this issue 5 years ago.
Hi @srctar, we detect non-English characters in the issue. This comment is an auto translation from @sentinel-bot to help other users to understand this issue. We encourage you to describe your issue in English which is more friendly to other users.
Chinese (translated):
Under heavy concurrent load, rate limiting may be inaccurate. The inaccuracy lies in the entry slot chain: StatisticSlot collects the statistics and FlowSlot reads them, but the two steps are not mutually exclusive ("not preemptive"). On my machine (i5-8400, 16 GB RAM, macOS 10.14.5) there is a gap of about 2 ms between them, which is enough for subsequent threads to slip past the FlowSlot rate limiter.
English: Under concurrent traffic, rate limiting may be incorrect. Limiting is implemented as a slot chain: FlowSlot does the limiting and StatisticSlot does the statistics, and FlowSlot uses StatisticSlot's data. However, the two do not share a lock, so FlowSlot may act on stale data and let a lot of extra traffic through.
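To make the described race concrete, here is a minimal sketch of the check-then-act pattern the report describes. It is a hypothetical model, not Sentinel code: `CheckThenActDemo`, `LIMIT` and `run` are made-up names, a plain `AtomicInteger` stands in for StatisticSlot's counters, and a `CyclicBarrier` forces the worst-case interleaving in which every caller performs its check before any caller records its pass.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the reported race; this class is not Sentinel code.
public class CheckThenActDemo {
    static final int LIMIT = 5;

    // Starts `threads` callers that all run the check before any of them records a pass.
    static int run(int threads) {
        AtomicInteger passed = new AtomicInteger(); // stands in for the recorded statistics
        CyclicBarrier allChecked = new CyclicBarrier(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                // Step 1, like FlowSlot: read the current count and decide.
                boolean canPass = passed.get() + 1 <= LIMIT;
                try {
                    // Worst-case interleaving: wait until every caller has checked.
                    allChecked.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                if (canPass) {
                    // Step 2, like StatisticSlot: record the pass afterwards.
                    passed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return passed.get();
    }

    public static void main(String[] args) {
        // Every caller saw a count of 0, so all 20 are admitted despite LIMIT = 5.
        System.out.println("admitted = " + run(20));
    }
}
```

Because each check and its later update are separate steps, all 20 callers see a count of 0 and pass even though the limit is 5. Whether Sentinel's real slots interleave like this under load is exactly what this issue debates; the sketch only shows why a non-atomic read-then-record can overshoot.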
Chinese (translated): This problem occurs only in a thread-pool environment, where it reproduces reliably. I have studied it for a long time without identifying the cause. Symptom: for a period of more than 5 s, traffic keeps arriving and the limit is at least 1, yet all requests are blocked. English: Only under a thread pool, Sentinel may block all requests, even though the limit is greater than 0.
The two problems usually appear together: the first one shows up first, and the second follows about two or three seconds later. The first problem lets more traffic through than the limit allows; the second causes all traffic to be blocked (the second occurs only in a thread-pool environment).
public static void main(String[] args) throws Exception {
    XXX x = new XXX();
    int j = 99999999;
    while (j-- > 0) {
        try {
            final int av = j;
            // executor is a thread pool; x.print(av) is a placeholder that prints av to System.out
            executor.execute(() -> x.print(av));
        } catch (Exception e) {
            e.printStackTrace();
        }
        // With this sleep enabled, the problem no longer reproduces
        /*TimeUnit.MILLISECONDS.sleep(20L);*/
    }
    System.out.println("shut down");
}
mac os x 10.14.5, jdk8u225, eclipse
Synchronizing when the rate limiter reads the current QPS resolves both problems.
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class ZZZ {
    // Note: SimpleDateFormat is not thread-safe, and this instance is shared by all pool threads
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    // Fixed pool of 50 threads; its task queue is unbounded, so submitted tasks pile up
    private static Executor executor = Executors.newFixedThreadPool(50);
    // Last printed timestamp; a blank line is printed whenever the second changes
    private static String pre;
    // Number of tasks to submit
    private static int size = 99990000;

    public void justPrint(int i) {
        // SphU.entry throws a BlockException when the flow rule rejects the call
        try (Entry entry = SphU.entry("XXX.sysGood11")) {
            String print = sdf.format(new Date());
            if (!print.equalsIgnoreCase(pre)) {
                System.out.println();
                pre = print;
            }
            System.out.println(print + "\t" + i);
        } catch (Throwable e) {
            // BlockException and every other error are swallowed, so a blocked call prints nothing
        }
    }

    public static void main(String[] args) throws Exception {
        ZZZ x = new ZZZ();
        int j = size;
        initFlowQpsRule();
        while (j-- > 0) {
            try {
                final int p = j;
                executor.execute(() -> x.justPrint(p));
                /*new Thread(() -> x.justPrint(p)).start()*/
            } catch (Exception e) {
                e.printStackTrace();
            }
            /*TimeUnit.MILLISECONDS.sleep(20L);*/
        }
        System.out.println("shut down");
    }

    private static void initFlowQpsRule() {
        List<FlowRule> rules = new ArrayList<>();
        FlowRule rule = new FlowRule("XXX.sysGood11");
        // set limit qps to 5
        rule.setCount(5);
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setLimitApp("default");
        rules.add(rule);
        FlowRuleManager.loadRules(rules);
    }
}
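The reporter's proposed fix is to synchronize when the limiter reads the current QPS. A minimal sketch of that idea, assuming a simple fixed one-window counter rather than Sentinel's sliding window: `SynchronizedWindowLimiter`, `tryAcquire` and `hammer` are hypothetical names, and the check and the update happen under one lock, so no caller can act on a count that another caller is about to change.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical fixed-window limiter (not Sentinel's API): reading the current
// count and updating it happen inside one synchronized method.
public class SynchronizedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart = System.currentTimeMillis();
    private int count;

    public SynchronizedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Check and record under one lock, so no caller acts on a stale count.
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) { // start a new window
            windowStart = now;
            count = 0;
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }

    // Calls tryAcquire from many pool threads and returns how many calls passed.
    static int hammer(SynchronizedWindowLimiter limiter, int threads, int callsPerThread) {
        AtomicInteger admitted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    if (limiter.tryAcquire()) {
                        admitted.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return admitted.get();
    }

    public static void main(String[] args) {
        // A 60 s window is far longer than this run, so exactly `limit` calls pass.
        System.out.println("admitted = " + hammer(new SynchronizedWindowLimiter(5, 60_000), 50, 1_000));
    }
}
```

With 50 threads making 1,000 calls each inside a 60 s window, exactly 5 calls pass. The cost is that every caller serializes on one lock, which is the throughput trade-off a lock-free statistics design tries to avoid.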
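The results do show that extra traffic can be let through, but this has little to do with StatisticSlot's counting and FlowSlot's reading "not being preemptive". In the default slot chain FlowSlot runs first and StatisticSlot afterwards, so StatisticSlot records its statistics after the fact: one happens before the other, and they do not compete. Another point is that FlowSlot reads the averaged statistics from StatisticSlot, and this read rounds down. So traffic is slightly undercounted, but the effect is very small and depends only on the size of the time window; see (int)(node.passQps()) in the code below: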
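private int avgUsedTokens(Node node) {
    if (node == null) {
        return DEFAULT_AVG_USED_TOKENS;
    }
    // Thread grade: current thread count. QPS grade: passQps() is a double, and the (int) cast rounds it down
    return grade == RuleConstant.FLOW_GRADE_THREAD ? node.curThreadNum() : (int)(node.passQps());
}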
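The rounding-down effect described above can be checked with plain arithmetic. A minimal sketch, assuming the check has the shape `curCount + acquireCount <= count` with `curCount = (int) passQps`; `TruncationDemo`, `passQps` and `canPass` here are hypothetical stand-ins, not Sentinel's classes, and the 2 s window is only an illustration.

```java
// Simplified model (an assumption, not Sentinel's actual classes) of a QPS check
// that truncates the averaged pass count before comparing it with the threshold.
public class TruncationDemo {
    // Average QPS = requests passed in the sampled window / window length in seconds.
    static double passQps(long passedInWindow, double windowIntervalSec) {
        return passedInWindow / windowIntervalSec;
    }

    // Same shape as the quoted code: cast to int, then compare with the rule's count.
    static boolean canPass(long passedInWindow, double windowIntervalSec, int acquireCount, double count) {
        int curCount = (int) passQps(passedInWindow, windowIntervalSec); // rounds toward zero
        return curCount + acquireCount <= count;
    }

    public static void main(String[] args) {
        // 1 s window: the average is a whole number, so the cast loses nothing.
        System.out.println(canPass(4, 1.0, 1, 5)); // true
        System.out.println(canPass(5, 1.0, 1, 5)); // false
        // Hypothetical 2 s window: 9 / 2 = 4.5 is truncated to 4, so one more call passes.
        System.out.println(canPass(9, 2.0, 1, 5)); // true
    }
}
```

With a 1 s interval the average is already a whole number, so truncation changes nothing; only a fractional average (here 4.5 from a 2 s window) is undercounted, and by less than one request.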
Thanks for the reply.
We are describing similar things, but perhaps from different angles.
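The two run one after the other and indeed do not compete. The earlier one, FlowSlot, uses the statistics gathered by the later one, StatisticSlot, so under high concurrency the data the earlier one reads may be inaccurate. That is why I used the word "preemptive".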
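Regarding the time window, unless I have missed something, the per-second implementation uses two consecutive 500 ms sliding windows, which is a fixed setting. Therefore the quotient rollingCounterInSecond.pass() / rollingCounterInSecond.getWindowIntervalInSec() is exactly the whole number of requests already passed in that second, so rounding down loses nothing. This supports my point above that traffic can slip through under high concurrency.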
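The two-bucket arithmetic above can be sketched as follows. This is a simplified model under stated assumptions, not Sentinel's `LeapArray`: two 500 ms buckets cover the last second, `pass(nowMs)` sums the buckets still in range (playing the role of `rollingCounterInSecond.pass()`), and `intervalInSec()` returns 1.0 (playing the role of `getWindowIntervalInSec()`). The class and method names are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```java
// Hypothetical, simplified two-bucket sliding window (not Sentinel's LeapArray):
// two 500 ms buckets cover the last 1 s, and pass() sums the buckets still in range.
public class TwoBucketWindow {
    static final int BUCKETS = 2;
    static final long BUCKET_MS = 500;
    static final long INTERVAL_MS = BUCKETS * BUCKET_MS; // 1000 ms

    private final long[] bucketStart = new long[BUCKETS];
    private final long[] counts = new long[BUCKETS];

    // Finds the bucket for nowMs; a bucket left over from an older window is reset.
    private int bucketFor(long nowMs) {
        int idx = (int) ((nowMs / BUCKET_MS) % BUCKETS);
        long start = nowMs - nowMs % BUCKET_MS;
        if (bucketStart[idx] != start) {
            bucketStart[idx] = start;
            counts[idx] = 0;
        }
        return idx;
    }

    public synchronized void addPass(long nowMs) {
        counts[bucketFor(nowMs)]++;
    }

    // Total passes in buckets that started within the last INTERVAL_MS.
    public synchronized long pass(long nowMs) {
        long sum = 0;
        for (int i = 0; i < BUCKETS; i++) {
            if (nowMs - bucketStart[i] < INTERVAL_MS) {
                sum += counts[i];
            }
        }
        return sum;
    }

    public double intervalInSec() {
        return INTERVAL_MS / 1000.0;
    }

    public static void main(String[] args) {
        TwoBucketWindow w = new TwoBucketWindow();
        for (int i = 0; i < 3; i++) w.addPass(1_000); // bucket [1000, 1500)
        for (int i = 0; i < 2; i++) w.addPass(1_600); // bucket [1500, 2000)
        System.out.println(w.pass(1_700) / w.intervalInSec()); // 5.0
        System.out.println(w.pass(2_100) / w.intervalInSec()); // 2.0: the first bucket has expired
    }
}
```

Since the interval is exactly 1 s, the quotient is simply the whole number of passed requests, which is the reporter's point.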
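The second problem: in a thread-pool environment under high concurrency, all requests get blocked. I have considered many possible causes but still cannot figure it out (the ZZZ code I posted reproduces it). If you have time, I would appreciate your guidance.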
Under concurrent traffic, (possibly) 1. inaccurate rate limiting / 2. all requests blocked
Concurrent traffic: rate limiting may be inaccurate (more traffic than the limit passes; reproduces reliably)
Under heavy concurrent load, rate limiting may be inaccurate. The inaccuracy lies in the entry slot chain: StatisticSlot collects statistics and FlowSlot reads them, but the two steps are not mutually exclusive ("not preemptive") and do not wait for a common lock. On my machine (i5-8400, 16 GB RAM, macOS 10.14.5) there is a gap of about 2 ms between them, which is enough for subsequent threads to bypass the FlowSlot limiter, so FlowSlot may act on stale data and let a lot of extra traffic through.

Concurrent traffic: all requests may be blocked (passed traffic falls below the configured threshold; occurs only with thread-pool scheduling)
This problem occurs only in a thread-pool environment, where it reproduces reliably. I have studied it for a long time without identifying the cause. Symptom: for a period of more than 5 s, traffic keeps arriving and the limit is at least 1, yet Sentinel blocks all requests.
How to reproduce it
Tell us your environment
mac os x 10.14.5, jdk8u225, eclipse
Anything else we need to know?
Synchronizing when the rate limiter reads the current QPS resolves both problems.
Full code to reproduce the problem: