Closed · Potterli20 closed this issue 2 years ago
We cannot reproduce this. Please fill in the whole issue template and also provide information about which kinds of upstreams you're using. Also, how do you determine the number of CLOSE_WAIT sockets? Can you show the command and its output? Thanks.
I'm using a DNS split-routing file. The program runs under systemctl with: /root/dnsproxy/./dnsproxy -u /root/domain_full.txt -l 0.0.0.0 -p 53 -p 58 -p 57 -b 8.8.8.8 --all-servers --edns --cache --cache-optimisti
But when multiple people use it, there are too many CLOSE_WAIT connections. They can be counted with: netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
Note: it isn't noticeable with a single user; it only happens with multiple users.
DNS split-routing file: https://trli.coding.net/p/file/d/dns-hosts/git/lfs/master/dns-adguardhome/whitelist_full.txt
I have also already adjusted sysctl.
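To make the symptom easier to quantify, here is a small sketch of the counting approach above, wrapped in a hypothetical helper function so the awk part can be exercised on any netstat-style table; `ss` is mentioned as the modern alternative to netstat on current Linux distributions.

```shell
# Count TCP sockets per state. This is the awk part of the netstat
# one-liner above; it reads a netstat-style table on stdin and prints
# one "STATE count" line per TCP state seen.
count_states() { awk '/^tcp/ {++S[$NF]} END {for (a in S) print a, S[a]}'; }

# Live usage (requires net-tools):
#   netstat -n | count_states
# Or with the newer ss tool from iproute2:
#   ss -tan | awk 'NR > 1 {++S[$1]} END {for (a in S) print a, S[a]}'
#   ss -tan state close-wait     # only CLOSE_WAIT sockets, with endpoints
```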
Thanks for the info, we'll inspect the code and see if we leak any conns.
This is my own /etc/sysctl.conf configuration file:
net.ipv4.tcp_retries2 = 8
net.ipv4.tcp_slow_start_after_idle = 0
fs.file-max = 1000000
fs.inotify.max_user_instances = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_max_tw_buckets = 6000
net.ipv4.route.gc_timeout = 15
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.core.somaxconn = 32768
net.core.netdev_max_backlog = 32768
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_max_orphans = 32768
net.ipv4.ip_forward = 1
net.core.default_qdisc = fq
net.ipv4.tcp_keepalive_time = 15
net.ipv4.tcp_keepalive_probes = 5
net.core.rmem_max = 3014656
The important one is net.core.rmem_max = 3014656.
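Since these keys only help if the running kernel actually accepted them, here is a sketch (the helper name `check_sysctl` is hypothetical) that compares a sysctl-style file against the live values; keys the kernel does not know are skipped silently.

```shell
# check_sysctl FILE — print every "key = value" entry in FILE whose live
# kernel value (via `sysctl -n`) differs from the value in the file.
# Note: keys with multiple values (e.g. ip_local_port_range) may be
# reported even when correct, because the kernel prints them tab-separated.
check_sysctl() {
  while IFS='=' read -r key val; do
    key=$(printf '%s' "$key" | tr -d ' \t')
    val=$(printf '%s' "$val" | sed 's/^[ \t]*//; s/[ \t]*$//')
    case "$key" in ''|\#*) continue ;; esac
    live=$(sysctl -n "$key" 2>/dev/null) || continue
    [ "$live" = "$val" ] || echo "mismatch: $key (file: $val, live: $live)"
  done < "$1"
}
```

Usage: `check_sysctl /etc/sysctl.conf` after running `sysctl -p` to reload the file.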
After adjusting sysctl.conf things improved slightly, but I still have to restart the program every 15 minutes.
There's still a problem.
I pushed it and compiled it myself.
Did you update the binary before restarting? Because if so, your screenshot shows that the problem is fixed. TIME_WAIT is just the other side not closing the connection from their side, and the newer logs don't seem to show any CLOSE_WAIT sockets.
Actually I compiled the build above and ran it for an hour before restarting the program. At first the program was fine; after a while it started using up ports.
@Potterli20, @fernvenue, this is unrelated to this issue, but I've noticed you two downvoting each other and, sometimes, other posters as well. I don't know why you two do that, but could you please stop? That confuses newcomers, like in #4503, and just generally doesn't improve the quality of conversations in issues. Thanks.
Could you look through the netstat output to see what the remote addresses, and especially ports, are? Perhaps this is caused by a particular misbehaving or weirdly behaving upstream. Also, are there any errors in the verbose logs?
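A sketch of one way to answer this, extracting the remote address:port of each CLOSE_WAIT socket so that a single misbehaving upstream stands out; column 5 is assumed to be the foreign address, as in Linux net-tools netstat output, and the helper name is hypothetical.

```shell
# Remote endpoints of CLOSE_WAIT sockets, most frequent first.
# Reads a netstat-style table on stdin; prints "count address:port" lines.
close_wait_remotes() {
  awk '/^tcp/ && $NF == "CLOSE_WAIT" {print $5}' | sort | uniq -c | sort -rn
}

# Live usage:
#   netstat -n | close_wait_remotes
```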
@ainar-g I have no idea, but I did upvote you; if that confused you, I'm sorry, and I will stop using any emoji in this project.
Edited: I have checked and removed all emojis as much as possible; my apologies for that.
Upvotes and other reactions are fine, but the downvote reaction is regarded as a fairly negative thing, and it's better not to use it unless you also provide a comment explaining the reason. Again, thanks for understanding.
This feels like a long-standing problem. My current setup restarts AGH every 6 hours and restarts the dnsproxy upstream every 15 minutes. I can't offer a lot of data; I only know that CLOSE_WAIT has a big impact on Linux. My upstream configuration can be found here: https://github.com/trli-dns/file-scripts/blame/a8698ed8998277737232de716482bd68907f9a21/dns.sh#L141.
#4316 itself already had a negative tone; since that issue I haven't wanted to say anything.
Oh, right: for a DNS upstream written with the udp:// scheme, it prefers TCP.
https://github.com/AdguardTeam/dnsproxy/issues/230 https://github.com/AdguardTeam/dnsproxy/issues/165 https://github.com/AdguardTeam/AdGuardHome/issues/4214 https://github.com/AdguardTeam/AdGuardHome/issues/4174 These are the questions I have asked before; it has been the same problem over and over. The firewall is fully open, and the configuration files and sysctl have all been adjusted, yet the problem remains. I don't know why; ordinary users may never notice it, but I have been using your products all along and have wanted to move away from AGH and dnsproxy many times. Still, their features are excellent.
@Potterli20, hello again. What exact setup are you checking? We've only pushed the fix into the dnsproxy master branch, so AGH's behavior hasn't changed yet. Have you also built AGH from source with the dnsproxy module replaced?
Also, have you tried dnsproxy as a single resolver? Thanks.
It is entirely possible that different network environments uncover different bugs in our implementations. We'll keep looking for them. Thanks for all the info you're providing so far.
I have always used dnsproxy as a pure DNS resolver, not for ad blocking. The CLOSE_WAIT issue has been there for a few weeks; dnsproxy is used for split DNS with a simple rule file. I only changed dnsproxy; AGH did not change. AGH may be affected by my 150 MB rule list, resulting in performance problems and occasional CLOSE_WAIT issues.
But I see the same problem on both the Chinese network and the international network. CLOSE_WAIT occurs whenever there are too many requests.
I'm actually checking for CLOSE_WAIT. The CLOSE_WAIT problem is a serious one on Linux.
@Potterli20, to what value is the max_goroutines property set in AGH's configuration file?
AGH does not set a max; only the cache is configured.
@Potterli20, in AGH's configuration file there is a field called max_goroutines, please see the wiki page. In dnsproxy the same parameter may be configured via the flag option --max-go-routines=<value>.
Could you please try to set both of them to 0 first, and then to 1000, and see if the issue is affected somehow? Thanks.
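For reference, a sketch of what the first suggested value might look like in the configuration file; the exact placement of max_goroutines in AdGuardHome.yaml is an assumption here and should be verified against the wiki page mentioned above.

```yaml
# AdGuardHome.yaml (fragment) — hypothetical placement, check the wiki:
dns:
  max_goroutines: 0    # first try 0, then 1000, as suggested above
```

The dnsproxy equivalent is the --max-go-routines=<value> command-line flag quoted in the comment above.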
Yes, it has an effect; users also notice lag.
This is my local upstream DNS; everything is processed on another machine.
dnsproxy and AGH have a common problem: the program leaks memory, and most requesting IPs end up in CLOSE_WAIT, causing the program to open connections frantically. Within one hour the connection count reached more than 30,000, slowing down the network.
Now it reaches 30,000+ connections in just 30 minutes.