daeuniverse / daed

daed, a modern dashboard with dae.
https://daeuniverse.github.io/daed/
MIT License

[Bug Report] Spawns new threads without bound, consuming a huge amount of memory #393

Open Basstorm opened 11 months ago

Basstorm commented 11 months ago

Checks

Current Behavior

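Once enabled, it slowly keeps spawning new threads without limit and consumes a huge amount of memory. This is the process status after running for one day: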

root@R66S:~# cat /proc/10167/status
Name:   dae-wing
Umask:  0022
State:  S (sleeping)
Tgid:   10167
Ngid:   0
Pid:    10167
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 2048
Groups:
NStgid: 10167
NSpid:  10167
NSpgid: 1
NSsid:  1
VmPeak:  1649004 kB
VmSize:  1649004 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    169484 kB
VmRSS:    136764 kB
RssAnon:          118888 kB
RssFile:           17876 kB
RssShmem:              0 kB
VmData:   436344 kB
VmStk:       132 kB
VmExe:     25144 kB
VmLib:       720 kB
VmPTE:       904 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
CoreDumping:    0
THP_enabled:    1
Threads:        1774
SigQ:   0/3853
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: fffffffc7fc1feff
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        0
Seccomp_filters:        0
Speculation_Store_Bypass:       not vulnerable
SpeculationIndirectBranch:      unknown

As you can see, Threads has already reached 1774, and a very large number of PIDs are in use. (Two screenshots were attached.)
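To track how the thread count and memory grow over time, the status dump above can be parsed programmatically. The following is a minimal sketch, not part of dae or daed; the function names `parse_proc_status`, `thread_and_rss`, and `read_status` are hypothetical helpers introduced here for illustration. It assumes a Linux `/proc` filesystem with the `Key: value` layout shown above.

```python
from typing import Dict, Tuple


def parse_proc_status(text: str) -> Dict[str, str]:
    """Parse the 'Key:   value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def thread_and_rss(text: str) -> Tuple[int, int]:
    """Return (thread count, resident set size in kB) from a status dump."""
    fields = parse_proc_status(text)
    threads = int(fields["Threads"])
    rss_kb = int(fields["VmRSS"].split()[0])  # "136764 kB" -> 136764
    return threads, rss_kb


def read_status(pid: int) -> str:
    """Read the live status file of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

Calling `thread_and_rss(read_status(pid))` periodically (for example from cron) and logging the result would show whether the thread count grows steadily or in bursts, which may help correlate the growth with specific traffic.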

Related issue: https://github.com/sbwml/luci-app-daed-next/issues/1

Expected Behavior

No response

Steps to Reproduce

No response

Environment

- **Others**:

Configuration file:

global {

##### Software options.
# tproxy port to listen on. It is NOT a HTTP/SOCKS port, and is just used by eBPF program.
# In normal case, you do not need to use it.
tproxy_port: 12345

# Set it true to protect tproxy port from unsolicited traffic. Set it false to allow users to use self-managed
# iptables tproxy rules.
tproxy_port_protect: true

# If not zero, traffic sent from dae will be set SO_MARK. It is useful to avoid traffic loop with iptables tproxy
# rules.
so_mark_from_dae: 0

# Log level: error, warn, info, debug, trace.
log_level: warning

# Disable waiting for network before pulling subscriptions.
disable_waiting_network: true

##### Interface and kernel options.

# The LAN interface to bind. Use it if you want to proxy LAN.
# Multiple interfaces split by ",".
lan_interface: eth0

# The WAN interface to bind. Use it if you want to proxy localhost.
# Multiple interfaces split by ",". Use "auto" to auto detect.
wan_interface: eth1

# Automatically configure Linux kernel parameters like ip_forward and send_redirects. Check out
# https://github.com/daeuniverse/dae/blob/main/docs/en/user-guide/kernel-parameters.md to see what will dae do.
auto_config_kernel_parameter: true

##### Node connectivity check.

# Host of URL should have both IPv4 and IPv6 if you have double stack in local.
# First is URL, others are IP addresses if given.
# Considering traffic consumption, it is recommended to choose a site with anycast IP and less response.
#tcp_check_url: 'http://cp.cloudflare.com'
tcp_check_url: 'http://cp.cloudflare.com,1.1.1.1'

# The HTTP request method to `tcp_check_url`. Use 'HEAD' by default because some server implementations bypass
# accounting for this kind of traffic.
tcp_check_http_method: HEAD

# This DNS will be used to check UDP connectivity of nodes. And if dns_upstream below contains tcp, it also be used to check
# TCP DNS connectivity of nodes.
# First is URL, others are IP addresses if given.
# This DNS should have both IPv4 and IPv6 if you have double stack in local.
#udp_check_dns: 'dns.google.com:53'
udp_check_dns: 'dns.google.com:53,8.8.8.8,1.1.1.1'

check_interval: 30s

# Group will switch node only when new_latency <= old_latency - tolerance.
check_tolerance: 50ms

##### Connecting options.

# Optional values of dial_mode are:
# 1. "ip". Dial proxy using the IP from DNS directly. This allows your ipv4, ipv6 to choose the optimal path
#       respectively, and makes the IP version requested by the application meet expectations. For example, if you
#       use curl -4 ip.sb, you will request IPv4 via proxy and get a IPv4 echo. And curl -6 ip.sb will request IPv6.
#       This may solve some weird full-cone problems if your node supports that. Sniffing will be disabled
#       in this mode.
# 2. "domain". Dial proxy using the domain from sniffing. This will relieve DNS pollution problem to a great extent
#       if have impure DNS environment. Generally, this mode brings faster proxy response time because proxy will
#       re-resolve the domain in remote, thus get better IP result to connect. This policy does not impact routing.
#       That is to say, domain rewrite will be after traffic split of routing and dae will not re-route it.
# 3. "domain+". Based on domain mode but do not check the reality of sniffed domain. It is useful for users whose
#       DNS requests do not go through dae but want faster proxy response time. Notice that, if DNS requests do not
#       go through dae, dae cannot split traffic by domain.
# 4. "domain++". Based on domain+ mode but force to re-route traffic using sniffed domain to partially recover
#       domain based traffic split ability. It doesn't work for direct traffic and consumes more CPU resources.
dial_mode: domain

# Allow insecure TLS certificates. It is not recommended to turn it on unless you have to.
allow_insecure: false

# Timeout to waiting for first data sending for sniffing. It is always 0 if dial_mode is ip. Set it higher is useful
# in high latency LAN network.
sniffing_timeout: 100ms

# TLS implementation. tls is to use Go's crypto/tls. utls is to use uTLS, which can imitate browser's Client Hello.
tls_implementation: tls

# The Client Hello ID for uTLS to imitate. This takes effect only if tls_implementation is utls.
# See more: https://github.com/daeuniverse/dae/blob/331fa23c16/component/outbound/transport/tls/utls.go#L17
utls_imitate: chrome_auto

}

# See https://github.com/daeuniverse/dae/blob/main/docs/en/configuration/dns.md for full examples.
dns {
    upstream {
        # This upstream is AdGuard Home.
        localdns: 'udp://127.0.0.1:1745'
    }
    routing {
        request {
            fallback: localdns
        }
        response {
            fallback: accept
        }
    }
}

# Node group (outbound).
group {
    proxy {
        # Filter nodes from the global node pool defined by the subscription and node section above.
        #filter: subtag(regex: '^my_', another_sub) && !name(keyword: 'ExpireAt:')

        # Filter nodes from the global node pool defined by tag.
        #filter: name(node1, node2)

        # Filter nodes and give a fixed latency offset to achieve latency-based failover.
        # In this example, there is bigger possibility to choose US node even if original latency of US node is higher.
        filter: name(keyword: 'HK')
        #filter: name(US_node) [add_latency: -500ms]

        # Select the node with min average of the last 10 latencies from the group for every connection.
        policy: min_moving_avg
    }
}

# See https://github.com/daeuniverse/dae/blob/main/docs/en/configuration/routing.md for full examples.
routing {

# Preset rules.

l4proto(udp) && dport(443) -> block
pname(mosdns, dnsmasq) && l4proto(udp) && dport(53) -> must_direct

dip(224.0.0.0/3, 'ff00::/8') -> direct
dip(geoip:private) -> direct

dip(223.5.5.5, 223.6.6.6) -> direct
dip(8.8.8.8, 8.8.4.4) -> proxy
domain(full: dns.alidns.com) -> direct
domain(full: dns.googledns.com) -> proxy
domain(full: dns.opendns.com) -> proxy
domain(full: cloudflare-dns.com) -> proxy

########################## Must Direct Start #########################

# Google GCM
domain(suffix: mtalk.google.com) -> direct

########################## Must Direct End ############################

### GeoSite proxy

# Google Play
domain(keyword: googleapis) -> proxy

domain(geosite: linkedin) -> proxy
domain(geosite: speedtest) -> proxy
domain(geosite: yahoo) -> proxy
domain(geosite: github) -> proxy
domain(geosite: twitter) -> proxy
domain(geosite: telegram) -> proxy
domain(geosite: google) -> proxy
domain(geosite: category-container) -> proxy
domain(geosite: category-dev) -> proxy
domain(geosite: google-scholar) -> proxy
domain(geosite: category-scholar-!cn) -> proxy
domain(geosite: category-cryptocurrency) -> proxy
domain(geosite: geolocation-!cn) -> proxy

### GeoSite Direct

domain(geosite: alibaba) -> direct
domain(geosite: bilibili) -> direct
domain(geosite: bilibili2) -> direct
domain(geosite: tencent) -> direct
domain(geosite: zhihu) -> direct
domain(geosite: cloudflare-cn) -> direct
domain(geosite: category-scholar-cn) -> direct
domain(geosite: category-media-cn) -> direct
domain(geosite: category-social-media-cn) -> direct
domain(geosite: category-dev-cn) -> direct
domain(geosite: category-bank-cn) -> direct
domain(geosite: apple) -> direct
domain(geosite: microsoft) -> direct
domain(geosite: geolocation-cn) -> direct
domain(geosite: cn) -> direct

# GeoIP
dip(geoip: cn) -> direct

fallback: proxy

}



Anything else?

No response

dae-prow[bot] commented 11 months ago

Thanks for opening this issue!

Basstorm commented 11 months ago

❣️ This issue is marked as wontfix as you have not yet starred this repo. Please kindly consider giving a star to this repo. Your support means a lot to us. Thanks for your understanding. After you become a stargazer, please also reply to this message with the keyword understood. Afterward, I will reopen this issue for you. Once again, your support is much appreciated. Cheers.

understood

mzz2017 commented 11 months ago

What kind of node are you using?

Basstorm commented 11 months ago

What kind of node are you using?

Shadowsocks (ss) nodes from a commercial proxy provider.

mzz2017 commented 11 months ago

@Basstorm What do the logs say? Are you running UDP traffic, for example BitTorrent downloads?

Basstorm commented 11 months ago

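@Basstorm What do the logs say? Are you running UDP traffic, for example BitTorrent downloads?

There is nothing BitTorrent-related. There are a few long-lived WebSocket connections (the Binance web app), though, and the logs show no UDP traffic either.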

ArnoChenFx commented 10 months ago

I ran into this too.

phenixcxz commented 7 months ago

Same problem here: memory usage blows up after it has been running for a while.

Scirese commented 5 months ago

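Still not fixed. Environment: OpenWRT 23.05-SNAPSHOT arm64 in LXC, Linux 6.10.0-rc2, daed v0.4.1. The behavior has changed, though: now the existing threads consume memory without bound. (Two screenshots were attached.)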

i-Eureka commented 2 months ago

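Still not fixed. Environment: OpenWRT 23.05-SNAPSHOT arm64 in LXC, Linux 6.10.0-rc2, daed v0.4.1. The behavior has changed, though: now the existing threads consume memory without bound. (Two screenshots were attached.)

I'd like to know how you got around the host's restrictions on kernel privileges for LXC containers in order to run dae. Doesn't LXC share its kernel with the host? Wouldn't dae's kernel operations directly affect the host?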

jschwinger233 commented 2 months ago

The latest dae now supports enabling pprof via reload:

global {
    # Set non-zero value to enable pprof.
    pprof_port: 0
}

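First, check whether goroutines are leaking by opening http://localhost:$pprof_port/debug/pprof/goroutine?debug=2 in a browser.

Then inspect the heap objects:

curl -s http://localhost:<port>/debug/pprof/heap > heap_profile.out && go tool pprof heap_profile.out

Then run top to see the largest heap consumers. (You can also inspect this live, without dumping to a file.)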
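With thousands of goroutines, the `goroutine?debug=2` dump is hard to read by eye. The sketch below groups the goroutines in such a dump by state and by the first non-runtime stack frame, so that a leaking call site stands out as the largest group. It is an illustration only: `summarize_goroutines` and `fetch_dump` are hypothetical helper names, and the parser assumes the usual text layout of the Go runtime's stack dump (blocks separated by blank lines, each starting with a `goroutine N [state]:` header).

```python
import re
import urllib.request
from collections import Counter

# Header of each block, e.g. "goroutine 42 [chan receive, 5 minutes]:".
_HEADER = re.compile(r"^goroutine \d+ \[([^\],]+)")


def summarize_goroutines(dump: str) -> Counter:
    """Count goroutines by (state, first non-runtime frame)."""
    counts = Counter()
    for block in dump.strip().split("\n\n"):
        lines = block.splitlines()
        match = _HEADER.match(lines[0]) if lines else None
        if not match:
            continue
        # Function lines are unindented; file locations start with a tab.
        frames = [
            ln for ln in lines[1:]
            if not ln.startswith("\t") and not ln.startswith("created by")
        ]
        # Prefer the first frame outside the Go runtime, if there is one.
        user = [f for f in frames if not f.startswith("runtime.")]
        chosen = (user or frames or ["<unknown>"])[0]
        # Drop the trailing argument list: "pkg.fn(0x1)" -> "pkg.fn".
        counts[(match.group(1), chosen.rsplit("(", 1)[0])] += 1
    return counts


def fetch_dump(port: int) -> str:
    """Download the dump from a local pprof endpoint (port is an assumption)."""
    url = f"http://localhost:{port}/debug/pprof/goroutine?debug=2"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `summarize_goroutines(fetch_dump(port)).most_common(10)` lists the ten largest groups, where `port` is whatever non-zero value was set for `pprof_port`. Taking two summaries some hours apart and comparing them should show which group keeps growing.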