daeuniverse / dae

eBPF-based Linux high-performance transparent proxy solution.
GNU Affero General Public License v3.0

[Bug Report] High CPU usage and degraded performance after running for a long time #484

Closed jdjingdian closed 2 months ago

jdjingdian commented 3 months ago

Current Behavior

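When things are working normally, YouTube automatically plays at 4K. When the problem occurs, YouTube's connection speed can drop below 10000 Kbps.

CPU usage also stays above 100%.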

top - 22:37:47 up 1 day, 23:01,  1 user,  load average: 1.41, 1.08, 1.04
Tasks:  99 total,   2 running,  97 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.8 us, 13.2 sy,  0.0 ni, 56.1 id,  0.8 wa,  0.0 hi, 10.5 si,  5.5 st 
MiB Mem :   1975.9 total,    372.4 free,   1601.7 used,     73.6 buff/cache     
MiB Swap:    977.0 total,    522.4 free,    454.6 used.    374.2 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                               
    901 root      20   0 3579860   1.3g   9300 S 138.4  68.1     42,00 dae  

Expected Behavior

No response

Steps to Reproduce

  1. Run dae for a long time; the longer it runs, the worse the performance gets.

Environment

dae version v0.5.1
go runtime go1.21.5 linux/arm64
Copyright (c) 2022-2024 @daeuniverse
License GNU AGPLv3 <https://github.com/daeuniverse/dae/blob/main/LICENSE>
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Linux debian-dae 6.1.0-18-arm64 #1 SMP Debian 6.1.76-1 (2024-02-01) aarch64 GNU/Linux

global {
    ##### Software options.

    # tproxy port to listen on. It is NOT a HTTP/SOCKS port, and is just used by eBPF program.
    # In normal case, you do not need to use it.
    tproxy_port: 12345

    # Set it true to protect tproxy port from unsolicited traffic. Set it false to allow users to use self-managed
    # iptables tproxy rules.
    tproxy_port_protect: true

    # If not zero, traffic sent from dae will be set SO_MARK. It is useful to avoid traffic loop with iptables tproxy
    # rules.
    so_mark_from_dae: 0

    # Log level: error, warn, info, debug, trace.
    log_level: error

    # Disable waiting for network before pulling subscriptions.
    disable_waiting_network: false

    ##### Interface and kernel options.

    # The LAN interface to bind. Use it if you want to proxy LAN.
    # Multiple interfaces split by ",".
    lan_interface: enp0s11

    # The WAN interface to bind. Use it if you want to proxy localhost.
    # Multiple interfaces split by ",". Use "auto" to auto detect.
    wan_interface: auto

    # Automatically configure Linux kernel parameters like ip_forward and send_redirects. Check out
    # https://github.com/daeuniverse/dae/blob/main/docs/en/user-guide/kernel-parameters.md to see what will dae do.
    auto_config_kernel_parameter: true

    ##### Node connectivity check.

    # Host of URL should have both IPv4 and IPv6 if you have double stack in local.
    # First is URL, others are IP addresses if given.
    # Considering traffic consumption, it is recommended to choose a site with anycast IP and less response.
    #tcp_check_url: 'http://cp.cloudflare.com'
    tcp_check_url: 'http://cp.cloudflare.com,1.1.1.1,2606:4700:4700::1111'

    # The HTTP request method to `tcp_check_url`. Use 'HEAD' by default because some server implementations bypass
    # accounting for this kind of traffic.
    tcp_check_http_method: HEAD

    # This DNS will be used to check UDP connectivity of nodes. If dns_upstream below contains tcp, it will also be used to check
    # TCP DNS connectivity of nodes.
    # First is URL, others are IP addresses if given.
    # This DNS should have both IPv4 and IPv6 if you have double stack in local.
    #udp_check_dns: 'dns.google.com:53'
    udp_check_dns: 'dns.google.com:53,8.8.8.8,2001:4860:4860::8888'

    check_interval: 30s

    # Group will switch node only when new_latency <= old_latency - tolerance.
    check_tolerance: 50ms

    ##### Connecting options.

    # Optional values of dial_mode are:
    # 1. "ip". Dial proxy using the IP from DNS directly. This allows your ipv4, ipv6 to choose the optimal path
    #       respectively, and makes the IP version requested by the application meet expectations. For example, if you
    #       use curl -4 ip.sb, you will request IPv4 via proxy and get an IPv4 echo. And curl -6 ip.sb will request IPv6.
    #       This may solve some weird full-cone problems if your node supports that. Sniffing will be disabled
    #       in this mode.
    # 2. "domain". Dial proxy using the domain from sniffing. This will relieve DNS pollution problem to a great extent
    #       if you have an impure DNS environment. Generally, this mode brings faster proxy response time because proxy will
    #       re-resolve the domain in remote, thus get better IP result to connect. This policy does not impact routing.
    #       That is to say, domain rewrite will be after traffic split of routing and dae will not re-route it.
    # 3. "domain+". Based on domain mode but do not check the reality of sniffed domain. It is useful for users whose
    #       DNS requests do not go through dae but want faster proxy response time. Notice that, if DNS requests do not
    #       go through dae, dae cannot split traffic by domain.
    # 4. "domain++". Based on domain+ mode but force to re-route traffic using sniffed domain to partially recover
    #       domain based traffic split ability. It doesn't work for direct traffic and consumes more CPU resources.
    dial_mode: domain

    # Allow insecure TLS certificates. It is not recommended to turn it on unless you have to.
    allow_insecure: true

    # Timeout for waiting for the first data to be sent for sniffing. It is always 0 if dial_mode is ip. Setting it higher is useful
    # in high-latency LAN networks.
    sniffing_timeout: 100ms

    # TLS implementation. tls is to use Go's crypto/tls. utls is to use uTLS, which can imitate browser's Client Hello.
    tls_implementation: tls

    # The Client Hello ID for uTLS to imitate. This takes effect only if tls_implementation is utls.
    # See more: https://github.com/daeuniverse/dae/blob/331fa23c16/component/outbound/transport/tls/utls.go#L17
    utls_imitate: chrome_auto
}

# Subscriptions defined here will be resolved as nodes and merged as a part of the global node pool.
# Support to give the subscription a tag, and filter nodes from a given subscription in the group section.
subscription {
}

# Nodes defined here will be merged as a part of the global node pool.
node {
    sg_uu: 'vless://'
    kr_xtls: 'vless://'
    poless_hk: 'ss://'
    poless_tw: 'ss://'
    poless_sg: 'ss://'
    poless_jp: 'ss://'
    poless_kr: 'ss://'
    poless_us: 'ss://'
}

# See https://github.com/daeuniverse/dae/blob/main/docs/en/configuration/dns.md for full examples.
dns {
    # For example, if ipversion_prefer is 4 and the domain name has both type A and type AAAA records, the dae will only
    # respond to type A queries and response empty answer to type AAAA queries.
    #ipversion_prefer: 4

    # Give a fixed ttl for domains. Zero means that dae will request to upstream every time and not cache DNS results
    # for these domains.
    #fixed_domain_ttl {
    #    ddns.example.org: 10
    #    test.example.org: 3600
    #}

    upstream {
        # Value can be scheme://host:port, where the scheme can be tcp/udp/tcp+udp.
        # If host is a domain and has both IPv4 and IPv6 record, dae will automatically choose
        # IPv4 or IPv6 to use according to group policy (such as min latency policy).
        # Please make sure DNS traffic will go through and be forwarded by dae, which is REQUIRED for domain routing.
        # If dial_mode is "ip", the upstream DNS answer SHOULD NOT be polluted, so domestic public DNS is not recommended.

        alidns: 'udp://223.5.5.5:53'
        googledns: 'tcp+udp://dns.google.com:53'
    }
    routing {
        # According to the request of dns query, decide to use which DNS upstream.
        # Match rules from top to bottom.
        request {
            qname(geosite:cn) -> alidns
            fallback: googledns        
        }
        # According to the response of dns query, decide to accept or re-lookup using another DNS upstream.
        # Match rules from top to bottom.
        response {
            # Trusted upstream. Always accept its result.
            upstream(googledns) -> accept
            # Possibly polluted, re-lookup using googledns.
            !qname(geosite:cn) && ip(geoip:private) -> googledns
            # fallback is also called default.
            fallback: accept
        }
    }
}

# Node group (outbound).
group {
    my_group {
        # No filter. Use all nodes.

        # Randomly select a node from the group for every connection.
        #policy: random

        # Select the first node from the group for every connection.
        #policy: fixed(0)

        # Select the node with min last latency from the group for every connection.
        #policy: min

        # Select the node with min moving average of latencies from the group for every connection.
        policy: min_moving_avg
    }
}
# See https://github.com/daeuniverse/dae/blob/main/docs/en/configuration/routing.md for full examples.
routing {
    ### Preset rules.

    # Network managers in localhost should be direct to avoid false negative network connectivity check when binding to
    # WAN.
    pname(NetworkManager) -> direct

    # Put it in the front to prevent broadcast, multicast and other packets that should be sent to the LAN from being
    # forwarded by the proxy.
    # "dip" means destination IP.
    dip(224.0.0.0/3, 'ff00::/8') -> direct
    # This line allows you to access private addresses directly instead of via your proxy. If you really want to access
    # private addresses in your proxy host network, modify the below line.
    dip(geoip:private) -> direct

    ### Write your rules below.
    # Apple Service Direct    
    dip(17.0.0.0/8) -> direct
    domain(geosite:apple) -> direct                                                                                                                                       
    domain(geosite:icloud) -> direct
    domain(geosite:icloudprivaterelay) -> direct
    #sg wechat direct
    dip(101.32.104.41) && dip(43.160.144.13) && dip(43.160.144.21) && dip(43.156.222.216) -> direct
    dip(geoip:cn) -> direct
    domain(geosite:cn) -> direct
    domain(keyword: weixin) -> direct
    domain(keyword: apple) -> direct
    domain(keyword: wechat) -> direct
    domain(keyword: porn) -> my_group
    fallback: my_group
}

Anything else?

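I have no idea yet what is causing this. dae is the only thing running on this system.

Restarting the service with systemctl restart dae does not help; performance is still poor. Only rebooting the system restores it, but after a while (roughly 1 to 2 days) performance noticeably degrades again.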
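
To put numbers on "degrades after 1 to 2 days", one option is to log the resident memory of the dae process at a fixed interval and see whether it grows steadily. The following is only a minimal sketch, not part of dae: the helper names (`rss_kib`, `find_pid`, `watch`) and the 60-second interval are assumptions made for illustration. It reads the standard Linux files /proc/<pid>/comm and /proc/<pid>/status.

```python
import re
import time
from pathlib import Path


def rss_kib(status_text: str) -> int:
    """Return VmRSS in KiB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS not found (kernel thread or exited process?)")
    return int(match.group(1))


def find_pid(name: str):
    """Return the first PID whose /proc/<pid>/comm equals name, or None."""
    for comm in Path("/proc").glob("[0-9]*/comm"):
        try:
            if comm.read_text().strip() == name:
                return int(comm.parent.name)
        except OSError:
            continue  # the process exited while /proc was being scanned
    return None


def watch(name: str = "dae", interval_s: int = 60) -> None:
    """Print one timestamped RSS sample per interval until interrupted."""
    while True:
        pid = find_pid(name)
        if pid is not None:
            try:
                status = Path(f"/proc/{pid}/status").read_text()
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(f"{stamp} pid={pid} rss_mib={rss_kib(status) / 1024:.1f}", flush=True)
            except (OSError, ValueError):
                pass  # process went away between lookup and read
        time.sleep(interval_s)
```

Calling `watch("dae")` (or `watch("dae-linux-arm64")` for a manually started daily build) inside a screen session and redirecting the output to a file gives a time series that can be compared against the moment the slowdown starts.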

dae-prow[bot] commented 3 months ago

Thanks for opening this issue!

jschwinger233 commented 3 months ago

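Try the main version first. You can download it from the daily build: https://github.com/daeuniverse/dae/actions/runs/8439276742

If main still has the problem, we can compile a build with symbols and run pprof. A goroutine dump plus perf record should be enough to diagnose it.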

jdjingdian commented 3 months ago

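What I have done so far: stopped and disabled the service with systemctl stop dae and systemctl disable dae.

I downloaded the arm64 artifact of the daily build at https://github.com/daeuniverse/dae/actions/runs/8439276742, extracted it, and made dae-linux-arm64 executable.

Then I opened a screen session and ran: ./dae-linux-arm64 run --disable-timestamp -c /usr/local/etc/dae/config.dae

CPU usage is currently between 8% and 20%. I have seen it spike to 300% a few times, but it does not stay high the way it did with the previous version, and YouTube playback speed looks normal.

I will keep watching for a few more days.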

mzz2017 commented 3 months ago

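It looks like dae is using a lot of memory. Do you have swap enabled? Check with free -h.

Normally vless/ss does not use this much memory. Can you check whether there is any BitTorrent downloading going on?

When memory usage is very high or close to exhausted, a lot of CPU is spent swapping pages between swap and physical memory.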
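
One rough way to check whether swapping is actually happening, rather than swap merely being occupied, is to compare the swap-in and swap-out counters in /proc/vmstat across two samples, alongside the totals in /proc/meminfo. The sketch below is an illustration only: the function names are hypothetical, and what counts as "too much" swapping is left to the reader.

```python
import re


def parse_kv(text: str) -> dict:
    """Parse 'Key: 123 kB' (meminfo) or 'key 123' (vmstat) lines into {key: int}."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):?\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def memory_pressure(meminfo_text: str) -> dict:
    """Return the available-memory and swap-used fractions from /proc/meminfo text."""
    info = parse_kv(meminfo_text)
    swap_total = info["SwapTotal"]
    swap_used = swap_total - info["SwapFree"]
    return {
        "available_fraction": info["MemAvailable"] / info["MemTotal"],
        "swap_used_fraction": swap_used / swap_total if swap_total else 0.0,
    }


def swap_pages_moved(vmstat_before: str, vmstat_after: str) -> tuple:
    """Return (pages swapped in, pages swapped out) between two /proc/vmstat samples."""
    before, after = parse_kv(vmstat_before), parse_kv(vmstat_after)
    return (after["pswpin"] - before["pswpin"], after["pswpout"] - before["pswpout"])
```

Reading /proc/vmstat twice a few seconds apart and passing both texts to `swap_pages_moved` shows whether pages are still moving. Counters that keep climbing while available memory is low would be consistent with the swap-thrashing explanation; flat counters would point elsewhere.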

jdjingdian commented 3 months ago

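> It looks like dae is using a lot of memory. Do you have swap enabled? Check with free -h.
>
> Normally vless/ss does not use this much memory. Can you check whether there is any BitTorrent downloading going on?
>
> When memory usage is very high or close to exhausted, a lot of CPU is spent swapping pages between swap and physical memory.

This machine is indeed fairly weak: an RK3566 CPU with 4 GB of RAM in total, of which 2 GB is allocated to the Debian VM that runs dae.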

root@debian-dae:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           1.9Gi       1.1Gi       801Mi       3.9Mi        66Mi       799Mi
Swap:          976Mi       318Mi       658Mi

CPU usage currently seems to stay high, and memory usage is indeed fairly high as well.

top - 00:19:22 up 3 days, 43 min,  1 user,  load average: 3.19, 2.30, 2.68
Tasks: 100 total,   1 running,  99 sleeping,   0 stopped,   0 zombie
%Cpu(s): 29.2 us, 24.6 sy,  0.0 ni, 19.2 id,  0.0 wa,  0.0 hi, 18.7 si,  8.3 st 
MiB Mem :   1975.9 total,    171.4 free,   1801.6 used,     75.7 buff/cache     
MiB Swap:    977.0 total,    653.9 free,    323.1 used.    174.2 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                       
  13506 root      20   0 3311484   1.2g   8024 S 268.4  62.3 125:39.96 dae-linux-arm64   

jdjingdian commented 2 months ago

It is running stably now. The slowdown after long uptime has not happened again.