SagerNet / sing-box

The universal proxy platform
https://sing-box.sagernet.org/

Hysteria2 inbound/server memory leak 1.9.3 - 1.10.0-alpha.29 #2027

Closed by SasukeFreestyle 1 month ago

SasukeFreestyle commented 3 months ago

Operating system

Linux

System version

Ubuntu 22.04

Installation type

Original sing-box Command Line

If you are using a graphical client, please provide the version of the client.

No response

Version

sing-box version 1.10.0-alpha.29

Environment: go1.22.6 linux/amd64
Tags: with_gvisor,with_quic,with_dhcp,with_wireguard,with_ech,with_utls,with_reality_server,with_acme,with_clash_api
CGO: disabled

Description

I've been seeing memory leaks in version 1.9.3 (maybe even earlier) and above. sing-box grows well past 2GB of RAM after ~8 hrs of runtime, and after some time it crashes. I currently have no log of the crash, but I can update the issue once I have one.

I'm still learning Go, so bear with me :)

btop: image

goroutine: goroutine

heap: heap

pprof dumps: pprof.heap.pb.gz

pprof.goroutine.pb.gz
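For readers unfamiliar with these attachments: `pprof.heap.pb.gz` and `pprof.goroutine.pb.gz` are the standard binary profile dumps that any Go program can produce through the `runtime/pprof` package. The sketch below is not taken from sing-box; it only illustrates, with hypothetical helper names (`writeProfile`, `dumpSize`), how such dumps are generated in general. How the reporter actually collected them is not stated in the thread.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"runtime/pprof"
)

// writeProfile writes the named runtime profile ("heap", "goroutine", ...)
// to w in the gzip-compressed protobuf format that `go tool pprof` reads.
// The function name is hypothetical; only pprof.Lookup and WriteTo are
// standard library calls.
func writeProfile(name string, w io.Writer) error {
	p := pprof.Lookup(name)
	if p == nil {
		return fmt.Errorf("unknown profile %q", name)
	}
	// debug=0 selects the binary (pb.gz) encoding.
	return p.WriteTo(w, 0)
}

// dumpSize returns the size in bytes of the named profile dump,
// or -1 if the profile could not be written.
func dumpSize(name string) int {
	var buf bytes.Buffer
	if err := writeProfile(name, &buf); err != nil {
		return -1
	}
	return buf.Len()
}

func main() {
	for _, name := range []string{"heap", "goroutine"} {
		fmt.Printf("%s profile: %d bytes\n", name, dumpSize(name))
	}
}
```

A file written this way can be opened with `go tool pprof <file>`, which is presumably how the heap and goroutine graphs in this thread were rendered.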

Reproduction

This has occurred on every machine I've tested a hysteria2 inbound on, so it should be easy to reproduce. I would guess you need about 100 users and about 12 hours of uptime.

Server certs are generated with openssl.

Server config:

{
   "log":{
      "level":"fatal",
      "timestamp":true
   },
   "experimental":{
      "cache_file":{
         "enabled":true
      }
   },
   "inbounds":[
      {
         "type":"hysteria2",
         "tag":"hy2-in-443",
         "listen":"127.0.0.1",
         "udp_timeout":180,
         "listen_port":443,
         "sniff":true,
         "sniff_override_destination":true,
         "domain_strategy":"prefer_ipv4",
         "up_mbps":0,
         "down_mbps":0,
         "obfs":{
            "type":"salamander",
            "password":"123"
         },
         "users":[
            {
               "name":"user",
               "password":"123"
            }
         ],
         "ignore_client_bandwidth":true,
         "tls":{
            "enabled":true,
            "certificate_path":"ca.crt",
            "key_path":"ca.key"
         }
      }
   ],
   "outbounds":[
      {
         "type":"direct",
         "tag":"direct"
      },
      {
         "type":"block",
         "tag":"block"
      },
      {
         "type":"dns",
         "tag":"dns-out"
      }
   ],
   "dns":{
      "disable_cache":true,
      "servers":[
         {
            "tag":"dns-out",
            "address":"udp://127.0.0.53",
            "address_strategy":"prefer_ipv4",
            "strategy":"prefer_ipv4",
            "detour":"direct"
         }
      ]
   },
   "route":{
      "rules":[
         {
            "ip_is_private":true,
            "outbound":"block"
         },
         {
            "rule_set":[
               "geoip-cn",
               "geoip-ir",
               "geoip-ru",
               "geoip-phishing",
               "geoip-malware",
               "geoip-private",
               "geosite-ir",
               "geosite-malware",
               "geosite-cryptominers",
               "geosite-phishing"
            ],
            "outbound":"block"
         },
         {
            "protocol":"dns",
            "outbound":"dns-out"
         }
      ],
      "rule_set":[
         {
            "tag":"geoip-cn",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://raw.githubusercontent.com/Chocolate4U/Iran-sing-box-rules/rule-set/geoip-cn.srs"
         },
         {
            "tag":"geoip-ir",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://raw.githubusercontent.com/Chocolate4U/Iran-sing-box-rules/rule-set/geoip-ir.srs"
         },
         {
            "tag":"geoip-ru",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://raw.githubusercontent.com/Chocolate4U/Iran-sing-box-rules/rule-set/geoip-ru.srs"
         },
         {
            "tag":"geoip-private",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://raw.githubusercontent.com/Chocolate4U/Iran-sing-box-rules/rule-set/geoip-private.srs"
         },
         {
            "tag":"geosite-ir",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://github.com/bootmortis/sing-geosite/releases/latest/download/geosite-ir.srs"
         },
         {
            "tag":"geoip-phishing",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://raw.githubusercontent.com/Chocolate4U/Iran-sing-box-rules/rule-set/geoip-phishing.srs"
         },
         {
            "tag":"geosite-phishing",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://raw.githubusercontent.com/Chocolate4U/Iran-sing-box-rules/rule-set/geosite-phishing.srs"
         },
         {
            "tag":"geoip-malware",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://raw.githubusercontent.com/Chocolate4U/Iran-sing-box-rules/rule-set/geoip-malware.srs"
         },
         {
            "tag":"geosite-malware",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://raw.githubusercontent.com/Chocolate4U/Iran-sing-box-rules/rule-set/geosite-malware.srs"
         },
         {
            "tag":"geosite-cryptominers",
            "type":"remote",
            "format":"binary",
            "update_interval":"3d",
            "url":"https://raw.githubusercontent.com/Chocolate4U/Iran-sing-box-rules/rule-set/geosite-cryptominers.srs"
         }
      ],
      "auto_detect_interface":true
   }
}

Logs

No response

nekohasekai commented 3 months ago

I guess the problem was introduced in sing-box 1.10.0-alpha.23. Can you check if the problem exists in versions before alpha 23 or 1.9.3?

SasukeFreestyle commented 3 months ago

I'm sure the problem occurs in 1.9.3 and also in earlier versions; see https://github.com/SagerNet/sing-box/issues/1245. With versions 1.9.0+ connections are properly closed, but there is still a leak. I'm running 1.9.3 now, but it will take some hours for the leak to reproduce. In the next post I will dump pprof of 1.9.3.

SasukeFreestyle commented 3 months ago

So here are the 1.9.3 dumps; they took longer than expected. Uptime: 1 day 14 hours.

image

heap: heap1

goroutine: goroutine1

heap: pprof.heap.pb.gz goroutine: pprof.goroutine.pb.gz

So I've switched to alpha 22 as of writing this and will dump those in the next post.

Thank you for your time and effort.

SasukeFreestyle commented 3 months ago

1.10.0-alpha.22

image

heap: heap22

goroutine:

goroutine22

heap: pprof.heap.pb.gz goroutine: pprof.goroutine.pb.gz

All of the versions I've tested eventually leak, and the system runs out of memory.

If you need me to test something else and/or under different conditions, I'll be happy to help.

erfantkerfan commented 3 months ago

+1

nekohasekai commented 3 months ago

Please try both 1.9.4 and 1.10.0-beta.1

SasukeFreestyle commented 3 months ago

Hello again and hope you're well :)

1.9.4 dump. image

heap: heap-crash

goroutine: goroutine-crash

heap: pprof.heap.pb.gz

goroutine: pprof.goroutine.pb.gz

Switched to 1.10.0-beta.2 for testing.

SasukeFreestyle commented 3 months ago

1.10.0-beta.2 dump

image

heap: heap-crash

goroutine: goroutine-crash

heap: pprof.heap.pb.gz

goroutine: pprof.goroutine.011.pb.gz

SasukeFreestyle commented 3 months ago

Seems fixed in 1.10.0-beta.4 :) Do you want me to dump pprof of beta.4?

If you don't need them you can close this issue.

nekohasekai commented 3 months ago

I don't think it's been resolved; beta 4 has no changes related to this issue.

nekohasekai commented 3 months ago

I've made no progress on your issue, although we did fix a memory leak.

The new goroutine snapshots make me think that you may really have had over 6k active UDP connections causing this memory usage; I'm not sure whether there is still a leak.
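As background on where a figure like "6k" can come from: the text form of a Go goroutine profile (debug level 1) begins with a header line `goroutine profile: total N`. The following is a minimal sketch, with hypothetical function names, of reading that total; it inspects the current process rather than a running sing-box server, and it does not reproduce how the maintainer analysed the attached dumps.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"runtime/pprof"
	"strconv"
	"strings"
)

// parseGoroutineTotal extracts N from a header line of the form
// "goroutine profile: total N", the first line of the debug=1 text profile.
func parseGoroutineTotal(header string) (int, error) {
	const prefix = "goroutine profile: total "
	header = strings.TrimSpace(header)
	if !strings.HasPrefix(header, prefix) {
		return 0, fmt.Errorf("unexpected header %q", header)
	}
	return strconv.Atoi(strings.TrimPrefix(header, prefix))
}

// currentGoroutineTotal writes this process's own goroutine profile in
// text form and returns the total reported in its header line.
func currentGoroutineTotal() (int, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return 0, err
	}
	line, err := bufio.NewReader(&buf).ReadString('\n')
	if err != nil {
		return 0, err
	}
	return parseGoroutineTotal(line)
}

func main() {
	n, err := currentGoroutineTotal()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("goroutines:", n)
}
```

A total in the thousands does not by itself prove a leak; it only shows how many goroutines exist at the moment of the snapshot, which is why the comment above leaves the question open.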

If you'd like to continue the discussion via IM, you can contact me at Telegram@attachBaseContext or Discord@nekohasekai.

SasukeFreestyle commented 3 months ago

I honestly lack the knowledge to determine how much memory is required for, let's say, 6k active connections, and I agree that it might not be a leak in that regard.

But after testing beta.4 for a couple of days, I clearly see a repeating pattern: if sing-box is using 700MB, for example, it will drop to around 500MB after a couple of hours, climb back to 700MB, and then drop to 500MB again.
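A rise-and-fall pattern like this can have several causes, and the thread does not establish which one applies here. One way to tell live heap apart from memory the Go runtime is merely holding on to is to read `runtime.MemStats`. The sketch below uses a hypothetical `heapSnapshot` type and runs against the current process; it is an illustration of the standard library call, not something sing-box is known to expose in this form.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSnapshot holds the few runtime.MemStats fields used below.
// The type and its method are hypothetical helpers for this sketch.
type heapSnapshot struct {
	InUse    uint64 // bytes in in-use heap spans (MemStats.HeapInuse)
	Idle     uint64 // bytes in idle heap spans (MemStats.HeapIdle)
	Released uint64 // idle bytes already returned to the OS (MemStats.HeapReleased)
}

// Retained estimates idle heap memory that the runtime still holds
// and could return to the OS (HeapIdle minus HeapReleased).
func (s heapSnapshot) Retained() uint64 {
	return s.Idle - s.Released
}

// takeHeapSnapshot reads the current process's heap statistics.
func takeHeapSnapshot() heapSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapSnapshot{
		InUse:    m.HeapInuse,
		Idle:     m.HeapIdle,
		Released: m.HeapReleased,
	}
}

func main() {
	s := takeHeapSnapshot()
	fmt.Printf("in use: %d B, idle: %d B, retained by runtime: %d B\n",
		s.InUse, s.Idle, s.Retained())
}
```

If the in-use figure keeps climbing across snapshots taken hours apart, a leak is more likely; if it stays bounded while the process size reported by tools like btop moves up and down, the fluctuation may simply reflect garbage collection and memory being returned to the OS.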

Anyway, here is a 15hr-uptime pprof of beta.4.

I want to thank you for taking the time on this; the reason I'm doing this is to help make sing-box as efficient as possible.

I realize that no changes related to this issue are present in beta.4, but from an end-user view it is magically fixed, I guess :)

image

heap: heap-fix

goroutine: goroutine-fix

heap: pprof.heap.pb.gz

goroutine: pprof.goroutine.pb.gz

nekohasekai commented 3 months ago

Can you try 1.9.4 again? By the way, a large number of UDP connections may be caused by protocols such as BitTorrent. You can try blocking the dtls and bittorrent protocols.

SasukeFreestyle commented 3 months ago

I added the following lines to my configuration under rules and have been testing it for a couple of days with version 1.9.4.

      "rules":[
         {
            "outbound":"block",
            "protocol":[
               "bittorrent",
               "dtls"
            ]
         },

Memory usage is stable: between 450MB and 750MB during peak hours.

heap: heap-fix goroutine: goroutine-fix

heap: pprof.heap.pb.gz goroutine: pprof.goroutine.pb.gz

I don't mind blocking DTLS/BitTorrent traffic, and if this is a fix for the memory usage I'm happy with the results. Thank you for the tip.

I just want to also point out that after adding these rules I checked whether BitTorrent was blocked, and it worked (using qBittorrent). But that's another issue not related to this topic.

SasukeFreestyle commented 1 month ago

I'm closing this issue for now, as I consider it fixed for me: sing-box never exceeds 1GB with the number of users I have. Versions 1.9.6 / 1.10.0-beta.11.

Thank you for your support! Be well :)