DNSCrypt / dnscrypt-proxy

dnscrypt-proxy 2 - A flexible DNS proxy, with support for encrypted DNS protocols.
https://dnscrypt.info
ISC License

oom-killer on an Asus router #1580

Closed jbaker6953 closed 3 years ago

jbaker6953 commented 3 years ago

I am running dnscrypt-proxy-linux_arm-2.0.44 on an Asus router, but over about 100 days it slowly consumes all of the available RAM and crashes the router. Router logs show:

Jan 9 06:33:44 kernel: nt_monitor invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
Jan 9 06:33:44 kernel: [<8005701c>] (unwind_backtrace+0x0/0xf8) from [<800b12fc>] (dump_header.clone.7+0x6c/0x184)
Jan 9 06:33:44 kernel: [<800b12fc>] (dump_header.clone.7+0x6c/0x184) from [<800b15bc>] (oom_kill_process.clone.9+0x68/0x178)
Jan 9 06:33:44 kernel: [<800b15bc>] (oom_kill_process.clone.9+0x68/0x178) from [<800b1990>] (out_of_memory+0x154/0x2bc)
Jan 9 06:33:44 kernel: [<800b1990>] (out_of_memory+0x154/0x2bc) from [<800b4cb0>] (__alloc_pages_nodemask+0x564/0x58c)
Jan 9 06:33:44 kernel: [<800b4cb0>] (__alloc_pages_nodemask+0x564/0x58c) from [<800d2a80>] (read_swap_cache_async+0x104/0x1b4)
Jan 9 06:33:44 kernel: [<800d2a80>] (read_swap_cache_async+0x104/0x1b4) from [<800d2bb8>] (swapin_readahead+0x88/0x90)
Jan 9 06:33:44 kernel: [<800d2bb8>] (swapin_readahead+0x88/0x90) from [<800c675c>] (handle_mm_fault+0x5d8/0x844)
Jan 9 06:33:44 kernel: [<800c675c>] (handle_mm_fault+0x5d8/0x844) from [<8005864c>] (do_page_fault+0x178/0x1ec)
Jan 9 06:33:44 kernel: [<8005864c>] (do_page_fault+0x178/0x1ec) from [<800503a4>] (do_DataAbort+0x30/0x9c)
Jan 9 06:33:44 kernel: [<800503a4>] (do_DataAbort+0x30/0x9c) from [<803eb040>] (ret_from_exception+0x0/0x10)
Jan 9 06:33:44 kernel: Exception stack(0x99643fb0 to 0x99643ff8)
Jan 9 06:33:44 kernel: 3fa0: 00000011 00000000 7ebe1d7c 00010000
Jan 9 06:33:44 kernel: 3fc0: 00000000 7ebe1dac 7ebe1de4 7ebe1f56 0000869c 00009068 7ebe1df8 00000000
Jan 9 06:33:44 kernel: 3fe0: 2b02be84 7ebe1d78 2b017974 2afd1e84 20000010 ffffffff
...
Jan 9 06:33:44 kernel: Out of memory: Kill process 27161 (dnscrypt-proxy) score 31 or sacrifice child
Jan 9 06:33:44 kernel: Killed process 27161 (dnscrypt-proxy) total-vm:804952kB, anon-rss:110240kB, file-rss:0kB

I will try upgrading to 2.0.45, but I've had this issue with a previous version (2.0.43?) also. Did not report because I wasn't sure of the cause. I did notice at that time that enabling anonymized DNS greatly accelerated the speed at which it crashed.

I use the stock configuration with the only changes being:

use_syslog = true
tls_cipher_suite = [52392, 49199]

## OpenNIC
[sources.'opennic']
urls = ['https://raw.githubusercontent.com/DNSCrypt/dnscrypt-resolvers/master/v2/opennic.md', 'https://download.dnscrypt.info/resolvers-list/v2/opennic.md']
minisign_key = 'RWQf6LRCGA9i53mlYecO4IzT51TGPpvWucNSCh1CBM0QTaLn73Y7GFO3'
refresh_delay = 72
cache_file = 'opennic.md'

ianbashford commented 3 years ago

Hi @jbaker6953
Quick question: have you determined that dnscrypt's memory usage is increasing over those 100 days?
The oom killer does choose a process that will free "enough" memory by killing the fewest processes, but it can't actually tell that something has a leak.
I believe that OOM score (31) represents approximately 3% of the memory on the machine. I think there's an adjustment for root processes, so if it's running as root it might be more like 6%.
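
To make that estimate concrete, here is a minimal Python sketch that extracts the score from the kernel's kill message and converts it to a rough share of memory. It assumes the score is on a 0 to 1000 scale, and the `root_adjust` parameter (points assumed to be subtracted for a root-owned process) is hypothetical; neither is confirmed in this thread. The function names are illustrative only.

```python
import re

# Kill message copied from the router log earlier in this issue.
LOG_LINE = ("Jan 9 06:33:44 kernel: Out of memory: Kill process 27161 "
            "(dnscrypt-proxy) score 31 or sacrifice child")


def parse_oom_score(line):
    """Return (pid, name, score) from a 'Kill process' line, or None."""
    m = re.search(r"Kill process (\d+) \(([^)]+)\) score (\d+)", line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3))


def score_to_percent(score, root_adjust=0):
    """Convert a score to an approximate percentage of memory.

    Assumes a 0..1000 scale. root_adjust is a hypothetical parameter:
    the number of points assumed to have been subtracted for a process
    running as root (pass 30 to model a 3% discount).
    """
    return (score + root_adjust) / 10.0


pid, name, score = parse_oom_score(LOG_LINE)
print(name, pid, score_to_percent(score), score_to_percent(score, root_adjust=30))
```

The denominator the kernel uses may include swap as well as RAM, so the result is only a rough share, and as noted above it says nothing about whether the process is actually leaking.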

jbaker6953 commented 3 years ago

For years I have tried to figure out how to get an accurate picture of a single process's memory usage on these BusyBox devices. The version of ps on these does not have -o.

jbaker6953 commented 3 years ago

This is the output of cat /proc/pid/status:

Name:   dnscrypt-proxy
State:  S (sleeping)
Tgid:   26248
Pid:    26248
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 32
Groups: 0
VmPeak:   803064 kB
VmSize:   803064 kB
VmLck:         0 kB
VmHWM:     13828 kB
VmRSS:     12500 kB
VmData:   802796 kB
VmStk:       264 kB
VmExe:      2588 kB
VmLib:  4294964708 kB
VmPTE:        24 kB
VmSwap:        0 kB
Threads:        8
SigQ:   0/4022
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffc3bfa3a00
SigIgn: 0000000000200001
SigCgt: fffffffc7fc1fefe
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed:   3
Cpus_allowed_list:      0-1
voluntary_ctxt_switches:        16
nonvoluntary_ctxt_switches:     65

jbaker6953 commented 3 years ago

I can update after it's run for a few days.
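
For tracking this over time without `ps -o`, the `Vm*` fields in `/proc/<pid>/status` can be parsed directly. The Python sketch below runs against a few lines copied from the output above and compares the current `VmRSS` with the `anon-rss` figure from the kill log. It assumes Python is available (not a given on these routers; the same parse can be run on saved output from another machine), and `parse_status_kb` is an illustrative name. The two figures come from different process instances and are not strictly the same metric, so the ratio is only indicative.

```python
def parse_status_kb(text):
    """Map each 'Vm*' field in /proc/<pid>/status text to its value in kB."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines shaped like "VmRSS:   12500 kB".
        if sep and key.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0])
    return fields


# A few lines copied from the status output posted in this thread.
SAMPLE = """\
VmPeak:   803064 kB
VmSize:   803064 kB
VmHWM:     13828 kB
VmRSS:     12500 kB
VmSwap:        0 kB
"""

# anon-rss from the "Killed process 27161" line in the router log.
ANON_RSS_AT_KILL_KB = 110240

status = parse_status_kb(SAMPLE)
growth = ANON_RSS_AT_KILL_KB / status["VmRSS"]
print(f"VmRSS now: {status['VmRSS']} kB, high-water mark: {status['VmHWM']} kB")
print(f"anon-rss at kill was about {growth:.1f}x the current VmRSS")
```

On a live system the text could come from reading `/proc/<pid>/status` for the running process; logging `VmRSS` once a day would show whether resident memory is really growing.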

jbaker6953 commented 3 years ago

So after two weeks on 2.0.45 the output of cat /proc/pid/status didn't change much, but restarting dnscrypt-proxy had a discernible effect on the output of free:

admin@RT-AC88U-B1E8:/tmp/home/root# free
             total       used       free     shared    buffers     cached
Mem:        515184     282896     232288          0        824      12472
-/+ buffers/cache:     269600     245584
Swap:      2097148          0    2097148
admin@RT-AC88U-B1E8:/tmp/home/root# /opt/etc/init.d/S09dnscrypt-proxy2 restart
 Shutting down dnscrypt-proxy...              done.
 Starting dnscrypt-proxy...              done.
admin@RT-AC88U-B1E8:/tmp/home/root# free
             total       used       free     shared    buffers     cached
Mem:        515184     257380     257804          0        836      15056
-/+ buffers/cache:     241488     273696
Swap:      2097148          0    2097148

The difference (about 25 MB of used memory) accounts for about 5% of the total system memory. If that trend continued, it would run the system out of memory in about 120 days (starting from approximately 45% free), which roughly matches my previous crash.
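
The extrapolation can be checked with a short Python sketch using the `Mem:` "used" values from the two `free` outputs above. It assumes the growth is linear, that all of the memory released by the restart belonged to dnscrypt-proxy, and that swap is ignored; the 14-day run time and the 45% starting point are taken from the description, not measured.

```python
# Values from the "Mem:" rows of the two `free` outputs (kB).
TOTAL_KB = 515184
USED_BEFORE_KB = 282896   # before restarting dnscrypt-proxy
USED_AFTER_KB = 257380    # after restarting dnscrypt-proxy

DAYS_RUNNING = 14               # "two weeks", assumed exact
FREE_FRACTION_AT_START = 0.45   # the poster's estimate of free memory at boot

freed_kb = USED_BEFORE_KB - USED_AFTER_KB
freed_fraction = freed_kb / TOTAL_KB

# Linear growth assumption: the same amount accumulates every day.
growth_kb_per_day = freed_kb / DAYS_RUNNING
days_to_exhaustion = FREE_FRACTION_AT_START * TOTAL_KB / growth_kb_per_day

print(f"freed by restart: {freed_kb} kB ({freed_fraction:.1%} of total)")
print(f"estimated growth: {growth_kb_per_day:.0f} kB/day")
print(f"estimated days until memory is exhausted: {days_to_exhaustion:.0f}")
```

With these inputs the estimate lands in the same range as the figure quoted above. Using the `-/+ buffers/cache` row instead gives a slightly larger delta and therefore a somewhat shorter estimate.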