DNSCrypt / dnscrypt-proxy

dnscrypt-proxy 2 - A flexible DNS proxy, with support for encrypted DNS protocols.
https://dnscrypt.info
ISC License
11.4k stars 1.01k forks source link

dnscrypt-proxy memory leak #1352

Closed makedir closed 4 years ago

makedir commented 4 years ago

Using dnscrypt-proxy2 version 2.0.42-1 on a OpenWRT WRT1200 router, and have the following issue, that it becomes linear bigger every day, mostly after reloading dnsmasq every 24h or twice a day. After 28 days of runtime, it already uses about 213MB of RAM, and getting bigger every day around 3-4MB:

https://i.imgur.com/fGWnljN.png

Any way to debug why this is happening? Doesnt it free used memory? Cache size is set to 1024.

jedisct1 commented 4 years ago

Memory is freed automatically, but the Go runtime only does it from time to time. So, memory goes up and then suddenly goes down.

Version 2.0.43 forces this to happen a bit more frequently, especially to get rid of all the costly HTTP stuff if you happen to only use DNSCrypt.

makedir commented 4 years ago

What do you mean "more frequently" it never happened in over 28 days, linear growing every day by 3-4MB. It is after 28 days already using 43% memory. This is NOT acceptable.

welwood08 commented 4 years ago

For what it's worth, I don't recall seeing such high memory usage but I've started using GODEBUG=gctrace=1,scavtrace=1 to get a better idea of what's going on.

@jedisct1 I've not yet updated to use the recently added forced garbage collection, but I'm still seeing forced garbage collection triggered roughly every 2 minutes. If something else in the codebase, perhaps a dependency, is already regularly forcing GC then further investigation of a memory leak may be warranted.

jedisct1 commented 4 years ago

I tried pprof and didn’t see anything obvious.

Most of the memory is used by the HTTP stack and by blacklists (we may want to see if we can make these more efficient, but they are loaded only once, so this is off-topic).

Goroutines didn’t seem to be accumulating either.

But I don’t have much experience investigating resource leaks in Go, so you may do a better job.

Not sure how “restarting dnsmasq” could be related, but if dnscrypt-proxy is restarted simultaneously with an updated blacklist, and that blacklist keeps growing, that may explain it.

welwood08 commented 4 years ago

I can't say I have much experience with Go resource leaks either, but I figure it can't hurt to learn something new. I agree that the loaded-once stuff is unlikely to be relevant, and I'd be glad if it were as simple as optimising the blacklist plugin (and related ones).

@makedir Could you provide your configuration file so we can try to reproduce with the features you're using? Why do you think reloading dnsmasq is the cause of a large memory usage increase in dnscrypt-proxy? Are you also restarting dnscrypt-proxy every time you reload dnsmasq or is it running for the whole 28 days in your original screenshot?

makedir commented 4 years ago

https://www.dropbox.com/s/ow6ccnm7nee1r7h/dnscrypt-proxy.toml?dl=0

I didnt really much change anything just cache size to 1024 I think it was 512 default, and changed default DNS to 8.8.8.8. I thought dnsproxy was maybe restarting together with dnsmasq I have several reload rules over the day I think, one every 24h but also another when VPN goes up and down.

I dont use a modified blacklist just default, but resolver and relays file seem to be recreated every day. https://i.imgur.com/uUKzDsZ.png Also why does this need to happen in /etc/dnscrypt-proxy2 and not in /tmp ? I dont really want daily write on the flash memory.

welwood08 commented 4 years ago

That does look like a fairly simple configuration so I don't have any good ideas for what could be behind a significant memory leak. I'll keep an eye on my own debug logs for now and see if anything stands out, but bear in mind it could be an architecture-specific problem with the Go runtime and I don't have an ARM-based system to experiment with.

Regarding the *.md* files being written daily, they aren't actually being written if the content is unchanged, just the modification time touched so a future run knows if it checked recently. You can tell the difference because if the content had changed then both the *.md and corresponding *.md.minisig would have similar modification time.

It does not need to happen in /etc/dnscrypt-proxy2, that's just the current directory on your system. Provide an absolute path in the cache_file setting for each of your sources if you want to put it somewhere else. Be aware that if you move them to volatile storage, they can't be used at startup after a reboot.

riobard commented 4 years ago

Go 1.12 runtime has this change regarding to memory release:

On Linux, the runtime now uses MADV_FREE to release unused memory. This is more efficient but may result in higher reported RSS. The kernel will reclaim the unused data when it is needed. To revert to the Go 1.11 behavior (MADV_DONTNEED), set the environment variable GODEBUG=madvdontneed=1.

If dnscrypt-proxy does not have major memory leaks, the reason RSS not going down is likely because the OS never needs to take those MADV_FREE'ed memory back.

jedisct1 commented 4 years ago

@riobard That would explain it.

Thanks a lot for pointing out that change!

The traditional output from the top command, especially on Linux, is clearly not enough to understand how memory is being used,