CZ-NIC / knot-resolver

Knot Resolver - resolve DNS names like it's 2024
https://www.knot-resolver.cz/

High memory consumption #94

Open i7an opened 1 year ago

i7an commented 1 year ago

Hi,

I encountered some very strange behavior with Knot Resolver. For some reason this config causes the kresd process to bloat linearly (~10 MB/hour) and consume hundreds of megabytes of memory even without any load:

cache.size = 100 * MB
cache.open(100 * MB, 'lmdb://./tmp/knot-cache')
cache.max_ttl(300)

But when I set max_ttl before opening the cache file, the problem disappears and the memory footprint stays at ~17 MB:

cache.size = 100 * MB
cache.max_ttl(300)
cache.open(100 * MB, 'lmdb://./tmp/knot-cache')

Here is the Dockerfile I used:

```dockerfile
FROM debian:11-slim
RUN apt update
RUN apt install -y wget
RUN wget https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
RUN dpkg -i knot-resolver-release.deb
RUN apt update
RUN apt install -y knot-resolver
COPY config/knot-resolver/kresd.conf /etc/knot-resolver/kresd.conf
ENTRYPOINT ["kresd"]
CMD ["-c", "/etc/knot-resolver/kresd.conf", "-n"]
```

I would be grateful for any ideas and debug suggestions.

UPD: Apparently, the lower the max_ttl, the quicker RAM is consumed. Calling cache.clear() does nothing. Running kres-cache-gc does nothing.

vcunat commented 1 year ago

cache.open() resets the TTL limits.
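
A minimal sketch of an ordering that actually keeps the limit in effect (reusing the cache path from the config above): open the cache first, then set max_ttl, so the reset happens before the limit is applied rather than after:

```lua
-- Open (or create) the cache first; cache.open() resets TTL limits.
cache.open(100 * MB, 'lmdb://./tmp/knot-cache')
-- Set the TTL cap afterwards, so the open call does not wipe it.
cache.max_ttl(300)
```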

i7an commented 1 year ago

@vcunat could you please elaborate on how that may cause constant memory growth? A 5-minute TTL seems harmless to me.

vcunat commented 1 year ago

No, the growth itself does sound like a bug. Reducing the TTL will make the resolver do more work, etc., but otherwise it's probably just a coincidence that it triggers that bug/growth.

I just wanted to point out that swapping the lines is basically the same as not changing the TTL limit.

i7an commented 1 year ago

Thanks for pointing that out. It was not obvious to me.

vcunat commented 1 year ago

I see two plausible options:

  1. The allocator (jemalloc in this case) still does not like the resulting allocation patterns and produces a very sparse heap (lots of RAM taken from the OS, but only a small percentage of it actually allocated by kresd). https://gitlab.nic.cz/knot/knot-resolver/-/merge_requests/1353#note_265895

  2. A genuine leak (unreachable memory), but we haven't heard of any significant one so far (in terms of the amount of RAM consumed). It will probably be easiest to recognize by setting the environment variable MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true and possibly later inspecting the details according to the jemalloc docs (a sketch of such a run follows below).
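
A sketch of what such a profiling run could look like, assuming kresd is linked against a jemalloc build with profiling compiled in and that the jeprof tool is available (the dump file name pattern is illustrative):

```sh
# Enable jemalloc leak checking; with prof_final:true a heap profile
# is dumped when the process exits cleanly.
export MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true
kresd -c /etc/knot-resolver/kresd.conf -n

# After stopping kresd, inspect the dumped profile, e.g.:
# jeprof --show_bytes "$(command -v kresd)" jeprof.*.heap
```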

i7an commented 1 year ago

I'll definitely investigate your suggestions. Thanks for sharing. 🙇‍♂️

@vcunat But I am still puzzled that such a simple setting as max_ttl causes this problem and that it was not noticed before... Can you advise what else I can check to rule out a simple error in my configuration? As I mentioned in the UPD section, I tried clearing the cache with cache.clear() and running kres-cache-gc, with no effect on the memory footprint.

vcunat commented 1 year ago

Cache size is unrelated; that's always exactly a 100 MiB file, mapped into memory (according to your config).

vcunat commented 1 year ago

I mean, the cache file will be part of the RAM usage that you see, but it has that hard upper limit.
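
To illustrate, a sketch assuming the cache lives at ./tmp/knot-cache as in the config above (LMDB keeps its data in a data.mdb file there):

```sh
# The apparent size of the LMDB data file reflects the configured
# map size; the file may be sparse on disk.
ls -lh ./tmp/knot-cache/data.mdb

# The mapping is file-backed, so it should not be confused with
# anonymous heap growth when reading the RSS of kresd:
pmap -x "$(pidof kresd)" | grep data.mdb
```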