Memory leak in go-dnscollector

PenelopeFudd commented 2 months ago

Describe the bug As we use go-dnscollector, it uses more and more memory until it's killed by the OOM killer.

top - 16:26:16 up 4 days,  3:13,  4 users,  load average: 0.26, 0.63, 4.71
Tasks: 278 total,   1 running, 277 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  32090.4 total,  16212.1 free,  15508.6 used,    369.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  16190.8 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                   
 246535 dnscoll+  20   0   15.2g   4.9g  17812 S   0.0  15.7 275:21.59 go-dnscollector                                                                                           
 246522 dnscoll+  20   0   14.9g   4.8g   8200 S   0.0  15.5 234:18.20 go-dnscollector                                                                                           
 246520 dnscoll+  20   0   15.1g   3.4g  12576 S   0.0  10.8 265:05.95 go-dnscollector                                                                                           
 247399 root      20   0   13.5g 922304  14472 S   0.0   2.8 677:30.59 filebeat                                                                                                  
 244201 pdns      20   0 9957.0m 254952      0 S   0.0   0.8   8:20.23 pdns_server                                                                                               
 245632 prometh+  20   0 2837384  79924  15940 S   2.7   0.2 186:21.77 prometheus                                                                                                
 854727 _dnsdist  20   0 2239708  38620      0 S   0.3   0.1  32:57.53 dnsdist                                                                                                   
 854768 _dnsdist  20   0 2448604  38448      0 S   0.3   0.1  43:52.51 dnsdist                                                                                                   
 854928 _dnsdist  20   0 2448604  38388      0 S   0.3   0.1  51:20.60 dnsdist

# systemctl list-units dns\*

  UNIT                           LOAD   ACTIVE SUB     DESCRIPTION     
● dnscollector@backend1.service  loaded failed failed  Go DnsCollector
● dnscollector@backend10.service loaded failed failed  Go DnsCollector
● dnscollector@backend2.service  loaded failed failed  Go DnsCollector
  dnscollector@backend3.service  loaded active running Go DnsCollector
  dnscollector@backend4.service  loaded active running Go DnsCollector
  dnscollector@backend5.service  loaded active running Go DnsCollector
● dnscollector@backend6.service  loaded failed failed  Go DnsCollector
● dnscollector@backend7.service  loaded failed failed  Go DnsCollector
● dnscollector@backend8.service  loaded failed failed  Go DnsCollector
● dnscollector@backend9.service  loaded failed failed  Go DnsCollector
  dnsdist@backend1.service       loaded active running DNS Loadbalancer
  dnsdist@backend10.service      loaded active running DNS Loadbalancer
  dnsdist@backend2.service       loaded active running DNS Loadbalancer
  dnsdist@backend3.service       loaded active running DNS Loadbalancer
  dnsdist@backend4.service       loaded active running DNS Loadbalancer
  dnsdist@backend5.service       loaded active running DNS Loadbalancer
  dnsdist@backend6.service       loaded active running DNS Loadbalancer
  dnsdist@backend7.service       loaded active running DNS Loadbalancer
  dnsdist@backend8.service       loaded active running DNS Loadbalancer
  dnsdist@backend9.service       loaded active running DNS Loadbalancer

We initially added LimitDATA=5500M and later MemoryMax=1500M to the systemd service unit file. Then we graphed MemoryCurrent over time:

for n in {1..100};do echo $(date +%s),$(systemctl show dnscollector@backend1 -P MemoryCurrent); sleep 1; done | tee /tmp/memory.csv

The process ran out of memory after 45 seconds.

To Reproduce We have 10 copies of dnsdist running, each sending dnstap logs to a separate copy of dnscollector. Excerpt from /etc/dnsdist/dnsdist-backend1.conf:

  tap_logging = newFrameStreamTcpLogger("127.0.0.1:6001")
  addAction(AllRule(), DnstapLogAction("dnsdist_server", tap_logging))
  addResponseAction(AllRule(), DnstapLogResponseAction("dnsdist_server", tap_logging))
  addCacheHitResponseAction(AllRule(), DnstapLogResponseAction("dnsdist_server", tap_logging))

Contents of /etc/dnscollector/config-backend1.yml:

################################################
# global configuration
#  more details: https://github.com/dmachard/go-dnscollector/blob/main/docs/configuration.md#global
################################################
global:
  trace:
    verbose: true
  server-identity: "dns-collector"
  pid-file: ""
  text-format: "timestamp-rfc3339ns identity operation rcode queryip queryport family protocol length-unit qname qtype edns-csubnet latency answer"
  text-format-delimiter: " "
  text-format-boundary: "\""
  text-jinja: ""
  worker:
    interval-monitor: 10
    buffer-size: 500000
  telemetry:
    enabled: true
    web-path: "/metrics"
    web-listen: ":9166"
    prometheus-prefix: "dnscollector_exporter"
    tls-support: false
    tls-cert-file: ""
    tls-key-file: ""
    client-ca-file: ""
    basic-auth-enable: false
    basic-auth-login: admin
    basic-auth-pwd: changeme

################################################
# Pipelining configuration
#   more details: https://github.com/dmachard/go-dnscollector/blob/main/docs/running_mode.md#pipelining
#   workers: https://github.com/dmachard/go-dnscollector/blob/main/docs/workers.md
#   transformers: https://github.com/dmachard/go-dnscollector/blob/main/docs/transformers.md
################################################
pipelines:

  - name: tap
    dnstap:
      listen-ip: 127.0.0.1
      listen-port: 6001
      chan-buffer-size: 1000000
    transforms:
      normalize:
        qname-lowercase: true
        qname-replace-nonprintable: true
      latency:
        measure-latency: true
        unanswered-queries: true
        queries-timeout: 2
      geoip:
        mmdb-asn-file: "/var/lib/GeoIP/GeoLite2-ASN.mmdb"
        mmdb-city-file: "/var/lib/GeoIP/GeoLite2-City.mmdb"
        mmdb-country-file: "/var/lib/GeoIP/GeoLite2-Country.mmdb"
        # Copy from node3:/var/lib/GeoIP/*.mmdb
    routing-policy:
      forward: [ filebeat ]
      # forward: [ file, filebeat ]
      dropped: [ ]

  # Write DNS logs to a local file in TEXT format
  - name: filebeat
    tcpclient:
      transport: unix
      remote-address: /var/run/filebeat.sock
      connect-timeout: 5
      retry-interval: 10
      flush-interval: 5
      mode: json
      buffer-size: 10
      chan-buffer-size: 1000000
    transforms:
      relabeling:
        remove:
          - regex: "dns[.](flags|id|questions-cont|resource-records.(ar|ns))"
          - regex: "dnstap[.](extra|identity|peer-name|policy-|query-zone|version)"
          - regex: "network[.](family|ip-defragmented|tcp-reassembled)"
          - regex: "edns[.](udp-size|rcode|version|dnssec-ok|options.[0-9].(code|name))"

The reason we have 10 copies of dnsdist + dnscollector is because dnsdist started discarding dnstap records under high load, as dnscollector wasn't keeping up. If we could have configured dnscollector to start ten threads, or dnsdist to send dnstap records to ten different backends, we would have. The goal is to handle 200k DNS requests per second, which ends up being 400k records per second.

Expected behavior No memory leaking. I can't imagine what dnscollector could be retaining records for apart from latency calculations, and those expire after 2 seconds.

Additional context

Version bacdb535927b96f4c1fde0b753ba55f0189c2199 plus one patch:


diff --git a/transformers/geoip.go b/transformers/geoip.go
index 02dd854..190ab3c 100644
--- a/transformers/geoip.go
+++ b/transformers/geoip.go
@@ -4,6 +4,7 @@ import (
    "fmt"
    "net"
    "strconv"
+        "strings"

    "github.com/dmachard/go-dnscollector/dnsutils"
    "github.com/dmachard/go-dnscollector/pkgconfig"
@@ -138,10 +139,18 @@ func (t *GeoIPTransform) geoipTransform(dm *dnsutils.DNSMessage) (int, error) {
            dm.Geo = &dnsutils.TransformDNSGeo{CountryIsoCode: "-", City: "-", Continent: "-", AutonomousSystemNumber: "-", AutonomousSystemOrg: "-"}
    }

geoInfo, err := t.Lookup(dm.NetworkInfo.QueryIP)
if err != nil {
return ReturnKeep, err
}
BestIP := dm.NetworkInfo.QueryIP
if len(dm.EDNS.Options) > 0 {
tmpString := strings.Split(dm.EDNS.Options[0].Data,"/")[0]
if len(tmpString) > 8 {
BestIP = tmpString
}
}
geoInfo, err := t.Lookup(BestIP)
if err != nil {
return ReturnKeep, err

}

dm.Geo.Continent = geoInfo.Continent
dm.Geo.CountryIsoCode = geoInfo.CountryISOCode

The machine has 32GB ram, 16 CPUs, no swap, 40GB/s ethernet link

dmachard commented 2 months ago

Thank you for this type of feedback, it helps improve the tool.

Here are a few points to investigate:

I don't think it's a good idea to set such a high value for chan-buffer-size: 1,000,000. Could you remove this setting and use the global default of 4096 instead?
Could you also remove the transforms and check if the memory behavior remains the same? Your configuration seems unusual because relabeling only works with flat-json output. It shouldn't work in your case. Can you confirm this?
Could you run a Go profiling (goprof)?

PenelopeFudd commented 2 months ago

Ok, will test.

PenelopeFudd commented 2 months ago

Er, how does one run a Go profiling?

PenelopeFudd commented 2 months ago

It looks like making those changes has sharply curtailed the memory usage; it's now about 125-148MB, although it goes up and down.

PenelopeFudd commented 2 months ago

At the moment, each dnsdist frontend is taking 5.7-6.6% of CPU, while each dnscollector backend is taking 3.0-3.7%. With 10 of each, that adds up to a lot. Filebeat is currently taking 1.2GB of memory and 3.3% of CPU.

Is there a way to increase the number of dnscollector threads, so I don't need to run ten copies? Alternatively, is there a way to split a dnstap stream into multiple dnscollectors with some kind of load balancer?

PenelopeFudd commented 2 months ago

Running 10 dnsdists and dnscollectors uses 3.89GB (12%), so setting buffers to 4096 and commenting out the transforms really helped. Running 5 dnsdists and dnscollectors uses 3.32GB (11%), so not much change there.

Having only 5 dnsdists and dnscollectors changed the CPU numbers a bit:

dnsdist 12-15% each (a little more than double the 5.7-6.6% numbers)
dnscollector 4.3-5.7% each (a little less than double the 3.0-3.7% numbers) So dnsdist did worse and dnscollector did better.

Since the results are so close, I'll have to check the logs for discards and other warnings to see what to do next.

Thanks!

PenelopeFudd commented 2 months ago

Oh, there are definitely errors:

$ journalctl -S '20 seconds ago' -u dnscollector@backend1 -o json | jq -r .MESSAGE

ERROR: 2024/09/15 22:58:02.848611 worker - [tap] dnstap - worker[dnstap-processor] buffer is full, 243841 dnsmessage(s) dropped
ERROR: 2024/09/15 22:58:03.830434 worker - [tap] (conn #1) dnstap processor - worker[filebeat] buffer is full, 108452 dnsmessage(s) dropped
ERROR: 2024/09/15 22:58:12.854694 worker - [tap] dnstap - worker[dnstap-processor] buffer is full, 282473 dnsmessage(s) dropped
ERROR: 2024/09/15 22:58:13.832142 worker - [tap] (conn #1) dnstap processor - worker[filebeat] buffer is full, 96872 dnsmessage(s) dropped

PenelopeFudd commented 2 months ago

Partial output of nethogs:

     PID   USER   Program                  Device  Sent      Received
1338604  _dnsdi..  /usr/bin/dnsdist          lo  8323.941  0.000  KB/sec
1338605  _dnsdi..  /usr/bin/dnsdist          lo  8273.483  0.000  KB/sec
1338602  _dnsdi..  /usr/bin/dnsdist          lo  8028.614  0.000  KB/sec
1338603  _dnsdi..  /usr/bin/dnsdist          lo  7926.192  0.000  KB/sec
1338571  _dnsdi..  /usr/bin/dnsdist          lo  7519.153  0.000  KB/sec
1338606  _dnsdi..  /usr/bin/dnsdist          lo  6912.262  0.000  KB/sec                                                                     
1338607  _dnsdi..  /usr/bin/dnsdist          lo  6701.061  0.000  KB/sec
1338559  _dnsdi..  /usr/bin/dnsdist          lo  6670.163  0.000  KB/sec
1338567  _dnsdi..  /usr/bin/dnsdist          lo  5766.493  0.000  KB/sec
1338443  dnscol..  /usr/bin/go-dnscollector  lo  10.648    0.000  KB/sec
1338420  dnscol..  /usr/bin/go-dnscollector  lo  9.384     0.000  KB/sec
1338415  dnscol..  /usr/bin/go-dnscollector  lo  9.371     0.000  KB/sec
1338425  dnscol..  /usr/bin/go-dnscollector  lo  9.309     0.000  KB/sec
1338410  dnscol..  /usr/bin/go-dnscollector  lo  8.985     0.000  KB/sec
1338426  dnscol..  /usr/bin/go-dnscollector  lo  8.985     0.000  KB/sec
1338409  dnscol..  /usr/bin/go-dnscollector  lo  8.366     0.000  KB/sec
1338465  dnscol..  /usr/bin/go-dnscollector  lo  8.126     0.000  KB/sec                                                                     
1338464  dnscol..  /usr/bin/go-dnscollector  lo  8.031     0.000  KB/sec
1338447  dnscol..  /usr/bin/go-dnscollector  lo  7.515     0.000  KB/sec

Given that dnsdist is sending 8000KB/sec of dnstap logs to dnscollector, and dnscollector is sending 10KB/sec of logs to filebeat, I'm thinking that filebeat is not accepting the logs fast enough, by a factor of about 800. And if dnstap is binary and the logs sent to filebeat are json, that factor is even bigger on a per-packet basis.

dmachard commented 2 months ago

The JSON format can be resource-intensive. You can review a benchmark. As you can see, the basic text inline format is the most efficient in terms of CPU usage.

Is there a way to increase the number of dnscollector threads, so I don't need to run ten copies?

Goroutines are used for each incoming connection, so you can use a single binary and listen on multiple ports simultaneously, effectively handling multiple streams within the same instance.

Alternatively, is there a way to split a dnstap stream into multiple dnscollectors with some kind of load balancer?

Currently, no. You can use the DNStap profixier to duplicate the data flow, but it does not support splitting streams across multiple collectors. Feel free to open a feature request to track this enhancement.

dmachard / go-dnscollector

Memory leak in go-dnscollector #810