dmachard / go-dnscollector

Ingesting, pipelining, and enhancing your DNS logs with usage indicators, security analysis, and additional metadata.
MIT License

Can't use dnstap file with bind/named ? #747

Closed judzk closed 1 month ago

judzk commented 1 month ago

Describe the bug

Hi, I am trying to use your collector. We run BIND 9 as our DNS server, but no matter what I do, I have not managed to forward logs to Loki/Prometheus. I am now trying to use a dnstap log file.

To Reproduce

bind9 conf: named.conf.options

       dnstap {auth; client; resolver; forwarder;};
       dnstap-output file "/var/log/bind/dnstap.log" versions 2;

config.yaml

pipelines:
  - name: file-dnstap
    file-ingestor:
      watch-dir: /var/log/bind
      watch-mode: dnstap
    routing-policy:
      forward: [ console,prom,loki ]

  - name: console
    stdout:
      mode: text

  - name: prom
    prometheus:
      listen-ip: 0.0.0.0
      listen-port: 8081
      basic-auth-enable: false
      basic-auth-login: admin
      basic-auth-pwd: changeme
      tls-support: false
      tls-mutual: false
      tls-min-version: 1.2
      cert-file: ""
      key-file: ""
      prometheus-prefix: "dnscollector"
      top-n: 10
      chan-buffer-size: 0
      histogram-metrics-enabled: false
      requesters-metrics-enabled: true
      domains-metrics-enabled: true
      noerror-metrics-enabled: true
      servfail-metrics-enabled: true
      nonexistent-metrics-enabled: true
      timeout-metrics-enabled: true
      prometheus-labels: ["stream_id"]
      requesters-cache-size: 250000
      requesters-cache-ttl: 3600
      domains-cache-size: 500000
      domains-cache-ttl: 3600
      noerror-domains-cache-size: 100000
      noerror-domains-cache-ttl: 3600
      servfail-domains-cache-size: 10000
      servfail-domains-cache-ttl: 3600
      nonexistent-domains-cache-size: 10000
      nonexistent-domains-cache-ttl: 3600
      default-domains-cache-size: 1000
      default-domains-cache-ttl: 3600

  - name: loki
    lokiclient:
      server-url: "http://xxxxxxxxxxxx:3100/api/prom/push"
      job-name: "dnscollector"
      mode: "text"
      flush-interval: 5
      batch-size: 1048576
      retry-interval: 10
      text-format: ""
      proxy-url: ""
      tls-insecure: false
      tls-min-version: 1.2
      ca-file: ""
      cert-file: ""
      key-file: ""
      basic-auth-login: ""
      basic-auth-pwd: ""
      basic-auth-pwd-file: ""
      tenant-id: ""
      relabel-configs: []
      chan-buffer-size: 0

Expected behavior

Forward the dnstap log file to Loki, Prometheus, and the console.

Additional context

dmachard commented 1 month ago

You need to fix the extension of your log file: dnstap-output file "/var/log/bind/dnstap.fstrm";. Only the fstrm extension is recognized by DNScollector.

Additionally, you need to rotate/truncate and reopen your dnstap file using /usr/sbin/rndc dnstap -reopen and logrotate. Here is a suggestion on how to set it up:

touch /etc/logrotate.d/bind

/var/log/bind/dnstap.fstrm {
    daily
    rotate 7
    missingok
    notifempty
    create 0640 named named
    sharedscripts
    olddir /var/log/bind/old_logs
    postrotate
        /usr/sbin/rndc dnstap -reopen > /dev/null 2>&1 || true
        mv /var/log/bind/old_logs/dnstap.fstrm* /var/log/bind/old_logs/dnstap.new.fstrm
    endscript
}

and for DNScollector:

  - name: file-dnstap
    file-ingestor:
      watch-dir: /var/log/bind/old_logs
      watch-mode: dnstap
    routing-policy:
      forward: [ console ]

Output example

INFO: 2024/06/23 08:11:32.695673 worker - [file-dnstap] fileingestor - processing dnstap file [dnstap.new.fstrm]
INFO: 2024/06/23 08:11:32.695702 worker - [file-dnstap] fileingestor - processing of [dnstap.new.fstrm] terminated
2024-06-23T06:07:42.936171793Z dbc2cadb574f CLIENT_QUERY NOERROR 172.17.0.1 44744 IPv4 UDP 58b www.apple.com A 0.000000
2024-06-23T06:07:42.937171803Z dbc2cadb574f RESOLVER_QUERY NOERROR 0.0.0.0 48233 IPv4 UDP 66b e6858.dscx.akamaiedge.net A 0.000000
2024-06-23T06:07:42.949171934Z dbc2cadb574f CLIENT_RESPONSE NOERROR 172.17.0.1 44744 IPv4 UDP 226b www.apple.com A 0.000000
2024-06-23T06:07:42.948171923Z dbc2cadb574f RESOLVER_RESPONSE NOERROR 0.0.0.0 48233 IPv4 UDP 70b e6858.dscx.akamaiedge.net A 0.000000
2024-06-23T06:07:47.237218551Z dbc2cadb574f CLIENT_QUERY NOERROR 172.17.0.1 35514 IPv4 UDP 59b www.google.com A 0.000000
2024-06-23T06:07:47.237218551Z dbc2cadb574f CLIENT_RESPONSE NOERROR 172.17.0.1 35514 IPv4 UDP 87b www.google.com A 0.000000
2024-06-23T06:07:48.474231999Z dbc2cadb574f CLIENT_QUERY NOERROR 172.17.0.1 51505 IPv4 UDP 58b www.apple.com A 0.000000
2024-06-23T06:07:48.474231999Z dbc2cadb574f CLIENT_RESPONSE NOERROR 172.17.0.1 51505 IPv4 UDP 226b www.apple.com A 0.000000
2024-06-23T06:07:49.355241577Z dbc2cadb574f CLIENT_QUERY NOERROR 172.17.0.1 37670 IPv4 UDP 59b www.google.com A 0.000000
2024-06-23T06:07:49.355241577Z dbc2cadb574f CLIENT_RESPONSE NOERROR 172.17.0.1 37670 IPv4 UDP 87b www.google.com A 0.000000

Using the Unix socket for dnstap might be more appropriate.

judzk commented 1 month ago

Hi, thanks mate, I will try that. I will take a look at the Unix socket, but if I recall correctly, I must use dnstap_receiver to listen on the socket? Which is a pip module? We try to install the bare minimum on our servers.

But if I understand this diagram correctly (image), the "better" way with my BIND DNS is to use tail. Is there a similar limitation on the extension? Must my log file have a .log extension? Regards

dmachard commented 1 month ago

The dnstap_receiver is deprecated; please use DNScollector only. Unix sockets are supported.

The best way is to use the DNSTap protocol, unless your BIND is too old.
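As a rough illustration of the Unix socket approach mentioned above, the sketch below replaces the file-ingestor with a dnstap collector listening on a socket. The socket path and the pipeline name are assumptions that do not come from this thread, and the sock-path option should be checked against the DNScollector dnstap collector documentation for your version. On the BIND side, the matching change would be to swap the file output for something like dnstap-output unix "/var/run/named/dnstap.sock"; and the named user needs permission to connect to that socket.

```yaml
# Sketch only: the socket path and the pipeline name "tap" are assumptions.
# BIND side (named.conf.options), instead of "dnstap-output file ...":
#   dnstap-output unix "/var/run/named/dnstap.sock";
pipelines:
  - name: tap
    dnstap:
      # Unix socket that BIND writes dnstap frames to
      sock-path: /var/run/named/dnstap.sock
    routing-policy:
      forward: [ console ]

  - name: console
    stdout:
      mode: text
```

With this layout no log rotation is needed, since messages are streamed instead of being read from rotated files.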

judzk commented 1 month ago

OK thanks, I will try that. Your first solution works great, but RAM usage explodes :/ (image)

dmachard commented 1 month ago

Can you share your full config? Depending on your config, the prometheus logger can consume a lot of memory: https://github.com/dmachard/go-dnscollector/blob/main/docs/performance.md#memory-usage

judzk commented 1 month ago

Sure:

global:
  trace:
    verbose: true
  server-identity: resolv-interne-l1
  text-format-delimiter: " "
  text-format-boundary: "\""
  pid-file: ""
  worker:
    interval-monitor: 10
    buffer-size: 4096
  telemetry:
    enabled: true
    web-path: "/metrics"
    web-listen: ":9165"
    prometheus-prefix: "dnscollector_exporter"
    tls-support: false
    tls-cert-file: ""
    tls-key-file: ""
    client-ca-file: ""
    basic-auth-enable: false
    basic-auth-login: admin
    basic-auth-pwd: changeme

pipelines:
  - name: dnstap-dns-server-1
    file-ingestor:
      watch-dir: /var/log/bind
      watch-mode: dnstap
    routing-policy:
      forward: [ loki,prom,console ]
      dropped: [ ]

  - name: console
    stdout:
      mode: text

  - name: prom
    prometheus:
      listen-ip: 0.0.0.0
      listen-port: 8081
      basic-auth-enable: false
      basic-auth-login: admin
      basic-auth-pwd: changeme
      tls-support: false
      tls-mutual: false
      tls-min-version: 1.2
      cert-file: ""
      key-file: ""
      prometheus-prefix: "dnscollector"
      top-n: 10
      chan-buffer-size: 0
      histogram-metrics-enabled: false
      requesters-metrics-enabled: true
      domains-metrics-enabled: true
      noerror-metrics-enabled: true
      servfail-metrics-enabled: true
      nonexistent-metrics-enabled: true
      timeout-metrics-enabled: true
      prometheus-labels: ["stream_id"]
      requesters-cache-size: 250000
      requesters-cache-ttl: 3600
      domains-cache-size: 500000
      domains-cache-ttl: 3600
      noerror-domains-cache-size: 100000
      noerror-domains-cache-ttl: 3600
      servfail-domains-cache-size: 10000
      servfail-domains-cache-ttl: 3600
      nonexistent-domains-cache-size: 10000
      nonexistent-domains-cache-ttl: 3600
      default-domains-cache-size: 1000
      default-domains-cache-ttl: 3600

  - name: loki
    lokiclient:
      server-url: "http://<%= @host_loki %>:3100/api/prom/push"
      job-name: "dnscollector"
      mode: "text"
      flush-interval: 5
      batch-size: 1048576
      retry-interval: 10
      text-format: ""
      proxy-url: ""
      tls-insecure: false
      tls-min-version: 1.2
      ca-file: ""
      cert-file: ""
      key-file: ""
      basic-auth-login: ""
      basic-auth-pwd: ""
      basic-auth-pwd-file: ""
      tenant-id: ""
      relabel-configs: []
      chan-buffer-size: 0

dmachard commented 1 month ago

Could you try reducing the following settings?

requesters-cache-size: 250000
domains-cache-size: 500000
noerror-domains-cache-size: 100000
servfail-domains-cache-size: 10000
nonexistent-domains-cache-size: 10000

judzk commented 1 month ago

OK, I divided the values by 2. I will run it like that for 24h and keep you posted. Thanks for your help.

Edit: with the halved values, I got no logs in Grafana, so I removed all the fixed values to use the defaults. We will see.

judzk commented 1 month ago

Hi, here is the result after almost 36h with the default settings: (image)