dmachard / go-dnscollector

Ingesting, pipelining, and enhancing your DNS logs with usage indicators, security analysis, and additional metadata.
MIT License

Problem with dnstap logger - forwarded data isn't recorded in Loki since version 0.30.0 #283

Closed christianbrideau closed 1 year ago

christianbrideau commented 1 year ago

This issue seems to have started with version 0.30.0.

Set up

Things we tried

Starting with both collectors at version 0.30.0

How to reproduce the problem

Upgrade go-dns-collector on the DNS server from a version <= 0.29.0 to a version >= 0.30.0. This is the go-dns-collector instance whose DNStap logger forwards data to a central go-dns-collector instance.

How to work around the problem

Run version 0.29.0 or earlier. Unfortunately, some of our DNS servers require this patch: "Added handling for EINTR on syscall.Recvmsg" (#279). For now, we have backported the patch to version 0.27.0 and are running our own custom binary.

Conclusions

After running a number of tests, it appears that the information needed to store the data in Loki is lost when a DNStap logger is involved. We can capture the traffic going to Loki, but we cannot find the data in Loki. We suspect that labels might not be getting set correctly, or that the newer version requires a config change we didn't see.

Configuration files

config.bind.server.yml

global:
  # If turned on, log some application messages
  trace:
    # debug information
    verbose: True

multiplexer:
  collectors:
    - name: sniff
      afpacket-sniffer:
        port: 53
        device:
        capture-dns-queries: true
        capture-dns-replies: true
        cache-support: true
        query-timeout: 5

  loggers:
    - name: console
      stdout:
        mode: text
    - name: dnstap
      dnstap:
        remote-address: <ELB_ADDRESS_FOR_K8S_CLUSTER>
        remote-port: 6000
        sock-path: null
        retry-interval: 5
        tls-support: false
        tls-insecure: false
  routes:
    - from: [ sniff ]
      to: [ dnstap ]

config.central.server.yml

global:
  trace:
    verbose: true
multiplexer:
  collectors:
  - dnstap:
      listen-ip: 0.0.0.0
      listen-port: 6000
    name: tap
  loggers:
  - lokiclient:
      batch-size: 1048576
      flush-interval: 10
      job-name: dnscollector
      mode: json
      retry-interval: 10
      server-url: http://loki-dnslogs-write:3100/loki/api/v1/push
      tenant-id: dnslogs
    name: loki
  routes:
  - from:
    - tap
    to:
    - loki

dmachard commented 1 year ago

Thanks for sharing this issue and use case. I will try to reproduce it on my side.

dmachard commented 1 year ago

A regression was introduced in the sniffer in v0.30.0 with the addition of the new feature "IP defrag and TCP reassembly support": the timestamp is outdated, so the message is ignored by Loki.
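
For context, Loki can reject log entries whose timestamps are too old. In Loki's own configuration this is governed by the `limits_config` block. The fragment below is a sketch for checking the central Loki instance, with illustrative values. It is an assumption that this limit, rather than another ingestion rule, is what dropped the entries in this case. The fix remains upgrading the collector, not relaxing the limit.

```yaml
# Loki server configuration (not go-dnscollector); values are illustrative.
limits_config:
  # When true, entries older than the max age below are rejected at ingestion.
  reject_old_samples: true
  # Maximum accepted age of an entry's timestamp.
  reject_old_samples_max_age: 168h
```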

A new beta release has been published: https://github.com/dmachard/go-dns-collector/releases/tag/v0.31.0-beta4 . Can you test it?

By the way, since 0.31.0 a new buffer has been introduced in the dnstap logger to avoid a memory leak in some cases. The buffer can be configured (flush-interval and buffer-size):

    - name: dnstap
      dnstap:
        remote-address: 192.168.1.210
        remote-port: 6000
        retry-interval: 5
        flush-interval: 5
        buffer-size: 100

And the new config of the sniffer (with transformers):

multiplexer:
  collectors:
    - name: sniff
      afpacket-sniffer:
        port: 53
        device:
      transforms:
        latency:
          measure-latency: true
          queries-timeout: 2
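
Putting the two fragments above together with the reporter's original routes, an updated config.bind.server.yml for 0.31.0 might look like the sketch below. This merge is not a configuration confirmed in this thread: it uses only keys that appear in this issue, leaves `device` empty as in the original, and keeps the `<ELB_ADDRESS_FOR_K8S_CLUSTER>` placeholder. The sniffer options from the original file that the maintainer's example omits (capture-dns-queries, capture-dns-replies, cache-support, query-timeout) are left out on the assumption that the latency transform replaces them; check the documentation for the release before relying on that.

```yaml
global:
  trace:
    verbose: true

multiplexer:
  collectors:
    - name: sniff
      afpacket-sniffer:
        port: 53
        device:
      # Latency measurement now lives in a transform (per the maintainer's example).
      transforms:
        latency:
          measure-latency: true
          queries-timeout: 2

  loggers:
    - name: dnstap
      dnstap:
        remote-address: <ELB_ADDRESS_FOR_K8S_CLUSTER>
        remote-port: 6000
        retry-interval: 5
        # Buffer settings introduced in 0.31.0.
        flush-interval: 5
        buffer-size: 100

  routes:
    - from: [ sniff ]
      to: [ dnstap ]
```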

I will add more tests to avoid this type of regression in the future.

christianbrideau commented 1 year ago

Just tested 0.31.0-beta4 and data is now getting to Loki as it used to with versions <= 0.29.0, thanks!

You said that the issue was introduced in the sniffer and not in the DNStap logger as I previously thought. I ran tests using a sniffer collector and a Loki logger, and I could swear the data was reaching its destination. I'm not ruling out experimental error, though. I can run some more tests if you want.

dmachard commented 1 year ago

Thanks for the quick test. The issue occurs only with the sniffer collector + dnstap logger combination: the dnstap logger uses a specific function, ToDnstap, to send messages, which is not the case with the Loki logger.

christianbrideau commented 1 year ago

I get it, thanks for the clarification.

dmachard commented 1 year ago

The release v0.31.0 is out, enjoy.