Closed christianbrideau closed 1 year ago
Thanks for sharing this issue and use case. I will try to reproduce it on my side.
A regression was introduced in the sniffer in v0.30.0 with the addition of the new feature "IP defrag and TCP reassembly support": the timestamp is outdated, so the data is ignored by Loki.
A new beta release has been published: https://github.com/dmachard/go-dns-collector/releases/tag/v0.31.0-beta4 . Can you test it?
By the way, since 0.31.0 a new buffer has been introduced in the dnstap logger to avoid a memory leak in some cases. The buffer can be configured with flush-interval and buffer-size:
- name: dnstap
  dnstap:
    remote-address: 192.168.1.210
    remote-port: 6000
    retry-interval: 5
    flush-interval: 5
    buffer-size: 100
And here is the new config of the sniffer (with transformers):
multiplexer:
  collectors:
    - name: sniff
      afpacket-sniffer:
        port: 53
        device:
      transforms:
        latency:
          measure-latency: true
          queries-timeout: 2
I will add more tests to avoid this type of regression in the future
Just tested 0.31.0-beta4 and data is now getting to Loki as it did with versions <= 0.29.0, thanks!
You said that the issue was introduced in the sniffer and not in the DNStap logger, as I previously thought. I ran tests using a sniffer collector and a Loki logger, and I could swear the data was reaching its destination. I'm not ruling out the possibility of experimental error, though. I can run some more tests if you want.
Thanks for the quick test. The issue occurs only with the combination of sniffer collector and dnstap logger: the dnstap logger uses a specific function, ToDnstap, to send messages, which is not the case with the Loki logger.
I get it, thanks for the clarification.
The release v0.31.0 is out, enjoy.
This issue seems to have started with version 0.30.0.
Set up
go-dns-collector running on BIND servers. The collector is af_packet and the logger is DNStap. The data is forwarded to the collector described in the next line.
go-dns-collector set up as a k8s deployment. The collector is DNStap and the logger is Loki. It receives the data from the instances described in the previous line and forwards it to a Loki k8s deployment set up in the same cluster.
Problem description
The story starts when both go-dns-collectors were at version 0.27.0 and it all worked. Information came from the DNS server and we could find it in Loki.
We upgraded the dns collector in the k8s deployment to 0.30.0. Things still worked.
We tried to upgrade the dns collector to version 0.30.0 on the DNS servers, and we could not find the data in Loki anymore.
Things we tried
Starting with both collectors at version 0.30.0
We did packet captures in both places where the dns collector is installed. Between the two collectors, we could see traffic on port 6000, and between the central collector and Loki we could see traffic going to Loki on port 80.
In both places, we added the console logger and the file logger to the configuration. We could see that data was being collected in both places.
We then tried every version from 0.27.0 to 0.30.0 on the DNS server side. Things were fine up to 0.29.0 and stopped working with version 0.30.0.
We downgraded the collector deployed in k8s back to 0.27.0 and tried all versions again on the DNS server side. The problem continued to appear starting with version 0.30.0.
We configured go-dns-collector on the DNS server with a Loki logger, bypassing the central instance. We tried all versions from 0.27.0 to 0.30.0 and we could see the data in Loki for all versions.
How to reproduce the problem
Upgrade go-dns-collector on the DNS server from a version <= 0.29.0 to a version >= 0.30.0. This is the go-dns-collector instance whose DNStap logger forwards data to a central go-dns-collector instance.
How to work around the problem
Run version 0.29.0 or earlier. Unfortunately, some of our DNS servers actually require this patch: (Added handling for EINTR on syscall.Recvmsg #279). At the moment, we have backported the patch to version 0.27.0 and are running our own custom binary.
Conclusions
After running a number of tests, it appears that the information required to deliver data to Loki properly is lost when a DNStap logger is involved. We can capture traffic going to Loki, but we cannot find the data in Loki. We suspect that labels might not be getting set correctly, or that the newer version requires a config change we didn't see.
Configuration files
config.bind.server.yml
config.central.server.yml