PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0

dnsdist: add multi-stream dnstap sockets #14861

Open johnhtodd opened 1 week ago

johnhtodd commented 1 week ago

Short description

Adding multiple streams of dnstap data emitted from dnsdist would allow better scaling for dnstap consumers.

Usecase

We're using Vector to consume dnstap streams from dnsdist. There appear to be bottlenecks with the single-socket model of dnstap data transmission. I expect this gets worse with higher latency on the LAN and larger traffic volumes, since waiting on ACK traffic will start to cause pileups on a single socket. Spreading the data load across many dnstap sockets, as pdns-rec does, would make sense. This is discussed here: https://github.com/vectordotdev/vector/issues/20744 (see the comment from james-stevens)

Description

Having dnsdist open multiple simultaneous dnstap sockets to any named endpoint would be useful. The number of sockets could be configurable, tied to the number of threads, or dynamic based on the number of messages - no opinion on that. How does pdns-rec do it?

omoerbeek commented 1 week ago

The recursor opens a dnstap or protobuf logging stream per thread. Each thread logs to its associated socket.
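To illustrate the per-thread model described above, here is a minimal Python sketch. It is not recursor code: `ListSink` and `run_workers` are hypothetical names, and the in-memory list stands in for a real dnstap or protobuf socket. The point it shows is that when each thread owns its sink, no locking is needed on the logging path.

```python
import threading


class ListSink:
    """Hypothetical stand-in for one logging connection; stores frames in memory."""

    def __init__(self, name):
        self.name = name
        self.frames = []

    def send(self, frame):
        self.frames.append(frame)


def run_workers(num_threads, messages_per_thread):
    """Start num_threads workers, each logging only to its own sink."""
    sinks = [ListSink(f"sink-{i}") for i in range(num_threads)]

    def worker(index):
        sink = sinks[index]  # this thread is the only writer, so no lock is taken
        for n in range(messages_per_thread):
            sink.send(f"thread{index}-msg{n}".encode())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sinks
```

With this layout the number of streams follows the number of worker threads, which is one of the options raised in the description.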

rgacogne commented 1 week ago

At the moment DNSdist creates one fstrm_writer per newFrameStreamUnixLogger or newFrameStreamTcpLogger object, so in theory it is already possible to create more than one connection, but using them that way might be quite cumbersome.
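As a rough illustration of what spreading messages over several logger objects could look like, here is a Python sketch of a round-robin fan-out. It is not DNSdist configuration or API: `FanOutLogger` is a hypothetical name, and each "writer" is any callable standing in for one connection. Round-robin is only one possible policy, chosen here because it is the simplest.

```python
import itertools
import threading


class FanOutLogger:
    """Hypothetical wrapper that hands each frame to the next writer in turn."""

    def __init__(self, writers):
        if not writers:
            raise ValueError("at least one writer is required")
        self._cycle = itertools.cycle(writers)
        self._lock = threading.Lock()  # protects the shared iterator across threads

    def log(self, frame):
        with self._lock:
            writer = next(self._cycle)
        writer(frame)  # the write itself happens outside the lock


# Usage: three in-memory lists stand in for three connections.
buckets = [[], [], []]
logger = FanOutLogger([b.append for b in buckets])
for i in range(9):
    logger.log(i)
```

Only the choice of writer is serialized here; whether a given writer may be called from several threads at once depends on the writer and is left open in this sketch.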

johnhtodd commented 6 days ago

More notes on this: When we had >60kqps flowing through a single TCP session to Vector, I was seeing what I would describe as large fluctuations in bandwidth between the transmitting server and the Vector instance. Meaning: for a few milliseconds there would be many (20? I don't recall the number) megabits of throughput, which would then drop to 5 megabits for a few milliseconds, and then jump back to 20. This was observed via packet dumps with tshark. Sorry that I don't have the exact figures here; I was hunting a different problem and that behavior was a "Huh... interesting." moment, but I did not document it. I suspect this behavior is made worse by the single-socket model in use. The same kind of packet dump of data coming from pdns-rec (with less than 60kqps, admittedly) was extremely smooth: there were no fluctuations in packet throughput between the recursive resolver and Vector on the dnstap data stream. Is this the "fault" of dnsdist? Not necessarily, but the single-socket model may make things worse than they need to be.