PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0
3.61k stars 900 forks source link

dnsdist: no more protobuf messages sent with metadata in some conditions #12560

Closed dmachard closed 1 year ago

dmachard commented 1 year ago

Description

I am testing the PR Add support for metadata in protobuf messages from alpha version of dnsdist with the following configuration

setLocal('0.0.0.0:5553')
addDOHLocal('0.0.0.0:5554',   '/etc/dnsdist/conf.d/server.crt',  '/etc/dnsdist/conf.d/server.key', 
            { '/dns-query' },  { keepIncomingHeaders=true })

rl = newRemoteLogger("127.0.0.1:6001")

local mytags = { pool='pool', agent='doh-header:user-agent'   }

addAction(AllRule(),RemoteLogAction(rl, nil, {serverID="dnsdist"}, mytags))
addResponseAction(AllRule(),RemoteLogResponseAction(rl, nil, true, {serverID="dnsdist"}, mytags))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl, nil, true, {serverID="dnsdist"}, mytags))

newServer({address='1.1.1.1:53', pool='test'})
addAction(AllRule(),PoolAction("test"))

I observe something weird, DOH messages are correctly sent to my remote protobuf logger but no more DNS UDP/TCP messages. If I remove the doh-header key from my tags, all DNS messages are correctly sent

It's the expected behavior ?

Environment

rgacogne commented 1 year ago

Hi Denis, This is completely unexpected and thus clearly a bug! Thank you for testing and reporting that quickly, I'm on it!

rgacogne commented 1 year ago

I have not been able to reproduce so far, with your configuration (addresses/ports changed) I'm correctly getting protobuf messages for UDP and TCP queries (output from the Protocol Buffer logger in contrib/ProtobufLogger.py:

[2023-02-17 10:36:14.249850] Query of size 48: 127.0.0.1:51452 (N/A) -> 127.0.0.1:8053 (UDP) id: 59765 uuid: b'b86a3ea67c114ce0acaa9130ad2cc25b' requestorid: N/A deviceid: N/A devicename: N/A serverid: b'dnsdist' nod: 0
- agent ->
- pool ->
- Question: 1, 1, free.fr.
[2023-02-17 10:36:14.253858] Response of size 52: 127.0.0.1:51452 (N/A) -> 127.0.0.1:8053 (UDP) id: 59765 uuid: b'b86a3ea67c114ce0acaa9130ad2cc25b' requestorid: N/A deviceid: N/A devicename: N/A serverid: b'dnsdist' nod: 0
- agent ->
- pool -> test
- Question: 1, 1, free.fr.
- Query time: 2023-02-17 10:36:14.249392
- Response Code: 0, RRs: 1
     - 1, 1, free.fr., 82598, 212.27.48.10, 0
[2023-02-17 10:37:11.122869] Query of size 43: 127.0.0.1:45304 (N/A) -> 127.0.0.1:10443 (DOH) id: 42 uuid: b'57c8dcd774da4ff48aefc8fbbdc065aa' requestorid: N/A deviceid: N/A devicename: N/A serverid: b'dnsdist' nod: 0
- agent -> PycURL/7.45.2 libcurl/7.87.0 OpenSSL/3.0.8 zlib/1.2.13 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.4 libpsl/0.21.2 (+libidn2/2.3.4) libssh2/1.10.0 nghttp2/1.52.0
- pool ->
- Question: 1, 1, nxd.sigsegv.fr.
[2023-02-17 10:37:11.149720] Response of size 726: 127.0.0.1:45304 (N/A) -> 127.0.0.1:10443 (DOH) id: 42 uuid: b'57c8dcd774da4ff48aefc8fbbdc065aa' requestorid: N/A deviceid: N/A devicename: N/A serverid: b'dnsdist' nod: 0
- agent -> PycURL/7.45.2 libcurl/7.87.0 OpenSSL/3.0.8 zlib/1.2.13 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.4 libpsl/0.21.2 (+libidn2/2.3.4) libssh2/1.10.0 nghttp2/1.52.0
- pool -> test
- Question: 1, 1, nxd.sigsegv.fr.
- Query time: 2023-02-17 10:37:11.122487
- Response Code: 3, RRs: 0
[2023-02-17 10:37:56.523715] Query of size 48: 127.0.0.1:33249 (N/A) -> 127.0.0.1:8053 (TCP) id: 18738 uuid: b'e166f98a019f4369b4941391e4f691bd' requestorid: N/A deviceid: N/A devicename: N/A serverid: b'dnsdist' nod: 0
- agent ->
- pool ->
- Question: 1, 1, free.fr.
[2023-02-17 10:37:56.532446] Response of size 52: 127.0.0.1:33249 (N/A) -> 127.0.0.1:8053 (TCP) id: 18738 uuid: b'e166f98a019f4369b4941391e4f691bd' requestorid: N/A deviceid: N/A devicename: N/A serverid: b'dnsdist' nod: 0
- agent ->
- pool -> test
- Question: 1, 1, free.fr.
- Query time: 2023-02-17 10:37:56.523522
- Response Code: 0, RRs: 1
     - 1, 1, free.fr., 84878, 212.27.48.10, 0
dmachard commented 1 year ago

Thank for the quick response, I am using a different collector: https://github.com/dmachard/go-dns-collector (working fine with dnsdist 1.7.0 by the way). I will make more tests, regarding your feedback something is wrong in my collector with the next release

dmachard commented 1 year ago

more details with dnsdist 1.8.0, my protobuf decoder (in golang) failed with the message: proto: required field PBDNSMessage.Meta.value not set ... That explains why I don't get the DNS message even if I am not sure to properly understand this error right now.

currently I am using the proto file definition from the following repository: https://github.com/PowerDNS/dnsmessage but it's not the same from the master branch, witch one is the good one to use ?

rgacogne commented 1 year ago

That's very interesting, thanks! I'm pretty sure there is an issue with the fact that we are setting an empty value (repeated allows that at the stringVal content level, but required does not at the value level). I'll also check why our dnsmessage.proto are not in sync.

rgacogne commented 1 year ago

I'll also check why our dnsmessage.proto are not in sync.

It looks like one type of events was missing in the dnsmessage repository, https://github.com/PowerDNS/dnsmessage/pull/7 should fix it.

rgacogne commented 1 year ago

That's very interesting, thanks! I'm pretty sure there is an issue with the fact that we are setting an empty value (repeated allows that at the stringVal content level, but required does not at the value level).

That was indeed the problem! I cannot find a way to have the value present if both stringVal and intVal are empty, so I'm settling on making sure that we have at least one stringVal present (albeit empty). Fix added to https://github.com/PowerDNS/pdns/pull/12561, which I expect to be able to merge soon. Thanks a lot for finding this issue so quickly, reporting it and helping to debug it, this is much appreciated! I should really find the time to play with go-dns-collector, it looks awesome and it might be a good addition to our tests :)

dmachard commented 1 year ago

thanks for the fix, i'll wait for the merge and re-test that.

dmachard commented 1 year ago

One more thing, if I add the "pool" key in metadata to get the selected pool, the value is only available on DNS replies and always empty on queries. It's because the protobuf message is sent before the pool selection ?

rgacogne commented 1 year ago

One more thing, if I add the "pool" key in metadata to get the selected pool, the value is only available on DNS replies and always empty on queries. It's because the protobuf message is sent before the pool selection ?

Yes! If you change the order of your rules, and tells PoolAction not to stop the processing so the next rules are actually evaluated, you should get the pool for DNS queries as well:

addAction(AllRule(),PoolAction("test", false))
addAction(AllRule(),RemoteLogAction(rl, nil, {serverID="dnsdist"}, mytags))
addResponseAction(AllRule(),RemoteLogResponseAction(rl, nil, true, {serverID="dnsdist"}, mytags))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl, nil, true, {serverID="dnsdist"}, mytags))

newServer({address='1.1.1.1:53', pool='test'})
dmachard commented 1 year ago

Thanks, I missed that, it's works fine. This new feature with metadata is really great, nice works.

dmachard commented 1 year ago

thank for the fix, tested with success :)