Control-D-Inc / ctrld

A highly configurable, multi-protocol DNS forwarding proxy
MIT License
402 stars 19 forks source link

Analytics are only logged for request on the IPv4 listener #170

Closed MHulst22 closed 2 weeks ago

MHulst22 commented 1 month ago

Just started transitioning my DNS infra from NextDNS to ControlD to test if it's worth making the leap entirely. I'm using two ctrld daemons within LXC containers on Proxmox for my local LAN. Enabled full analytics for every endpoint for now to get a good insight in how stuff works.

I'm experimenting with a few IPv6-only LANs and noticed that no analytics is reported when using the IPv6 listener. Started troubleshooting and found out that when I talk to the IPv4 listener the queries are reported on the analytics page. I know the query itself goes fine via the IPv6 listener. Response is reported back and when looking in the ctrld.log file I see that the request is passed on to the correct listener. Also, when I try the query I perform the same query on the IPv4 listener after I performed it on the IPv6 listener, the ctrld.log file reports a cache hit.

Below the redacted config I used. I'm using different upstream profiles for each network.

`[service] log_level = "debug" # Set the log level (e.g., debug, info, notice, warn, error, fatal, panic) log_path = "/var/log/ctrld.log" # Path to the log file discover_ptr = true # Enables PTR record discovery. cache_enable = true cache_size = 16384 cache_ttl_override = 300 cache_serve_stale = true

[listener] [listener.0] ip = "::" port = 53 restricted = true # Only explicitly defined networks are allowed allow_wan_clients = true # Necessary when using GUAs

[listener.0.policy]
  name = 'Home Policy'
  networks = [
      {'network.0' = ['upstream.0']},
      {'network.1' = ['upstream.1']},
      {'network.2' = ['upstream.2']},
      {'network.3' = ['upstream.3']},
      {'network.4' = ['upstream.4']},
  ]
  rules = [
{"*.168.192.in-addr.arpa" = ["upstream.5"]},
{"*.X.172.in-addr.arpa" = ["upstream.5"]},
{"*.X.172.in-addr.arpa" = ["upstream.5"]},
{"*.168.192.in-addr.arpa" = ["upstream.5"]},
{"*.10.in-addr.arpa" = ["upstream.5"]},
{"*.x.x.x.x.x.x.x.x.2.0.a.2.ip6.arpa" = ["upstream.5"]},
{"*.private.lan" = ["upstream.5"]},

]

[network] [network.0] name = 'network.0' cidrs = ['X.X.X.0/24', '2a02:x:x:x::/64']

[network.1] name = 'network.1' cidrs = ['X.X.X.0/24', '2a02:x:x:x::/64']

[network.2] name = 'network.2' cidrs = ['X.X.X.0/24', '2001:x:x:x::/64']

[network.3] name = 'network.3' cidrs = ['X.X.X.0/24', '2001:x:x:x::/64']

[network.4] name = 'Catch-All / Default for other networks' cidrs = ['10.0.0.0/8', '172.16.0.0/12', '192.168.0.0/24', '2a02:x:x::/48', '2001:x:x:x::/56']

[upstream] [upstream.0] name = 'network.0 DOH3' type = 'doh3' endpoint = 'https://dns.controld.com/abcdef12345' timeout = 5000

[upstream.1] name = 'network.1 DOH3' type = 'doh3' endpoint = 'https://dns.controld.com/abcdef12345' timeout = 5000

[upstream.2] name = 'network.2 DOH3' type = 'doh3' endpoint = 'https://dns.controld.com/abcdef12345' timeout = 5000

[upstream.3] name = 'network.3 DOH3' type = 'doh3' endpoint = 'https://dns.controld.com/abcdef12345' timeout = 5000

[upstream.4] name = 'Catch-All / Default for other networks DOH3' type = 'doh3' endpoint = 'https://dns.controld.com/abcdef12345' timeout = 5000

[upstream.5] name = 'Resolver local domain names' type = 'legacy' endpoint = 'X.X.X.254:53' timeout = 5000`

Also tried making explicit listeners for both protocol stacks with no change in the outcome. Also checked my Opnsense firewall and even created explicit rules. I can see sessions being established to api.controld.com (2606:1a40:3::1) on TCP 443, but still no analytics data for queries on the IPv6 listener. Also fiddling with explicit ip_stack settings did not change the outcome....

cuonglm commented 1 month ago

Do you see this line in your log: https://github.com/Control-D-Inc/ctrld/blob/a00d2a431a8aac84204ef06c95fe4f62748fad5a/doh.go#L158

which indicates what client info is sent to ControlD upstream.

MHulst22 commented 1 month ago

Looks like a log I don't know about (doh.go)?

Below a snippet of ctrld.log with a query that does make it into the Analytics:

{"level":"debug","time":"2024-08-08T22:55:59Z.079","message":"[5d1008] received response of 40 bytes in 794.762 s"} {"level":"info","time":"2024-08-08T22:55:59Z.081","message":"[771e02] QUERY: 172.X.X.X:50712 (HOSTNAME) -> listener.0: A www.google.pl"} {"level":"debug","time":"2024-08-08T22:55:59Z.081","message":"[771e02] Home Policy, network.1, no rule -> [upstream.1]"} {"level":"debug","time":"2024-08-08T22:55:59Z.081","message":"[771e02] including client info with the request"} {"level":"debug","time":"2024-08-08T22:55:59Z.082","message":"[771e02] sending query to upstream.1: Network.1"} {"level":"debug","time":"2024-08-08T22:55:59Z.082","message":"[771e02] sending request header: map[X-Cd-Host:[HOSTNAME] X-Cd-Ip:[172.X.X.X]]"} {"level":"debug","time":"2024-08-08T22:55:59Z.100","message":"[771e02] add cached response"} {"level":"info","time":"2024-08-08T22:55:59Z.100","message":"[771e02] REPLY: upstream.1 -> 172.X.X.X:50712 (HOSTNAME): NOERROR"} {"level":"debug","time":"2024-08-08T22:55:59Z.100","message":"[771e02] received response of 70 bytes in 19.084836ms"} {"level":"info","time":"2024-08-08T22:55:59Z.104","message":"[f4ded1] QUERY: 172.X.X.X:50713 (HOSTNAME) -> listener.0: AAAA www.google.pl"} {"level":"debug","time":"2024-08-08T22:55:59Z.104","message":"[f4ded1] Home Policy, network.1, no rule -> [upstream.1]"} {"level":"debug","time":"2024-08-08T22:55:59Z.104","message":"[f4ded1] including client info with the request"} {"level":"debug","time":"2024-08-08T22:55:59Z.104","message":"[f4ded1] sending query to upstream.1: Network.1"} {"level":"debug","time":"2024-08-08T22:55:59Z.104","message":"[f4ded1] sending request header: map[X-Cd-Host:[HOSTNAME] X-Cd-Ip:[172.X.X.X]]"} {"level":"debug","time":"2024-08-08T22:55:59Z.114","message":"[f4ded1] add cached response"} {"level":"info","time":"2024-08-08T22:55:59Z.114","message":"[f4ded1] REPLY: upstream.1 -> 172.X.X.X:50713 (HOSTNAME): NOERROR"} {"level":"debug","time":"2024-08-08T22:55:59Z.115","message":"[f4ded1] received response of 94 bytes in 11.014784ms"}

MHulst22 commented 1 month ago

Interesting update. When I change the upstream type from doh3 to dot or doq, analytic logs are rolling in as expected. Setting it to doh gives me the same as doh3.

cuonglm commented 1 month ago

Thanks.

Looks like the client info is sent to ControlD correctly.

FYI, doh or doh3 will use the same client info code.

MHulst22 commented 1 month ago

At the moment I've run out of ideas what I can test more to exclude things in my setup, if anybody has any, please let me know. For now I have set all upstream types to DOQ and everything is working as expected.

Still curious what the issue is for this specific situation (CTRLD daemon IPv6 listeners with DOH / DOH3 upstreams)

Again if anyone has any idea on something to test, please let me know.

Thusfar I'm liking the features (especially DNS64 for testing my IPv6-only networks).

cddeppe commented 1 month ago

I have actually noticed the same issue. I.e. when my GoogleTV has an IPv4 address I can see all the queries being made, but once it gets the IPv6 address, eventually I don't see any in the log at all. I know it's using ControlD since I'm re-routing some specific traffic and it still works, I just don't see the queries being made in my logs on the website.

cuonglm commented 1 month ago

@cddeppe Thanks for reporting.

Could you please check ctrld log to see if your client info were actually logged?

cddeppe commented 1 month ago

{"level":"info","time":"2024-08-13T09:03:22-07:00.108","message":"[253a9c] QUERY: [2601:::::::]:58124 (GoogleTV-Eth) -> listener.0: AAAA xxx.xxx.tv"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.109","message":"[253a9c] no explicit policy matched, using default routing -> [upstream.0]"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.109","message":"[253a9c] including client info with the request"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.109","message":"[253a9c] sending query to upstream.0: "} {"level":"debug","time":"2024-08-13T09:03:22-07:00.109","message":"[253a9c] sending request header: map[X-Cd-Host:[GoogleTV-Eth] X-Cd-Ip:[2601:::::::] X-Cd-Mac:[XX:XX:XX:XX:XX:XX]]"} {"level":"info","time":"2024-08-13T09:03:22-07:00.113","message":"[43dbdb] QUERY: [2601:::::::]:44848 (GoogleTV-Eth) -> listener.0: A xxx.xxx.tv"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.113","message":"[43dbdb] no explicit policy matched, using default routing -> [upstream.0]"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.113","message":"[43dbdb] including client info with the request"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.113","message":"[43dbdb] sending query to upstream.0: "} {"level":"debug","time":"2024-08-13T09:03:22-07:00.113","message":"[43dbdb] sending request header: map[X-Cd-Host:[GoogleTV-Eth] X-Cd-Ip:[2601:::::::] X-Cd-Mac:[XX:XX:XX:XX:XX:XX]]"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.156","message":"[253a9c] add cached response"} {"level":"info","time":"2024-08-13T09:03:22-07:00.156","message":"[253a9c] REPLY: upstream.0 -> [22601:::::::]:58124 (GoogleTV-Eth): NOERROR"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.156","message":"[253a9c] received response of 105 bytes in 47.845099ms"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.191","message":"[43dbdb] add cached response"} {"level":"info","time":"2024-08-13T09:03:22-07:00.191","message":"[43dbdb] REPLY: upstream.0 -> [2601:::::::]:44848 (GoogleTV-Eth): NOERROR"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.191","message":"[43dbdb] received response of 81 bytes in 78.355682ms"} {"level":"info","time":"2024-08-13T09:03:22-07:00.918","message":"[f6041c] QUERY: 127.0.0.1:44382 (Cloud-Gateway-Ultra) -> listener.0: A ips1.unifi-ai.com"} {"level":"info","time":"2024-08-13T09:03:22-07:00.918","message":"[72485f] QUERY: 127.0.0.1:39973 (Cloud-Gateway-Ultra) -> listener.0: AAAA ips1.unifi-ai.com"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.919","message":"[72485f] My Policy, network.0, no rule -> [upstream.0 upstream.1]"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.919","message":"[72485f] hit cached response"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.919","message":"[72485f] received response of 160 bytes in 976.341µs"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.919","message":"[f6041c] My Policy, network.0, no rule -> [upstream.0 upstream.1]"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.920","message":"[f6041c] hit cached response"} {"level":"debug","time":"2024-08-13T09:03:22-07:00.921","message":"[f6041c] received response of 90 bytes in 2.662022ms"} {"level":"info","time":"2024-08-13T09:03:26-07:00.393","message":"[7db5d2] QUERY: 192.168.1.6:60886 () -> listener.0: A fw-update.ubnt.com"} {"level":"debug","time":"2024-08-13T09:03:26-07:00.393","message":"[7db5d2] My Policy, network.0, no rule -> [upstream.0 upstream.1]"} {"level":"debug","time":"2024-08-13T09:03:26-07:00.393","message":"[7db5d2] hit cached response"} {"level":"debug","time":"2024-08-13T09:03:26-07:00.393","message":"[7db5d2] received response of 187 bytes in 643.38µs"} {"level":"info","time":"2024-08-13T09:03:26-07:00.397","message":"[8fbfa3] QUERY: 192.168.1.6:56612 () -> listener.0: AAAA fw-update.ubnt.com"} {"level":"debug","time":"2024-08-13T09:03:26-07:00.397","message":"[8fbfa3] My Policy, network.0, no rule -> [upstream.0 upstream.1]"} {"level":"debug","time":"2024-08-13T09:03:26-07:00.397","message":"[8fbfa3] hit cached response"} {"level":"debug","time":"2024-08-13T09:03:26-07:00.397","message":"[8fbfa3] received response of 247 bytes in 571.546µs"} {"level":"info","time":"2024-08-13T09:03:31-07:00.342","message":"[ae2b8a] QUERY: 127.0.0.1:57839 (Cloud-Gateway-Ultra) -> listener.0: A ui.com"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.342","message":"[ae2b8a] My Policy, network.0, no rule -> [upstream.0 upstream.1]"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.343","message":"[ae2b8a] hit cached response"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.343","message":"[ae2b8a] received response of 111 bytes in 549.88µs"} {"level":"info","time":"2024-08-13T09:03:31-07:00.566","message":"[b4ee97] QUERY: [2601:::::::]:51191 (GoogleTV-Eth) -> listener.0: AAAA xxx.xxx.tv"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.566","message":"[b4ee97] no explicit policy matched, using default routing -> [upstream.0]"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.567","message":"[b4ee97] including client info with the request"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.567","message":"[b4ee97] sending query to upstream.0: "} {"level":"debug","time":"2024-08-13T09:03:31-07:00.567","message":"[b4ee97] sending request header: map[X-Cd-Host:[GoogleTV-Eth] X-Cd-Ip:[2601:::::::] X-Cd-Mac:[XX:XX:XX:XX:XX:XX]]"} {"level":"info","time":"2024-08-13T09:03:31-07:00.569","message":"[49da60] QUERY: [2601:::::::]:35374 (GoogleTV-Eth) -> listener.0: A xxx.xxx.tv"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.570","message":"[49da60] no explicit policy matched, using default routing -> [upstream.0]"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.570","message":"[49da60] including client info with the request"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.570","message":"[49da60] sending query to upstream.0: "} {"level":"debug","time":"2024-08-13T09:03:31-07:00.570","message":"[49da60] sending request header: map[X-Cd-Host:[GoogleTV-Eth] X-Cd-Ip:[2601:::::::] X-Cd-Mac:[XX:XX:XX:XX:XX:XX]]"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.650","message":"[b4ee97] add cached response"} {"level":"info","time":"2024-08-13T09:03:31-07:00.650","message":"[b4ee97] REPLY: upstream.0 -> [2601:::::::]:51191 (GoogleTV-Eth): NOERROR"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.650","message":"[b4ee97] received response of 104 bytes in 84.071854ms"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.652","message":"[49da60] add cached response"} {"level":"info","time":"2024-08-13T09:03:31-07:00.652","message":"[49da60] REPLY: upstream.0 -> [2601:::::::]:35374 (GoogleTV-Eth): NOERROR"} {"level":"debug","time":"2024-08-13T09:03:31-07:00.652","message":"[49da60] received response of 80 bytes in 82.661468ms"}

cuonglm commented 1 month ago

Thanks both of you for your reports.

We are investigating problem and will report back once it's resolved.

MHulst22 commented 1 month ago

@cddeppe Are you using DOH or DOH3 upstreams as well in this setup? Does it change when you use other types?

cddeppe commented 1 month ago

I normally use DOH3. but I just tested with DOQ and I see the queries in the activity log just fine that way. Except of course it doesn't include the hostname etc.

alexelisenko commented 1 month ago

Short answer: We've identified the issue and will be pushing out a fix shortly

Detailed answer:

Prior to v1.3.2 the default behaviour of ctrld allowed WAN clients to speak to the open resolver. This resulted in alot of data being ingested by our analytics service when this misconfigured open DNS resolver would be found and used by random clients on the internet.

To mitigate this issue, we opted to drop analytics writes when the source IP and ctrld IP are not in the same /32 for IPv6, or when an IPv4 is a WAN address.

As of v1.3.2, this default behaviour has been corrected, but we have not updated the dropping logic serverside.

I will report back when this fix is fully rolled out.

llorllale commented 2 weeks ago

Hi - fix has been rolled out; please check now and ensure your DNS lookups are reflected in the activity logs.

Thanks.

cddeppe commented 2 weeks ago

It looks like my logs started working sometime yesterday morning for the affected devices.

MHulst22 commented 1 week ago

Configured DOH3 again and logs seem to be working now. Thanks for the follow-up.