Control-D-Inc / ctrld

A highly configurable, multi-protocol DNS forwarding proxy
MIT License

Policy config on EdgeOS (Edgerouter X) #134

Closed jbialy closed 8 months ago

jbialy commented 8 months ago

Hey Folks, thanks for creating this great piece of software! 😄

I've been trying to get ctrld working on an Edgerouter X (running firmware 2.0.9-hotfix.7). All 2.x.x releases use Debian 9 (Stretch).

I have a pretty basic config which I would expect to work, however, it seems like the policy rules are not being considered at all, and all traffic is being sent to upstream.0. Here's the config that I've been using:

[service]
  log_level = "info"
  log_path = "/var/log/ctrld.log"

[listener]
  [listener.0]
    ip = '0.0.0.0'
    port = 5354

    [listener.0.policy]
      name = 'Main Policy'
      networks = [
        {'network.0' = ['upstream.0']},
        {'network.1' = ['upstream.0']}
      ]
      macs = [
        {'b0:be:83:2d:c6:50' = ['upstream.1']}
      ]

[network]
  [network.0]
    name = 'LAN'
    cidrs = ['192.168.1.0/25']

  [network.1]
    name = 'LAN-IoT'
    cidrs = ['10.10.10.0/27']

[upstream]
  [upstream.0]
    name = 'Cloudflare'
    type = 'dot'
    endpoint = '1.1.1.1:853'
    timeout = 5000
  [upstream.1]
    name = 'SmartDNS Proxy MTL'
    type = 'legacy'
    endpoint = '169.54.78.85'
    timeout = 5000

Looking at the config, I would expect DNS requests from both networks to go out upstream.0 and for the specified mac address to go out on upstream.1. However, when testing this from the machine with the mac address specified, and looking at the ctrld logs, there are never any requests hitting upstream.1.

Here are snippets from the logs:

$ head /var/log/ctrld.log
{"level":"info","time":"2024-01-19T16:30:13Z.757","message":"starting ctrld v1.3.3"}
{"level":"info","time":"2024-01-19T16:30:13Z.767","message":"os: linux 4.14.54-UBNT"}
{"level":"info","bootstrap_ip":"1.1.1.1","time":"2024-01-19T16:30:14Z.584","message":"using bootstrap IP for upstream.0"}
{"level":"info","bootstrap_ip":"169.54.78.85","time":"2024-01-19T16:30:14Z.585","message":"using bootstrap IP for upstream.1"}
{"level":"warn","time":"2024-01-19T16:30:14Z.627","message":"no default route IP found"}
{"level":"info","time":"2024-01-19T16:30:14Z.629","message":"starting DNS server on listener.0: 0.0.0.0:5354"}
{"level":"info","time":"2024-01-19T16:30:21Z.229","message":"[fbf13a] QUERY: 127.0.0.1:39005 (gatekeepr) -> listener.0: A detectportal.firefox.com"}
{"level":"info","time":"2024-01-19T16:30:21Z.234","message":"[f84262] QUERY: 127.0.0.1:48816 (gatekeepr) -> listener.0: PTR lb._dns-sd._udp.0.1.168.192.in-addr.arpa"}
{"level":"info","time":"2024-01-19T16:30:21Z.748","message":"[d907b4] QUERY: 127.0.0.1:48816 (gatekeepr) -> listener.0: PTR lb._dns-sd._udp.0.1.168.192.in-addr.arpa"}
{"level":"info","time":"2024-01-19T16:30:21Z.908","message":"[f84262] REPLY: upstream.0 -> 127.0.0.1:48816 (gatekeepr): NXDOMAIN"}

After making a request from the machine with MAC address b0:be:83:2d:c6:50, I don't see any hits on upstream.1:

cat /var/log/ctrld.log | grep upstream.1
{"level":"info","bootstrap_ip":"169.54.78.85","time":"2024-01-19T16:30:14Z.585","message":"using bootstrap IP for upstream.1"}

Instead of using the macs policy block, I also tried using networks: [] with the source IP of the machine, i.e. 192.168.1.44/32, but the results are no different. It seems that the policy evaluation doesn't happen at all?

ctrld is obviously working and routing all the traffic to the default upstream.0 however, so this is very strange. I've even checked to make sure that /etc/dnsmasq.d/dnsmasq-zzz-ctrld.conf is being created when ctrld is active, and everything seems to check out fine!

Hoping for a bit of help here, has anyone else seen this when testing on their device? Is it the config? Am I missing something? I can provide more detailed logs if that helps!

Thanks and much appreciated!

jbialy commented 8 months ago

Just wanted to add that the mac address was obtained from the client list available through ctrld clients list so discovery is working correctly!

cuonglm commented 8 months ago

@jbialy How did you make the request to the router? Did you send the request to router port 53 (dnsmasq), or did you send it directly to router port 5354 (ctrld)?

jbialy commented 8 months ago

The request is coming from a wifi-connected machine (on the 192.168.1.0/25 subnet) which uses the Edgerouter as its default gateway and also as the DNS resolver, i.e. 192.168.1.1/32.

So, the request would be hitting the router on port 53 (dnsmasq) I would presume, and then get passed to ctrld on 5354?

Side note that I do have two nameservers set as part of DNS forwarding config on the Edgerouter:

show dns forwarding nameservers
-----------------------------------------------
   Nameservers configured for DNS forwarding
-----------------------------------------------
1.1.1.1 available via 'system'
8.8.8.8 available via 'system'
209.197.128.2 available via 'ppp pppoe5'
209.197.128.5 available via 'ppp pppoe5'

However, when ctrld is running, those don't get hit. I've verified this by looking at the Cloudflare status page: with ctrld running, I get Using DNS over TLS (DoT) set to YES, while without ctrld I only see Connected to 1.1.1.1 (legacy mode).

cuonglm commented 8 months ago

@jbialy Could you please run with debug log?

jbialy commented 8 months ago

@cuonglm, I ran a quick test using the debug level. Please see the attached log, ctrld.log.1.gz.

Again, most of the requests in this log are coming from the machine with MAC b0:be:83:2d:c6:50; the rest are coming from network.0, and some likely from network.1.

jbialy commented 8 months ago

The client in question is {"level":"debug","time":"2024-01-19T17:24:45Z.776","message":"found hostname: \"Januszs-MacBook-Air\", ip: \"192.168.1.57\" via mdns"}. This debug log doesn't indicate its MAC address, but the given client should be matching the MAC policy set for b0:be:83:2d:c6:50.

jbialy commented 8 months ago

Is there a way to tell from the debug logs which policies are loaded? If they are at all loaded? As in, that the [listener.0.policy] block is being processed?

yegors commented 8 months ago

Does the ctrld clients list command show you the MAC address in the table? If not, then it wasn't discovered and therefore the policy cannot match. I took your config and modified it slightly, like so:

[service]
  log_level = "debug"
  log_path = "/tmp/ctrld.log"

[listener]
  [listener.0]
    ip = '127.0.0.1'
    port = 53

    [listener.0.policy]
      name = 'Main Policy'
      networks = [
        {'network.0' = ['upstream.0']},
        {'network.1' = ['upstream.0']}
      ]
      macs = [
        {'00:0c:29:4a:5c:57' = ['upstream.1']}
      ]

[network]
  [network.0]
    name = 'LAN'
    cidrs = ['192.168.1.0/25']

  [network.1]
    name = 'LAN-IoT'
    cidrs = ['10.10.10.0/27']

[upstream]
  [upstream.0]
    name = 'Cloudflare'
    type = 'dot'
    endpoint = '1.1.1.1:853'
    timeout = 5000
  [upstream.1]
    name = 'Control D'
    type = 'legacy'
    endpoint = '76.76.2.2'
    timeout = 5000

The client list table looks like this:

test@test-virtual-machine:~$ sudo ctrld clients list
+-------------+----------------------+-------------------+------------+
|     IP      |       Hostname       |        Mac        | Discovered |
+-------------+----------------------+-------------------+------------+
| 10.0.10.1   |                      | 00:50:56:9f:0e:84 | arp        |
| 10.0.10.222 | test-virtual-machine | 00:0c:29:4a:5c:57 | dhcp       |
| 10.0.10.238 |                      | 74:56:3c:44:eb:5e | arp        |
| 127.0.0.1   | test-virtual-machine | 00:0c:29:4a:5c:57 | dhcp       |
+-------------+----------------------+-------------------+------------+

Notice the MAC address for localhost matches what's in the policy:

{"level":"info","time":"2024-01-19T15:48:24-05:00.300","message":"[db1794] QUERY: 10.0.10.222:33166 (test-virtual-machine) -> listener.0: A verify.controld.com"}
{"level":"debug","time":"2024-01-19T15:48:24-05:00.300","message":"[db1794] Main Policy, 00:0c:29:4a:5c:57, no rule -> [upstream.1]"}
{"level":"debug","time":"2024-01-19T15:48:24-05:00.300","message":"[db1794] sending query to upstream.1: Control D"}
{"level":"info","time":"2024-01-19T15:48:24-05:00.301","message":"[db1794] REPLY: upstream.1 -> 10.0.10.222:33166 (test-virtual-machine): NOERROR"}
{"level":"debug","time":"2024-01-19T15:48:24-05:00.301","message":"[db1794] received response of 71 bytes in 1.813484ms"}

Notice there is a match for 00:0c:29:4a:5c:57.
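
To pull just the routing decisions out of a debug log, something like the following might help. This is only a sketch: the function name policy_decisions is made up for illustration, the default path is the log_path from the first config in this thread, and the pattern is based solely on the two message shapes quoted here ("Main Policy, <mac>, no rule -> [upstream.1]" and "no explicit policy matched, using default routing -> [upstream.0]"). Other ctrld versions may word these lines differently.

```shell
# Hypothetical helper: print the policy decision lines from a ctrld debug log.
# $1: path to the log file (defaults to the log_path used earlier in this thread).
policy_decisions() {
  log="${1:-/var/log/ctrld.log}"
  # Decision lines end in "-> [upstream.N]".
  # QUERY lines ("-> listener.0") and REPLY lines ("upstream.N -> <ip>") do not match.
  grep -E -- '-> \[upstream\.[0-9]+\]' "$log"
}
```

Run it as policy_decisions /tmp/ctrld.log (or with no argument for the default path). It needs log_level = "debug", since the decision lines are logged at debug level.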

jbialy commented 8 months ago

@yegors, thanks for testing this out and providing the config for me to try. What's interesting is that in your logs, I'm seeing "Main Policy, 00:0c:29:4a:5c:57, no rule -> [upstream.1]" and later "sending query to upstream.1: Control D". That's explicit and clear that the policy filtering is working.

I've taken the config you posted and adjusted only the port number 53 -> 5354 so that dnsmasq can forward to ctrld. I can definitely see the client:

$ ctrld clients list | grep Janusz
| 192.168.1.57              | Januszs-Air                          | b0:be:83:2d:c6:50 | arp,dhcp,mdns |

However, I'm still not seeing the policy being applied to b0:be:83:2d:c6:50, i.e. I get "no explicit policy matched, using default routing -> [upstream.0]". I have no idea why, except that it has something to do with the way the DNS query requests get passed to ctrld on EdgeOS. The auto-generated dnsmasq conf file looks like this:

$ cat /etc/dnsmasq.d/dnsmasq-zzz-ctrld.conf 
# GENERATED BY ctrld - DO NOT MODIFY
no-resolv
server=127.0.0.1#5354
max-cache-ttl=0

However, cross referencing this against what's in ConfigContentTmpl I'm wondering if I'm missing the add-mac and add-subnet=32,128 directives?

It looks to me like these two directives are set based on the condition determined in UpstreamSendClientInfo(). From that, I figure that either control-d must be set as one of the upstreams and be of type DOH/DOH3, or ctrld must be set to use NextDNS?

This is something that I should be able to easily verify.
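
For anyone who wants to check their own router, here is a rough sketch of a test for those two directives in the generated file. The function name check_dnsmasq_directives is invented for this example, and the default path is the one shown above. It only looks for lines that start with add-mac or add-subnet.

```shell
# Hypothetical helper: report whether the ctrld-generated dnsmasq config
# contains the add-mac and add-subnet directives.
# $1: path to the config file (defaults to the path shown in this thread).
check_dnsmasq_directives() {
  conf="${1:-/etc/dnsmasq.d/dnsmasq-zzz-ctrld.conf}"
  for directive in add-mac add-subnet; do
    # Match the directive at the start of a line, e.g. "add-subnet=32,128".
    if grep -q "^${directive}" "$conf"; then
      echo "${directive}: present"
    else
      echo "${directive}: missing"
    fi
  done
}
```

Against the file contents pasted above, this reports both directives as missing.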

jbialy commented 8 months ago

Ah, indeed that was the culprit. Having one of the upstreams set to be control-d and use DoH3 has enabled the correct dnsmasq configuration.

...
[upstream]
  [upstream.0]
    name = 'Cloudflare'
    type = 'dot'
    endpoint = '1.1.1.1:853'
    timeout = 5000
  [upstream.1]
    name = 'Control D OISD'
    type = 'doh3'
    endpoint = 'https://freedns.controld.com/x-oisd'
    timeout = 5000
...

$ cat /etc/dnsmasq.d/dnsmasq-zzz-ctrld.conf 
# GENERATED BY ctrld - DO NOT MODIFY
no-resolv
server=127.0.0.1#5354
add-mac
add-subnet=32,128
max-cache-ttl=0

And policy filtering is now working:

{"level":"info","time":"2024-01-20T01:59:19Z.547","message":"[eab9ac] QUERY: 192.168.1.57:47144 (Januszs-Air) -> listener.0: A az764295.vo.msecnd.net"}
{"level":"debug","time":"2024-01-20T01:59:19Z.548","message":"[eab9ac] Main Policy, b0:be:83:2d:c6:50, no rule -> [upstream.1]"}
{"level":"debug","time":"2024-01-20T01:59:19Z.549","message":"[eab9ac] including client info with the request"}
{"level":"debug","time":"2024-01-20T01:59:19Z.550","message":"[eab9ac] sending query to upstream.1: Control D OISD"}

yegors commented 8 months ago

Thanks for diving in, this appears to be a bug as that config should totally contain those 2 params. Will investigate further.

cuonglm commented 8 months ago

@jbialy You have to set send_client_info = true for upstreams other than ControlD/NextDNS: https://github.com/Control-D-Inc/ctrld/blob/main/docs/config.md#send_client_info

jbialy commented 8 months ago

Thanks @cuonglm, I think setting send_client_info = true makes sense for when needing to pass the mac information to the upstream but it shouldn't be required for enabling the two configuration directives in dnsmasq-zzz-ctrld.conf (i.e. add-mac and add-subnet=32,128).

cuonglm commented 8 months ago

> Thanks @cuonglm, I think setting send_client_info = true makes sense for when needing to pass the mac information to the upstream but it shouldn't be required for enabling the two configuration directives in dnsmasq-zzz-ctrld.conf (i.e. add-mac and add-subnet=32,128).

Yes, we are going to fix this in the next release.

jbialy commented 8 months ago

Thank you both @yegors and @cuonglm for the help on this!

trollybusman commented 8 months ago

Yea was going to say. Nothing in the docs for send_client_info suggests the current behavior. Local policy should logically work regardless of this setting.

yegors commented 8 months ago

This issue is resolved in v1.3.4 - the above flag is no longer required.