Control-D-Inc / ctrld

A highly configurable, multi-protocol DNS forwarding proxy
MIT License

UniFi UXG - Failed To Get Default Route Interface #124

Closed: z1ucas closed this issue 6 months ago

z1ucas commented 8 months ago

Hi Guys,

I am seeing the following error when trying to install on a UniFi UXG Pro. Are there any clues as to how this can be resolved?

root@UXG-PRO-###-###-01:~# sh -c 'sh -c "$(curl -sSL https://api.controld.com/dl)" -s ##HIDDEN## forced'
---------------------
|    System Info    |
---------------------
OS Type      : linux
OS Vendor    : ubios
OS Version   : 3.1.16.12746
Router Model : UniFi NeXt-Gen Gateway PRO
Arch         : aarch64
CPU          : Cortex-A57
Free RAM     : 445 MB / 1997 MB
---------------------
|  Install Details  |
---------------------
Resolver ID  : ##HIDDEN##
Binary URL   : https://assets.controld.com/ctrld/linux/arm64/ctrld
Install Path : /data/controld
---------------------
 - Starting download
 - Making binary executable
 - Launching /data/controld/ctrld
---------------------
Jan  3 00:50:47.098 NTC Starting service
Jan  3 00:50:48.183 NTC Generating controld config: /etc/controld/ctrld.toml
Jan  3 00:50:49.033 ERR ctrld service may not have started due to an error or misconfiguration, service log:
Jan  3 00:50:49.033 ??? ================================
Jan  3 00:50:49.033 ??? Jan  3 00:50:49.008 FTL failed to get default route interface error="defaultRouteFromNetlink: Dial: setsockopt: protocol not available"
Jan  3 00:50:49.033 ??? ================================
Jan  3 00:50:49.676 FTL failed to get default route interface error="defaultRouteFromNetlink: Dial: setsockopt: protocol not available"
root@UXG-PRO-###-###-01:~#

The device does have 2 WAN interfaces and an additional Backup 5G WAN.

root@UXG-PRO-###-###-01:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
10.142.100.0    0.0.0.0         255.255.255.0   U     30     0        0 vti64
65.XXX.XXX.0    0.0.0.0         255.255.252.0   U     0      0        0 eth0   ** WAN 1
100.XXX.XXX.0   0.0.0.0         255.192.0.0     U     0      0        0 eth1   ** WAN 2
XXX.XXX.XXX.128 0.0.0.0         255.255.255.254 U     0      0        0 gre1   ** BACKUP 5G WAN
XXX.XX.XXX.118  192.168.10.XXX  255.255.255.255 UGH   1      0        0 br10
192.168.0.0     192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.1.0     192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 br10
192.168.13.0    192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.50.0    192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.60.0    192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.70.0    192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.80.0    192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.86.0    192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.90.0    192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.130.0   192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.140.0   192.168.10.XXX  255.255.255.0   UG    1      0        0 br10
192.168.150.0   0.0.0.0         255.255.254.0   U     0      0        0 br0
192.168.152.0   0.0.0.0         255.255.255.0   U     0      0        0 br20
root@UXG-PRO-###-###-01:~#
cuonglm commented 8 months ago

@z1ucas Thanks for reporting.

Does /data/unifi exist on your system?

z1ucas commented 8 months ago

@cuonglm No that doesn't exist. Below is the contents of the /data directory :

controld/ dpi-tracer/ udapi-config/ ui-db/ uxgpro-setup/ wifiman/

cuonglm commented 8 months ago

Thanks, that explains the failure.

ctrld checks for /data/unifi to detect whether it is running on UbiOS.

We probably need a better way to do this check.

z1ucas commented 8 months ago

OK, so I just created the /data/unifi directory and it's now installed successfully.

cuonglm commented 8 months ago

@z1ucas is this a fresh installation of UniFi UXG?

yegors commented 8 months ago

@z1ucas What does this command return? ubnt-device-info summary

z1ucas commented 8 months ago

@yegors

Device information summary:
Subsystem ID: ea19
Family: UniFi NeXt-Gen Gateway (UXG)
Model: UniFi NeXt-Gen Gateway Pro (UXG-Pro)
Default MAC address: XX:XX:XX:XX:XX:XX
Default IPv4 address: 127.0.0.1
Firmware: 3.1.16 (3.1.16)

yegors commented 8 months ago

Thanks!

@cuonglm ^ that's your anchor

cuonglm commented 8 months ago

We can't use ubnt-device-info summary in ctrld, because that binary is not available at boot time, when ctrld is started by the system service manager.

The installer can use it, because the installer is run after boot.
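
Since the installer runs after boot, it could read the Family line from ubnt-device-info summary. The installer is a shell script, so this Go version only illustrates the parsing logic. It is a minimal sketch that assumes the raw output has one "Key: Value" field per line (the copies pasted in this thread were flattened onto one line); parseDeviceInfo and isUXG are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDeviceInfo turns "Key: Value" lines into a map. It splits on the
// first colon only, so values such as MAC addresses stay intact.
func parseDeviceInfo(out string) map[string]string {
	info := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		info[strings.TrimSpace(key)] = strings.TrimSpace(val)
	}
	return info
}

// isUXG checks the Family field for the "(UXG)" tag seen in the outputs
// posted in this issue.
func isUXG(info map[string]string) bool {
	return strings.Contains(info["Family"], "(UXG)")
}

func main() {
	// Sample shaped like the UXG Pro output in this thread, one field per
	// line (an assumption about the unflattened output).
	sample := `Device information summary:
Subsystem ID: ea19
Family: UniFi NeXt-Gen Gateway (UXG)
Model: UniFi NeXt-Gen Gateway Pro (UXG-Pro)
Firmware: 3.1.16 (3.1.16)`
	info := parseDeviceInfo(sample)
	fmt.Println("model:", info["Model"], "uxg:", isUXG(info))
}
```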

z1ucas commented 8 months ago

@yegors @cuonglm FYI, I have a UXG Lite as well, and the same issue existed. To get past it in the interim, I added the /data/unifi directory. Its device information:

Device information summary:
Subsystem ID: a677
Family: UniFi NeXt-Gen Gateway (UXG)
Model: UniFi NeXt-Gen Gateway Lite (UXG)
Default MAC address: XX:XX:XX:XX:XX:XX
Default IPv4 address: 127.0.0.1
Firmware: 3.2.11 (3.2.11)

cuonglm commented 6 months ago

@z1ucas Could you please try the .dev domain and see whether the issue is fixed?

Thank you.

cuonglm commented 6 months ago

This should be fixed after #138

z1ucas commented 6 months ago

Apologies, I've only just got to this message. I'll check it out and install it on a UXG device. If there are any problems, I'll reply back.

yegors commented 6 months ago

@z1ucas I assume all good?

z1ucas commented 6 months ago

No, I'm having a few problems on the UXG Lite, and intermittent DNS issues on the UXG Pro. DNS seems to stop responding on the UXG Lite, requiring the daemon to be shut down. Upon restarting the daemon, it's instantly unresponsive again. I'm trying to pinpoint what is going on and will post the errors I'm seeing here in about 15 minutes.

yegors commented 6 months ago

That doesn't seem related to the issue thread. If you have other problems, open a new issue and collect debugging data while it's in the broken state: https://github.com/Control-D-Inc/ctrld/wiki/Troubleshooting-Guide

z1ucas commented 6 months ago

I am pretty sure it is related. I have been trying to work out why DoH, DoH3 and DoQ stopped working on all my UXG Pro and UXG Lite devices after upgrading. The first error I am seeing is:

Mar 9 09:03:31.988 ERR could not init Ubios discover error="fork/exec /usr/bin/mongo: no such file or directory"

It does look related to my initial report, which was about detecting what kind of device ctrld is running on.
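
The error above shows the UbiOS discovery step trying to execute /usr/bin/mongo, which does not exist on this firmware. A common way to keep an optional step like this from failing noisily is to check for the helper binary before running it. A minimal sketch; helperAvailable and the skip message are hypothetical, not ctrld code, and whether the missing binary has anything to do with the DoH/DoQ failures is not established here.

```go
package main

import (
	"fmt"
	"os/exec"
)

// mongoPath is the binary named in the error above.
const mongoPath = "/usr/bin/mongo"

// helperAvailable reports whether path exists and is executable. Because
// the path contains a slash, exec.LookPath checks it directly instead of
// searching $PATH.
func helperAvailable(path string) bool {
	_, err := exec.LookPath(path)
	return err == nil
}

func main() {
	if !helperAvailable(mongoPath) {
		fmt.Println("skipping UbiOS discovery:", mongoPath, "not found")
		return
	}
	fmt.Println("UbiOS discovery helper found")
}
```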

After that, a range of error messages appear, depending on what you try to do with the config. I should also note that it is impossible to install using a config hosted at ControlD; the following error occurs:

Mar  9 01:21:01.968 ??? ================================
Mar  9 01:21:01.969 ??? An error occurred while performing test query: read udp 127.0.0.1:46581->127.0.0.1:5354: i/o timeout
Mar  9 01:21:01.969 ??? ================================
Mar  9 01:21:01.970 ??? ctrld service was running, but a DNS query could not be sent to its listener
Mar  9 01:21:01.970 ??? Please check your system firewall if it is configured to block/intercept/redirect DNS queries
Mar  9 01:21:01.970 ??? ================================
Mar  9 01:21:17.214 NTC Service uninstalled

If I take the exact same config hosted at ControlD and run it locally, ctrld starts with DoQ, but DoH and DoH3 must be removed from the config or it won't start. And even though it starts with DoQ, no DNS queries succeed.

z1ucas commented 6 months ago

Also, the installer now displays the following, which differs from what we saw with previous builds. I'm not sure whether this makes a difference. Lite first, then Pro:

---------------------
|    System Info    |
---------------------
OS Type      : linux
OS Vendor    : ubios
OS Version   : 3.2.15.14839
Router Model : Gateway Lite
Arch         : aarch64
CPU          : Kryo V2
Free RAM     : 178 MB / 974 MB
---------------------
|  Install Details  |
---------------------
Resolver ID  : #REMOVED#
Binary URL   : https://dl.controld.com/linux-arm64/ctrld
Install Path : /data/controld

---------------------
|    System Info    |
---------------------
OS Type      : linux
OS Vendor    : ubios
OS Version   : 3.2.15.14839
Router Model : Gateway Pro
Arch         : aarch64
CPU          : Cortex-A57
Free RAM     : 141 MB / 1997 MB
---------------------
|  Install Details  |
---------------------
Resolver ID  : #REMOVED#
Binary URL   : https://dl.controld.com/linux-arm64/ctrld
Install Path : /data/controld

cuonglm commented 6 months ago

If you created /data/unifi, does the v1.3.4 binary work for you?

Could you please run ctrld interactively using ctrld run --cd=<uid> -vv, then send a query directly to the ctrld listener?

z1ucas commented 6 months ago

So, I found the culprit, though I'm not sure why it causes such a range of issues.

I have UniFi Magic Site active across 3 locations. Site 1 has the public IP, and Sites 2 and 3 are behind a firewall/router. After upgrading to either 1.3.4 or 1.3.5, the following rule, taken from the configuration docs, breaks Sites 2 and 3 whenever Site 1 has ctrld running or is told to start it: if ctrld was already running on Sites 2 and 3 it stops working, and if it wasn't running (or was being installed) it fails to start:

An empty upstream list does not route the request to any of the defined upstreams; the OS default resolver is used instead.

[listener.0.policy]
name = "OS Resolver"

rules = [
    {"*.local" = []},
]
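
The rule above sends names under .local to a rule with an empty upstream list, which per the quoted docs means the OS resolver answers. A minimal sketch of that routing decision, assuming a simplified matcher that supports exact names and a leading "*." wildcard; rule, matchPattern and route are hypothetical names, not ctrld's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// rule pairs a domain pattern with upstream names. Following the docs
// quoted above, a matching rule with no upstreams means "use the OS
// resolver".
type rule struct {
	pattern   string
	upstreams []string
}

// matchPattern handles exact names and a leading "*." wildcard. This is a
// simplified, assumed matcher, not ctrld's implementation.
func matchPattern(pattern, qname string) bool {
	qname = strings.ToLower(strings.TrimSuffix(qname, "."))
	pattern = strings.ToLower(pattern)
	if strings.HasPrefix(pattern, "*.") {
		return strings.HasSuffix(qname, pattern[1:]) // e.g. ".local"
	}
	return qname == pattern
}

// route returns the upstreams of the first matching rule and whether any
// rule matched at all.
func route(rules []rule, qname string) ([]string, bool) {
	for _, r := range rules {
		if matchPattern(r.pattern, qname) {
			return r.upstreams, true
		}
	}
	return nil, false
}

func main() {
	rules := []rule{{pattern: "*.local", upstreams: []string{}}}
	for _, q := range []string{"printer.local.", "example.com."} {
		ups, ok := route(rules, q)
		switch {
		case !ok:
			fmt.Println(q, "-> no rule matched")
		case len(ups) == 0:
			fmt.Println(q, "-> OS resolver")
		default:
			fmt.Println(q, "->", ups)
		}
	}
}
```

This only shows which path a name takes; it does not explain why enabling the rule on Site 1 affected Sites 2 and 3. Note also that .local is the domain reserved for multicast DNS (RFC 6762).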

I am not 100% sure what it is doing, but the .local rule is definitely causing a range of different issues.

Sites 2 and 3 work perfectly if Site 1 doesn't have ctrld running, but as soon as Site 1 tries to start ctrld with this rule, all 3 sites begin failing.

Removing the rule from Site 1 allowed all sites to start and run ctrld, though I removed it from all 3 sites anyway.