Closed Gorson84 closed 5 months ago
Hi, did the previous version work for you on the same device? Please provide output from this command:
/data/controld/ctrld start --cd RESOLVER_ID_HERE -vv
Hi,
Yes, I actually monitor the GitHub page to keep my versions up to date. I've installed many updates before, all of them flawlessly.
I did find a workaround for the install issue:
I noticed that during every install attempt there was traffic going to ControlD for a very brief window of time. So just before the 'test query' can trigger an uninstall event, I CTRL-C the installer script.
By doing so, the uninstall doesn't happen and the service keeps on running.
Here is your requested test: (THAT AGAIN BROKE MY CONFIG, AND I HAD TO APPLY MY WORKAROUND AGAIN)
Last login: Sat Apr 20 14:49:02 2024 from 10.10.1.32
***@***.***:~# /data/controld/ctrld start --cd REDACTED -vv
Apr 20 19:19:04.000 NTC Reading config: /etc/controld/ctrld.toml
Apr 20 19:19:04.000 INF loading config file from: /etc/controld/ctrld.toml
Apr 20 19:19:04.523 DBG cleaning up router before installing
Apr 20 19:19:04.524 NTC Starting service
Apr 20 19:19:11.668 NTC Generating controld config: /etc/controld/ctrld.toml
Apr 20 19:19:11.699 DBG waiting for ctrld listener to be ready
Apr 20 19:19:17.579 DBG ctrld listener is ready
Apr 20 19:19:17.579 DBG performing self-check
Apr 20 19:19:17.581 DBG internal self-check against "ctrld.test" succeeded
Apr 20 19:19:26.724 DBG self-check: [v1] backoff: 511 msec
Apr 20 19:19:28.237 DBG self-check: [v1] backoff: 1069 msec
Apr 20 19:19:30.308 DBG self-check: [v1] backoff: 1149 msec
Apr 20 19:19:32.459 DBG self-check: [v1] backoff: 1157 msec
Apr 20 19:19:34.618 DBG self-check: [v1] backoff: 1303 msec
Apr 20 19:19:36.923 DBG self-check: [v1] backoff: 2445 msec
Apr 20 19:19:40.370 DBG self-check: [v1] backoff: 1234 msec
Apr 20 19:19:42.606 DBG self-check: [v1] backoff: 1818 msec
Apr 20 19:19:45.427 DBG self-check: [v1] backoff: 1581 msec
Apr 20 19:19:48.010 DBG self-check: [v1] backoff: 2357 msec
Apr 20 19:19:51.369 DBG self-check: [v1] backoff: 2514 msec
Apr 20 19:19:54.886 DBG self-check: [v1] backoff: 4088 msec
Apr 20 19:19:58.978 DBG self-check against "verify.controld.com" failed
Apr 20 19:19:58.978 DBG sending doh request to: 76.76.2.22:443
Apr 20 19:19:59.086 DBG ================================
Apr 20 19:19:59.087 DBG listener address : 0.0.0.0:5354
Apr 20 19:19:59.087 DBG last error : read udp 127.0.0.1:47611->127.0.0.1:5354: i/o timeout
Apr 20 19:19:59.087 ??? ================================
Apr 20 19:19:59.088 ??? An error occurred while performing test query: read udp 127.0.0.1:47611->127.0.0.1:5354: i/o timeout
Apr 20 19:19:59.088 ??? ================================
Apr 20 19:20:05.016 DBG Restoring DNS for interface iface=lo
Apr 20 19:20:10.092 DBG dns: [rc=unknown ret=direct]
Apr 20 19:20:10.092 DBG dns: using "direct" mode
Apr 20 19:20:10.092 DBG Restoring DNS successfully iface=lo
Apr 20 19:20:10.092 DBG Router cleanup
Apr 20 19:20:10.092 NTC Service uninstalled
MY WORKAROUND:
***@***.***:~# /data/controld/ctrld start --cd REDACTED -vv
Apr 20 19:21:15.000 NTC Reading config: /etc/controld/ctrld.toml
Apr 20 19:21:15.000 INF loading config file from: /etc/controld/ctrld.toml
Apr 20 19:21:15.463 DBG cleaning up router before installing
Apr 20 19:21:15.463 NTC Starting service
Apr 20 19:21:16.776 NTC Generating controld config: /etc/controld/ctrld.toml
Apr 20 19:21:16.811 DBG waiting for ctrld listener to be ready
Apr 20 19:21:22.579 DBG ctrld listener is ready
Apr 20 19:21:22.579 DBG performing self-check
Apr 20 19:21:22.582 DBG internal self-check against "ctrld.test" succeeded
Apr 20 19:21:30.309 DBG self-check: [v1] backoff: 562 msec
Apr 20 19:21:31.873 DBG self-check: [v1] backoff: 539 msec
Apr 20 19:21:33.414 DBG self-check: [v1] backoff: 846 msec
Apr 20 19:21:35.261 DBG self-check: [v1] backoff: 608 msec
Apr 20 19:21:36.870 DBG self-check: [v1] backoff: 1270 msec
^C
Are you able to resolve queries after cancelling the installer script?
From your output, it sounds like the second test query is blocked somehow.
As you can see, on ControlD.com, everything is fine. IPV4 & IPV6 dns traffic is working.
The UDM Pro is able to ping the domain:
root@UDM-Pro:~# ping verify.controld.com
PING verify.controld.com(controld.com (2606:1a40:3::1)) 56 data bytes
64 bytes from controld.com (2606:1a40:3::1): icmp_seq=1 ttl=52 time=25.6 ms
64 bytes from controld.com (2606:1a40:3::1): icmp_seq=2 ttl=52 time=22.1 ms
64 bytes from controld.com (2606:1a40:3::1): icmp_seq=3 ttl=52 time=15.6 ms
root@UDM-Pro:~# nslookup verify.controld.com
Server: 127.0.0.1
Address: 127.0.0.1#53

Non-authoritative answer:
verify.controld.com canonical name = api.controld.com.
Name: api.controld.com
Address: 147.185.34.1
Name: api.controld.com
Address: 2606:1a40:3::1
@Gorson84 Thanks for confirming.
To be clear, are you able to run the same ctrld start ... command successfully using v1.3.5?
Yes, it's only the self-test that runs as part of the installer/upgrade process that fails, which leads to an automatic uninstall.
Having the ability to add an extra parameter like -NT (No Tests) during the installation would be helpful. An additional parameter that can easily install the previous version would also be extremely useful for users like me, who have very limited command-line knowledge.
@Gorson84 Does your UDM Pro have default settings?
What features is it using?
Pretty basic configuration. Only running the network module. Nothing custom is running on it. It's running release software.
We're unable to reproduce the issue on an identical UDM Pro, running the exact same firmware version.
Something about your device appears to be different. Are you able to do a factory reset and try it using a stock install, as we're running?
I'm getting the same on an OpenWRT router (snapshot version, though), FYI.
@NoSync Please provide debugging data as requested earlier in the thread and details about your environment, especially anything that differs from stock install.
Environment: OpenWRT snapshot version (any snapshot behaves the same, this is r26108-c390c6c709) on a GL-MT6000 router, 6.1.86 kernel.
---------------------
| System Info |
---------------------
OS Type : linux
OS Vendor : openwrt
OS Version : SNAPSHOT
Arch : aarch64
CPU : aarch64_cortex-a53
Free RAM : 688 MB / 988 MB
---------------------
| Install Details |
---------------------
Resolver ID : REDACTED
Binary URL : https://dl.controld.com/linux-arm64/ctrld
Install Path : /usr/sbin
Issue: the error is the same as the OP's and shows up the same way with both the install script and ctrld start, but not with ctrld run. It appears 100% of the time at boot and about 50% of the time afterwards.
Debug info (the first attempt failed, the second worked):
root@flint2:~# ctrld start --cd REDACTED -vv
Apr 29 10:12:18.000 NTC Reading config: /etc/controld/ctrld.toml
Apr 29 10:12:18.000 INF loading config file from: /etc/controld/ctrld.toml
Apr 29 10:12:18.881 DBG cleaning up router before installing
Apr 29 10:12:25.574 NTC Starting service
Apr 29 10:12:25.658 NTC Generating controld config: /etc/controld/ctrld.toml
Apr 29 10:12:25.735 DBG waiting for ctrld listener to be ready
Apr 29 10:12:25.773 DBG ctrld listener is ready
Apr 29 10:12:25.774 DBG performing self-check
Apr 29 10:12:25.776 ??? ================================
Apr 29 10:12:25.776 ??? An error occurred while performing test query: read udp 127.0.0.1:41609->127.0.0.1:5354: read: connection refused
Apr 29 10:12:25.776 ??? ================================
Apr 29 10:12:25.853 DBG Restoring DNS for interface iface=eth1
Apr 29 10:12:26.046 DBG dns: [rc=unknown ret=direct]
Apr 29 10:12:26.047 DBG dns: using "direct" mode
Apr 29 10:12:26.105 DBG Restoring DNS successfully iface=eth1
Apr 29 10:12:26.105 DBG Router cleanup
Apr 29 10:12:26.105 NTC Service uninstalled
root@flint2:~# ctrld start --cd REDACTED -vv
Apr 29 10:14:14.000 NTC Reading config: /etc/controld/ctrld.toml
Apr 29 10:14:14.000 INF loading config file from: /etc/controld/ctrld.toml
Apr 29 10:14:14.259 DBG cleaning up router before installing
Apr 29 10:14:14.260 NTC Starting service
Apr 29 10:14:14.309 NTC Generating controld config: /etc/controld/ctrld.toml
Apr 29 10:14:14.389 DBG waiting for ctrld listener to be ready
Apr 29 10:14:26.912 DBG ctrld listener is ready
Apr 29 10:14:26.912 DBG performing self-check
Apr 29 10:14:26.915 DBG internal self-check against "ctrld.test" succeeded
Apr 29 10:14:26.948 DBG external self-check against "verify.controld.com" succeeded
Apr 29 10:14:26.948 NTC Service started
Apr 29 10:14:26.951 DBG setting DNS for interface iface=eth1
Apr 29 10:14:26.953 DBG dns: [rc=unknown ret=direct]
Apr 29 10:14:26.953 DBG dns: using "direct" mode
Apr 29 10:14:26.954 DBG setting DNS successfully iface=eth1
Interestingly, if I repeatedly run the same command from the shell, it alternately works once and fails once.
It's worth stressing that no errors appear with ctrld run. The issue is only with ctrld start.
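To compare repeated runs like the ones above, the -vv transcripts can be tallied mechanically. The following is only a sketch: it assumes the log lines look exactly like those pasted in this thread ("NTC Service started", "NTC Service uninstalled", "An error occurred while performing test query: ..."), and the function names classify_run and tally are made up for illustration; they are not part of ctrld.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches the error line as it appears in the transcripts in this thread.
ERROR_RE = re.compile(r"An error occurred while performing test query: (.+)$")


def classify_run(log_text: str) -> Tuple[str, Optional[str]]:
    """Return (outcome, error) for one `ctrld start -vv` transcript."""
    error = None
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            error = match.group(1).strip()
    if "NTC Service started" in log_text:
        return ("started", None)
    if "NTC Service uninstalled" in log_text:
        return ("uninstalled", error)
    return ("unknown", error)


def tally(transcripts: Iterable[str]) -> Counter:
    """Count outcomes across several transcripts."""
    return Counter(classify_run(text)[0] for text in transcripts)
```

Feeding it one saved transcript per attempt gives a count of started versus uninstalled runs, plus the error text for each failure, which makes it easier to tell "i/o timeout" failures from "connection refused" ones.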
@NoSync Could you please run ctrld start with --log=/path/to/your/log_file, then provide us the log file via email when the problem happens: cuong(at)controld.com
Your problem could be different: it's connection refused instead of i/o timeout.
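The two errors point at different situations: "connection refused" usually means nothing is listening on the port at that moment, while "i/o timeout" means the query was sent but no reply came back in time. A rough way to tell them apart by hand is sketched below. It is not how ctrld performs its self-check; the helper names build_query and probe are invented for this example, and the "refused" branch relies on Linux reporting the ICMP error on a connected UDP socket.

```python
import socket
import struct


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.strip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(host: str, port: int, name: str, timeout: float = 2.0) -> str:
    """Send one query over UDP and report 'answered', 'timeout' or 'refused'."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # connected socket, so ICMP errors surface
        try:
            sock.send(build_query(name))
            sock.recv(512)
            return "answered"
        except socket.timeout:
            return "timeout"
        except ConnectionRefusedError:
            return "refused"
```

For the setups in this thread, the call would be probe("127.0.0.1", 5354, "verify.controld.com"), run right after ctrld reports that the listener is ready.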
You're right, I had overlooked that. I sent you the logs.
Thanks, I got the log and am looking at it. It sounds like you have another instance of ctrld running on port 5354 already? Your log indicates that the instance of ctrld that you are starting is running on a different port:
{"level":"info","time":"2024-04-29T15:53:18Z.357","message":"starting DNS server on listener.0: 0.0.0.0:24544"}
I hadn't noticed that when generating the log, but I had a zombie ctrld instance running that I had to manually kill some time later. The output error is the same when ctrld starts on 5354 though.
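A leftover process holding the port can be ruled out before each attempt. The sketch below simply tries to bind the UDP port and treats EADDRINUSE as "something is already there". It is an illustration only: udp_port_in_use is a made-up helper, it does not identify which process owns the port (netstat or ss is needed for that), and it checks UDP only.

```python
import errno
import socket


def udp_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding a UDP socket to host:port fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # anything else (for example EACCES) is a different problem
        return False
```

Calling udp_port_in_use(5354) before ctrld start would show whether a stray instance is still bound to the port that the test query targets.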
@Gorson84 Can you provide the output of netstat -tupln before starting ctrld, and after it starts with your control+c hack?
Also, do the same: https://github.com/Control-D-Inc/ctrld/issues/147#issuecomment-2083089566
Try the new dev version; it should address the problem. Run: ctrld upgrade dev
In my case it still works only 50% of the time:
root@flint2:~# ctrld --version
ctrld version dev-7ec7072
root@flint2:~# ctrld start --cd REDACTED -vv
May 11 09:56:35.000 NTC Reading config: /etc/controld/ctrld.toml
May 11 09:56:35.000 INF loading config file from: /etc/controld/ctrld.toml
May 11 09:56:35.988 DBG cleaning up router before installing
May 11 09:56:35.988 NTC Starting service
May 11 09:56:36.072 NTC Generating controld config: /etc/controld/ctrld.toml
May 11 09:56:36.160 DBG waiting for ctrld listener to be ready
May 11 09:56:43.042 DBG ctrld listener is ready
May 11 09:56:43.042 DBG performing self-check
May 11 09:56:43.044 DBG internal self-check against "ctrld.test" succeeded
May 11 09:56:43.075 DBG external self-check against "verify.controld.com" succeeded
May 11 09:56:43.075 NTC Service started
May 11 09:56:43.077 DBG setting DNS for interface iface=eth1
May 11 09:56:43.079 DBG dns: [rc=unknown ret=direct]
May 11 09:56:43.079 DBG dns: using "direct" mode
May 11 09:56:43.081 DBG setting DNS successfully iface=eth1
root@flint2:~# ctrld stop
May 11 09:56:47.913 NTC Service stopped
root@flint2:~# ctrld start --cd REDACTED -vv
May 11 09:56:51.000 NTC Reading config: /etc/controld/ctrld.toml
May 11 09:56:51.000 INF loading config file from: /etc/controld/ctrld.toml
May 11 09:56:51.038 DBG cleaning up router before installing
May 11 09:56:51.038 NTC Starting service
May 11 09:56:51.122 NTC Generating controld config: /etc/controld/ctrld.toml
May 11 09:56:51.204 DBG waiting for ctrld listener to be ready
May 11 09:56:51.246 DBG ctrld listener is ready
May 11 09:56:51.246 DBG performing self-check
May 11 09:56:51.247 ??? ================================
May 11 09:56:51.247 ??? An error occurred while performing test query: read udp 127.0.0.1:38174->127.0.0.1:5354: read: connection refused
May 11 09:56:51.247 ??? ================================
May 11 09:56:51.324 DBG Restoring DNS for interface iface=eth1
May 11 09:56:51.643 DBG dns: [rc=unknown ret=direct]
May 11 09:56:51.643 DBG dns: using "direct" mode
May 11 09:56:51.648 DBG trample: resolv.conf changed from what we expected. did some other program interfere? current contents: "# resolv.conf(5) file generated by ctrld\n# DO NOT EDIT THIS FILE BY HAND -- CHANGES WILL BE OVERWRITTEN\n\nnameserver 127.0.0.1\n"
May 11 09:56:51.683 DBG Restoring DNS successfully iface=eth1
May 11 09:56:51.683 DBG Router cleanup
May 11 09:56:51.683 NTC Service uninstalled
@NoSync Can you try running the start command with this flag: --skip_self_checks? Can you still reproduce the issue? Do it several times.
Also, if possible: disable DNS in dnsmasq (by setting its port to 0 in the GUI). This will disable dnsmasq on port 53. Then edit your config and have ctrld listen on port 53 instead, and try the start command without the above flag. Does it work consistently?
Thanks
@yegors I just ran those tests (7/8 times for each case) and ctrld consistently starts with no issues, both with skip_self_checks and port 5354, and without skip_self_checks and port 53.
@NoSync Did you run the check above with dev version or v1.3.6?
@cuonglm dev.