Control-D-Inc / ctrld

A highly configurable, multi-protocol DNS forwarding proxy
MIT License

Lost Internet connection after upgrading to v. 1.3.1 #100

Closed pmcarrion closed 9 months ago

pmcarrion commented 11 months ago

My Internet connection gets blocked from time to time after upgrading to version 1.3.1.

The only way to get the Internet connection back is one of the following: 1) removing the DNS server 127.0.0.1 from the network interface, 2) running the ctrld installer, or 3) restarting the computer.

I'm using ctrld on a MacBook Pro (M1 Max) running macOS 13.6 (22G120).

yegors commented 11 months ago

There is not a lot to go on here. Please provide exact steps to reproduce and follow the troubleshooting guide to collect additional details: https://github.com/Control-D-Inc/ctrld/wiki/Troubleshooting-Guide

pmcarrion commented 11 months ago

Here are the troubleshooting results:

ps aux | grep ctrld

root             34954   0.0  0.1 410173552  53296   ??  Ss    6:45AM   1:27.45 /usr/local/bin/ctrld run --cd $DID --iface=auto --homedir=/etc/controld --config=/etc/controld/ctrld.toml
pmcarrion        15609   0.0  0.0 408626896   1232 s000  R+    6:21PM   0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox ctrld

netstat isn’t available on macOS.

sudo lsof -iTCP -sTCP:LISTEN | grep ctrld

ctrld 34954 root 22u IPv6 0xd27d92a828b698e9 0t0 TCP *:domain (LISTEN)

dig verify.controld.com @127.0.0.1 -p5354

; <<>> DiG 9.10.6 <<>> verify.controld.com @127.0.0.1 -p5354
;; global options: +cmd
;; connection timed out; no servers could be reached

cat /etc/resolv.conf

#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
#   scutil --dns
#
# SEE ALSO
#   dns-sd(1), scutil(8)
#
# This file is automatically generated.
#
nameserver 127.0.0.1

sudo cat /etc/controld/ctrld.toml

Password:
# AUTO-GENERATED VIA CD FLAG - DO NOT MODIFY

[listener]
  [listener.0]
    ip = '0.0.0.0'
    port = 53

    [listener.0.policy]
      name = 'My Policy'
      rules = [
        {        'captive.apple.com' = []},
        {        'aircanadawifi.com' = []},
        {        'gogoinflight.com' = []},
        {        'southwestwifi.com' = []},
        {        'singaporeair-krisworld.com' = []},
        {        'airborne.gogoinflight.com' = []},
        {        'aainflight.com' = []},
        {        'aa.viasat.com' = []},
        {        'deltawifi.com' = []},
        {        'unitedwifi.com' = []},
        {        'shop.ba.com' = []},
        {        'alaskawifi.com' = []},
        {        'flyfi.com' = []},
        {        'wifi.airasia.com' = []},
        {        'wifi.sncf' = []},
        {        'wifi.tgv-lyria.com' = []},
        {        'freewlan.sbb.ch' = []},
        {        'register.onboard.eurostar.com' = []},
        {        'thalysnet.com' = []},
        {        'iceportal.de' = []},
        {        'vvm.mstore.msg.t-mobile.com' = []},
        {        'wifi.inflightinternet.com' = []},
        {        'ip.videotron.ca' = []},
        {        'wifi.united.com' = []}
      ]

[network]
  [network.0]
    name = 'Network 0'
    cidrs = ['0.0.0.0/0']

[upstream]
  [upstream.0]
    type = 'doh'
    endpoint = 'https://dns.controld.com/$DID'
    timeout = 5000

sudo ctrld restart

Password:
Oct 17 20:14:06.000 NTC Service restarted

cuonglm commented 11 months ago

It sounds like something is preventing ctrld from starting.

Could you please run this command in your terminal and share the output:

/usr/local/bin/ctrld run --cd $DID --iface=auto --homedir=/etc/controld --config=/etc/controld/ctrld.toml

pmcarrion commented 11 months ago

/usr/local/bin/ctrld run --cd $DID --iface=auto --homedir=/etc/controld --config=/etc/controld/ctrld.toml

Oct 18 06:41:19.000 WRN listener.0 could not listen on address: 0.0.0.0:53, trying: 0.0.0.0:53
Oct 18 06:41:19.000 WRN listener.0 could not listen on address: 0.0.0.0:53, trying localhost: 127.0.0.1:53
Oct 18 06:41:19.000 WRN listener.0 could not listen on address: 127.0.0.1:53, trying current ip with port 5354
Oct 18 06:41:19.000 FTL failed to write config file error="open /etc/controld/ctrld.toml: permission denied"

sudo /usr/local/bin/ctrld run --cd $DID --iface=auto --homedir=/etc/controld --config=/etc/controld/ctrld.toml

Password:
Oct 18 06:42:32.000 WRN listener.0 could not listen on address: 0.0.0.0:53, trying: 0.0.0.0:53
Oct 18 06:42:32.000 WRN listener.0 could not listen on address: 0.0.0.0:53, trying localhost: 127.0.0.1:53
Oct 18 06:42:32.000 WRN listener.0 could not listen on address: 127.0.0.1:53, trying current ip with port 5354

I waited 5 minutes and then restarted the computer.

It seems ctrld is fetching a faulty config from the Control D service. My computer was automatically configured to use 127.0.0.1:53 as the DNS server, but the first listener.0 address is 0.0.0.0:53.

cuonglm commented 11 months ago

@pmcarrion No, the run command picked another port because another process appears to be listening on port 53 on your computer.
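
For illustration only, the fallback order visible in the warnings above (0.0.0.0:53, then 127.0.0.1:53, then port 5354) can be mimicked with a short Python sketch. This is not ctrld's actual code: the helper name `first_bindable` is hypothetical, it only tests UDP binds, and using 127.0.0.1 for the last attempt is an assumption (the log only says "current ip").

```python
import socket


def first_bindable(candidates):
    """Return the first (host, port) pair a UDP socket can bind to, or None.

    Hypothetical helper, not part of ctrld: it only mimics the order of
    attempts that shows up in the warning lines of the log.
    """
    for host, port in candidates:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Address already in use, or no permission for a privileged port.
            continue
        else:
            return (host, port)
        finally:
            # Always release the probe socket so it does not hold the port.
            sock.close()
    return None
```

Calling `first_bindable([("0.0.0.0", 53), ("127.0.0.1", 53), ("127.0.0.1", 5354)])` with sudo would return the first address that is free. If it skips port 53, something else is already bound there.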

How did you upgrade ctrld? Did you use the installer?

pmcarrion commented 11 months ago

> How did you upgrade ctrld? Did you use the installer?

Yes. sudo sh -c 'sh -c "$(curl -sL https://api.controld.dev/dl)" -s $DID forced'

cuonglm commented 11 months ago

>> How did you upgrade ctrld? Did you use the installer?

> Yes. sudo sh -c 'sh -c "$(curl -sL https://api.controld.dev/dl)" -s $DID forced'

Please use the .com domain instead of .dev.

If the problem still occurs, please send me the contents of the /usr/local/var/log/ctrld*.log files.

pmcarrion commented 11 months ago

I upgraded with the .com URL. No issues so far. The issues appear some hours after running the installer.

I tried to access the log files but got this error: The folder "log" can't be opened because you don't have permission to see its contents. That folder was last modified on 8 Feb 2023.

cuonglm commented 11 months ago

> I upgraded with the .com URL. No issues so far. The issues appear some hours after running the installer.

> I tried to access the log files but got this error: The folder "log" can't be opened because you don't have permission to see its contents. That folder was last modified on 8 Feb 2023.

That's odd. Something on your computer must be causing this.

Please do this:

sudo ctrld start --cd=$DID -vv --log=/path/to/log/file

Please set /path/to/log/file to a location where you can read the file. When the problem occurs, please check the file or send it to me.

Thank you.

pmcarrion commented 11 months ago

sudo ctrld start --cd=$DID -vv --log=/Users/USER/ControlD-logs

Oct 18 08:39:33.000 INF loading config file from: /etc/controld/ctrld.toml
Oct 18 08:39:33.000 ERR could not backup old log file: rename /Users/USER/ControlD-logs /Users/USER/ControlD-logs.1: file exists
Oct 18 08:39:33.000 NTC Starting service
Oct 18 08:39:33.000 DBG waiting for ctrld listener to be ready

The new log folder can't be accessed either:

The document "ControlD-logs" could not be opened. You don't have permission.
To view or change permissions, select the item in the Finder and choose File > Get Info.

I had to uninstall ctrld as I couldn’t get connected.

sudo ctrld uninstall

Oct 18 08:44:01.000 ERR could not backup old log file: rename /Users/USER/ControlD-logs /Users/USER/ControlD-logs.1: file exists
Oct 18 08:44:01.000 DBG found network service name for interface network_service="USB 10/100/1000 LAN (Dock)"
Oct 18 08:44:01.000 DBG Restoring DNS for interface iface=en7
Oct 18 08:44:01.000 DBG Restoring DNS successfully iface=en7
Oct 18 08:44:01.000 NTC Service uninstalled

sudo ctrld start --cd=$DID -vv --log=/Users/USER/ControlD-logs

Oct 18 08:45:55.000 INF loading config file from: /etc/controld/ctrld.toml
Oct 18 08:45:55.000 NTC Starting service
Oct 18 08:45:55.000 DBG waiting for ctrld listener to be ready

I had to uninstall ctrld again as I was unable to get connected.

sudo ctrld uninstall

Oct 18 08:47:56.000 ERR could not backup old log file: rename /Users/USER/ControlD-logs /Users/USER/ControlD-logs.1: file exists
Oct 18 08:47:56.000 DBG found network service name for interface network_service="USB 10/100/1000 LAN (Dock)"
Oct 18 08:47:56.000 DBG Restoring DNS for interface iface=en7
Oct 18 08:47:56.000 DBG Restoring DNS successfully iface=en7
Oct 18 08:47:56.000 NTC Service uninstalled

cuonglm commented 11 months ago

> Please set /path/to/log/file to a location where you can read the file.

It sounds like you haven't chosen a location where you can read the file. The log clearly states that you don't have permission, which prevents ctrld from starting.

pmcarrion commented 11 months ago

> It sounds like you haven't chosen a location where you can read the file. The log clearly states that you don't have permission, which prevents ctrld from starting.

I did change the file path: sudo ctrld start --cd=$DID -vv --log=/Users/USER/ControlD-logs

It seems ctrld changes the folder's permissions, making it inaccessible.

cuonglm commented 11 months ago

>> It sounds like you haven't chosen a location where you can read the file. The log clearly states that you don't have permission, which prevents ctrld from starting.

> I did change the file path: sudo ctrld start --cd=$DID -vv --log=/Users/USER/ControlD-logs

> It seems ctrld changes the folder's permissions, making it inaccessible.

You should set it to a file, not a folder.

pmcarrion commented 11 months ago

It's a folder.

sudo ctrld start --cd=$DID -vv --log=/Users/USER/ControlD-logs/

Oct 18 11:10:27.000 INF loading config file from: /etc/controld/ctrld.toml
Oct 18 11:10:27.000 NTC Starting service
Oct 18 11:10:27.000 DBG waiting for ctrld listener to be ready
Oct 18 11:10:27.000 ERR ctrld service may not have started due to an error or misconfiguration, service log:
Oct 18 11:10:27.000 ??? ================================
Oct 18 11:10:27.000 ??? ================================
Oct 18 11:10:27.000 DBG found network service name for interface network_service="USB 10/100/1000 LAN (Dock)"
Oct 18 11:10:27.000 DBG Restoring DNS for interface iface=en7
Oct 18 11:10:27.000 DBG Restoring DNS successfully iface=en7
Oct 18 11:10:27.000 NTC Service uninstalled

cuonglm commented 11 months ago

> It's a folder.
>
> sudo ctrld start --cd=$DID -vv --log=/Users/USER/ControlD-logs/
>
> Oct 18 11:10:27.000 INF loading config file from: /etc/controld/ctrld.toml
> Oct 18 11:10:27.000 NTC Starting service
> Oct 18 11:10:27.000 DBG waiting for ctrld listener to be ready
> Oct 18 11:10:27.000 ERR ctrld service may not have started due to an error or misconfiguration, service log:
> Oct 18 11:10:27.000 ??? ================================
> Oct 18 11:10:27.000 ??? ================================
> Oct 18 11:10:27.000 DBG found network service name for interface network_service="USB 10/100/1000 LAN (Dock)"
> Oct 18 11:10:27.000 DBG Restoring DNS for interface iface=en7
> Oct 18 11:10:27.000 DBG Restoring DNS successfully iface=en7
> Oct 18 11:10:27.000 NTC Service uninstalled

Please set it to a file, something like: /Users/USER/ControlD-logs/ctrld.log
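
A quick pre-flight check along these lines could catch the file-versus-folder mix-up before the service is started. This is a hypothetical Python sketch, not something ctrld provides; the helper name `check_log_path` and its messages are made up for the example.

```python
import os


def check_log_path(path):
    """Return None if `path` looks usable as a log file, else a short reason.

    Hypothetical helper for illustration; ctrld does not ship this check.
    """
    # A trailing separator or an existing directory means a folder was given.
    if path.endswith(os.sep) or os.path.isdir(path):
        return "path is a folder; point --log at a file inside it"
    # The log file can only be created if its parent folder already exists.
    parent = os.path.dirname(path) or "."
    if not os.path.isdir(parent):
        return "parent folder does not exist"
    return None
```

With the paths from this thread, /Users/USER/ControlD-logs/ would be rejected as a folder, while /Users/USER/ControlD-logs/ctrld.log would pass as long as the ControlD-logs folder exists.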

pmcarrion commented 11 months ago

sudo ctrld start --cd=$DID -vv --log=/Users/USER/ControlD-logs/ctrld.log

Password:
Oct 18 14:19:41.000 INF loading config file from: /etc/controld/ctrld.toml
Oct 18 14:19:41.000 NTC Starting service
Oct 18 14:19:41.000 DBG waiting for ctrld listener to be ready

There appears to be another process using 127.0.0.1:53.

pmcarrion commented 11 months ago

@cuonglm I just emailed you the logs.

cuonglm commented 11 months ago

> @cuonglm I just emailed you the logs.

@pmcarrion Thanks, I got those files and don't see any obvious errors. But it looks like ctrld was killed for no apparent reason, because the log file ends immediately after ctrld served requests successfully.

How long after ctrld starts does this happen?

Could you please use Activity Monitor (https://support.apple.com/en-qa/guide/activity-monitor/welcome/mac) to see whether ctrld's memory usage keeps increasing?

It would also be great if you could check the system log following this guide: https://www.howtogeek.com/356942/how-to-view-the-system-log-on-a-mac/

pmcarrion commented 11 months ago

@cuonglm I sent you another email with several logs and console messages.

cuonglm commented 11 months ago

> @cuonglm I sent you another email with several logs and console messages.

I checked my mailbox but haven't received anything yet. You may want to put all the files in a folder, then zip the folder before attaching it to the email.

pmcarrion commented 11 months ago

Email sent with zip file.

cuonglm commented 11 months ago

> Email sent with zip file.

Sorry, I haven't received anything yet. Could you please re-send it to cuong@windscribe.com and cc cuong.manhle.vn@gmail.com?

Thank you.

pmcarrion commented 11 months ago

I just resent the email to both addresses.

cuonglm commented 11 months ago

> I just resent the email to both addresses.

Unfortunately, I don't see any email in either inbox.

pmcarrion commented 11 months ago

That's weird. I'll send it on Discord.

furiseto commented 11 months ago

I have a similar problem. It's a bit different: when I turn on or wake up my main Windows PC, the whole network, with 20 other devices, gets almost no Internet connection for 5-10 minutes and then comes back online by itself. During that period I even have difficulty accessing the router's GUI, as if the router were lagging.

I installed ctrld 1.3.1 over SSH on my Asus-Merlin router with 1 mesh node. I have tried uninstalling and reinstalling, and the problem is still the same. I'm not tech-savvy, so I don't know how to provide logs, but when I check the Asus router's log, it usually has this line: "Maximum number of concurrent DNS queries reached (max: 150)".

One more thing: when I turn on IPv6 in the router, the Control D web GUI reports thousands of clients, many of them unknown IPv6 clients, so I had to turn IPv6 off.

pmcarrion commented 11 months ago

@furiseto Looks like you’re having the issues I reported a few days ago:

cuonglm commented 11 months ago

> I installed ctrld 1.3.1 over SSH on my Asus-Merlin router with 1 mesh node. I have tried uninstalling and reinstalling, and the problem is still the same. I'm not tech-savvy, so I don't know how to provide logs, but when I check the Asus router's log, it usually has this line: "Maximum number of concurrent DNS queries reached (max: 150)".

What's your Merlin router model and version?

furiseto commented 11 months ago

@cuonglm ZenWiFi XT8 / RT-AX95Q using a third-party Merlin fork. The page says the fork is supported by RMerlin: https://github.com/gnuton/asuswrt-merlin.ng

Before I installed ctrld, the router was stable and rarely had problems. I have used Control D for months with the legacy method. It was only 3 days ago that I switched to installing ctrld over SSH.

cuonglm commented 11 months ago

> Before I installed ctrld, the router was stable and rarely had problems. I have used Control D for months with the legacy method. It was only 3 days ago that I switched to installing ctrld over SSH.

Do you mean you ran ctrld v1.3.0 before?

If yes, could you please reinstall v1.3.0 and see if that fixes the problem?

Thank you.

furiseto commented 11 months ago

Oh no, the legacy method means I set the IPv4 and IPv6 DNS servers in the router's own WAN settings to Control D's DNS, the basic method that works on any router. So to answer your question: no, v1.3.1 is the first version I have installed over SSH.

Installing ctrld over SSH gives me the DoH protocol and the ability to see every client on the network. I just don't know why, every time my Windows PC boots up, the whole network goes down for 10 minutes. It's so weird.

For now I have had to uninstall it, because the whole family was yelling at me so much T_T

cuonglm commented 11 months ago

> That's weird. I'll send it on Discord.

Thanks, I got those files.

I think some guard check interval in v1.3.1 is probably too short, so the kernel blocks ctrld, per https://developer.apple.com/forums/thread/124180

I have to investigate more. In the meantime, could you please confirm whether switching back to v1.3.0 fixes the problem? Thank you.

pmcarrion commented 11 months ago

> I have to investigate more. In the meantime, could you please confirm whether switching back to v1.3.0 fixes the problem? Thank you.

Sure, is there a way to downgrade using the installer?

cuonglm commented 11 months ago

>> I have to investigate more. In the meantime, could you please confirm whether switching back to v1.3.0 fixes the problem? Thank you.

> Sure, is there a way to downgrade using the installer?

I don't think so. You have to download the correct binary for your architecture here: https://github.com/Control-D-Inc/ctrld/releases/tag/v1.3.0

Please make sure to uninstall the current version before overwriting the binary; otherwise, the file may not be updated correctly.

pmcarrion commented 11 months ago

> What's your Merlin router model and version?

My router is RT-AX88U running Asuswrt-Merlin 3004.388.4_0.

It's currently reporting 1,692 clients to the Control D Dashboard.

image

The number of devices should be around 60.

image

cuonglm commented 11 months ago

> That's weird. I'll send it on Discord.

Oh, another problem is that there are 2 instances of ctrld running on your Mac, which is odd. Could you please kill all of them before installing ctrld v1.3.1, and see if the problem still occurs?

pmcarrion commented 11 months ago

> Oh, another problem is that there are 2 instances of ctrld running on your Mac, which is odd. Could you please kill all of them before installing ctrld v1.3.1, and see if the problem still occurs?

I killed both processes, but when I relaunched ctrld, a 2nd ctrld process started automatically. This only happens when the message "DBG waiting for ctrld listener to be ready" appears.

This issue hasn't reappeared since the last ctrld (1.3.1) installation, as mentioned in the message I sent by email/Discord.

cuonglm commented 11 months ago

>> Oh, another problem is that there are 2 instances of ctrld running on your Mac, which is odd. Could you please kill all of them before installing ctrld v1.3.1, and see if the problem still occurs?

> I killed both processes, but when I relaunched ctrld, a 2nd ctrld process started automatically. This only happens when the message "DBG waiting for ctrld listener to be ready" appears.

> This issue hasn't reappeared since the last ctrld (1.3.1) installation, as mentioned in the message I sent by email/Discord.

I'm confused.

Does the problem still occur after those steps?

pmcarrion commented 11 months ago

> The 2nd ctrld process should be killed immediately after the installation is done. There should be only 1 ctrld process running after the installation.

This only happens if the installation is successful.

If you get the error "DBG waiting for ctrld listener to be ready" when running the installer, there will be 2 ctrld processes.

> Are you able to reproduce the problem with a fresh v1.3.1 installation by following these steps?
> 1. Uninstall ctrld.
> 2. Ensure no ctrld process is running; if there is one, kill it using Activity Monitor.
> 3. Install v1.3.1 using the installer or the binary downloaded from the GitHub release page.

As you can see in the video sent via Discord, there are always 2 ctrld processes when running the installer. As mentioned above, the second process is killed automatically if the installation is successful.
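
The "no ctrld process is running" check from the steps above can also be done from a terminal instead of Activity Monitor. Below is a minimal Python sketch, assuming `ps -A -o pid= -o comm=` prints one "PID command" pair per line; the helper names `ctrld_pids` and `running_ctrld_pids` are hypothetical and not part of ctrld.

```python
import os
import subprocess


def ctrld_pids(ps_output):
    """Return the PIDs of processes whose executable is named exactly ctrld.

    `ps_output` is text in the form printed by `ps -A -o pid= -o comm=`.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pid, command = parts
        # comm may be a full path (macOS) or a bare name (Linux).
        if os.path.basename(command.strip()) == "ctrld":
            pids.append(int(pid))
    return pids


def running_ctrld_pids():
    """Run ps and return the PIDs of any ctrld processes currently running."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return ctrld_pids(result.stdout)
```

An empty list from `running_ctrld_pids()` would mean no ctrld process is left before reinstalling. Two entries would match the duplicate-process situation described in this thread.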

cuonglm commented 9 months ago

This is fixed in v1.3.2 release.