davesteele / comitup

Bootstrap Wifi support over Wifi
https://davesteele.github.io/comitup/
GNU General Public License v2.0
320 stars 54 forks source link

dnsmasq SIGTERM #182

Closed sistemicorp closed 2 years ago

sistemicorp commented 3 years ago

After a reboot, comitup doesn't complete network connection to a known WiFi network.

Initially the WiFi network is unknown and comitup successfully enters hotspot and I can successfully navigate to the setup page and select the network and password and successfully connect to the network.

However, on reboot, although the RPi has connected to the WiFi network, it does not call the external callback. I think this is showing the error,

pi@p1125-f85e:~ $ sudo systemctl status comitup
- comitup.service - Comitup Wifi Management
   Loaded: loaded (/lib/systemd/system/comitup.service; disabled; vendor preset: enabled)
   Active: active (running) since Sat 2021-07-10 18:51:56 EDT; 2min 3s ago
     Docs: man:comitup(8)
 Main PID: 2399 (comitup)
    Tasks: 1 (limit: 3727)
   CGroup: /system.slice/comitup.service
           └─2399 /usr/bin/python3 /usr/sbin/comitup

Jul 10 18:51:55 p1125-f85e systemd[1]: Starting Comitup Wifi Management...
Jul 10 18:51:56 p1125-f85e systemd[1]: Started Comitup Wifi Management.
Jul 10 18:51:56 p1125-f85e dnsmasq[2438]: started, version 2.80 cachesize 150
Jul 10 18:51:56 p1125-f85e dnsmasq[2438]: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth DNSSEC loop-detect inotify dumpfile
Jul 10 18:51:56 p1125-f85e dnsmasq-dhcp[2438]: DHCP, IP range 10.41.0.100 -- 10.41.0.150, lease time 10m
Jul 10 18:51:56 p1125-f85e dnsmasq[2438]: reading /etc/resolv.conf
Jul 10 18:51:56 p1125-f85e dnsmasq[2438]: using nameserver 192.168.86.1#53
Jul 10 18:51:56 p1125-f85e dnsmasq[2438]: read /etc/hosts - 4 addresses
Jul 10 18:51:56 p1125-f85e dnsmasq[2438]: exiting on receipt of SIGTERM

dnsmasq is exiting in this way and comitup doesn't call the external callback. Thats my guess.

comitup log:

2021-07-10 18:51:55,885 - comitup - INFO - Starting comitup
2021-07-10 18:51:56,169 - comitup - INFO - Setting state to HOTSPOT
2021-07-10 18:51:56,177 - comitup - INFO - Activating hotspot
2021-07-10 18:51:56,499 - comitup - INFO - Setting state to CONNECTING
2021-07-10 18:51:56,659 - comitup - INFO - Setting state to CONNECTED

I confirmed all the setup instructions from here: https://github.com/davesteele/comitup/wiki/Installing-Comitup

Connecting after a reboot was working, perhaps 1 month ago I tested it... But something has changed to break it and I haven't been able to figure it out... where to look...

pi@p1125-f85e:~ $ sudo cat /etc/hosts
127.0.0.1   p1125-f85e
::1     localhost ip6-localhost ip6-loopback
ff02::1     ip6-allnodes
ff02::2     ip6-allrouters

The problem is occurring on the current version I have installed, AND it also happens if I update to the latest version.

When the WiFi network is not known, and the connection is made the first time, dnsmasq does not SIGTERM.

davesteele commented 3 years ago

Assuming you are not using the Comitup Image, make sure that the dnsmasq service is at least disabled.

https://github.com/davesteele/comitup/blob/d8b170e/comitup/comitup.py#L59

sistemicorp commented 3 years ago

Correct, I am not using the comitup image. I see no log warnings in the comitup log. Yes, I believe dnsmasq.service is disabled... here is what is running...

pi@p1125-f85e:~ $ sudo systemctl --type=service
UNIT                                                        LOAD   ACTIVE SUB     DESCRIPTION                                                       
alsa-restore.service                                        loaded active exited  Save/Restore Sound Card State                                     
alsa-state.service                                          loaded active running Manage Sound Card State (restore and store)                       
avahi-daemon.service                                        loaded active running Avahi mDNS/DNS-SD Stack                                           
bluetooth.service                                           loaded active running Bluetooth service                                                 
bthelper@hci0.service                                       loaded failed failed  Raspberry Pi bluetooth helper                                     
comitup.service                                             loaded active running Comitup Wifi Management                                           
console-setup.service                                       loaded active exited  Set console font and keymap                                       
containerd.service                                          loaded active running containerd container runtime                                      
cron.service                                                loaded active running Regular background program processing daemon                      
cups-browsed.service                                        loaded active running Make remote CUPS printers available locally                       
cups.service                                                loaded active running CUPS Scheduler                                                    
dbus.service                                                loaded active running D-Bus System Message Bus                                          
dhcpcd.service                                              loaded active running dhcpcd on all interfaces                                          
docker.service                                              loaded active running Docker Application Container Engine                               
dphys-swapfile.service                                      loaded active exited  dphys-swapfile - set up, mount/unmount, and delete a swap file    
fake-hwclock.service                                        loaded active exited  Restore / save the current clock                                  
getty@tty1.service                                          loaded active running Getty on tty1                                                     
gldriver-test.service                                       loaded active exited  Check for v3d driver                                              
hciuart.service                                             loaded active running Configure Bluetooth Modems connected by UART                      
ifupdown-pre.service                                        loaded active exited  Helper to synchronize boot up for ifupdown                        
keyboard-setup.service                                      loaded active exited  Set the console keyboard layout                                   
kmod-static-nodes.service                                   loaded active exited  Create list of required static device nodes for the current kernel
lightdm.service                                             loaded active running Light Display Manager                                             
ModemManager.service                                        loaded active running Modem Manager                                                     
networking.service                                          loaded active exited  Raise network interfaces                                          
NetworkManager-wait-online.service                          loaded active exited  Network Manager Wait Online                                       
NetworkManager.service                                      loaded active running Network Manager                                                   
polkit.service                                              loaded active running Authorization Manager                                             
raspi-config.service                                        loaded active exited  LSB: Switch to ondemand cpu governor (unless shift key is pressed)
rc-local.service                                            loaded active running /etc/rc.local Compatibility                                       
rng-tools.service                                           loaded active running rng-tools.service                                                 
rpi-eeprom-update.service                                   loaded active exited  Check for Raspberry Pi EEPROM updates                             
rsyslog.service                                             loaded active running System Logging Service                                            
rtkit-daemon.service                                        loaded active running RealtimeKit Scheduling Policy Service                             
ssh.service                                                 loaded active running OpenBSD Secure Shell server                                       
systemd-fsck-root.service                                   loaded active exited  File System Check on Root Device                                  
systemd-fsck@dev-disk-by\x2dpartuuid-ea7d04d6\x2d01.service loaded active exited  File System Check on /dev/disk/by-partuuid/ea7d04d6-01            
systemd-journal-flush.service                               loaded active exited  Flush Journal to Persistent Storage                               
systemd-journald.service                                    loaded active running Journal Service                                                   
systemd-logind.service                                      loaded active running Login Service                                                     
systemd-modules-load.service                                loaded active exited  Load Kernel Modules                                               
systemd-random-seed.service                                 loaded active exited  Load/Save Random Seed                                             
systemd-remount-fs.service                                  loaded active exited  Remount Root and Kernel File Systems                              
systemd-sysctl.service                                      loaded active exited  Apply Kernel Variables                                            
systemd-sysusers.service                                    loaded active exited  Create System Users                                               
systemd-timesyncd.service                                   loaded active running Network Time Synchronization                                      
systemd-tmpfiles-setup-dev.service                          loaded active exited  Create Static Device Nodes in /dev                                
systemd-tmpfiles-setup.service                              loaded active exited  Create Volatile Files and Directories                             
systemd-udev-trigger.service                                loaded active exited  udev Coldplug all Devices                                         
systemd-udevd.service                                       loaded active running udev Kernel Device Manager                                        
systemd-update-utmp.service                                 loaded active exited  Update UTMP about System Boot/Shutdown                            
systemd-user-sessions.service                               loaded active exited  Permit User Sessions                                              
triggerhappy.service                                        loaded active running triggerhappy global hotkey daemon                                 
udisks2.service                                             loaded active running Disk Manager                                                      
user-runtime-dir@1000.service                               loaded active exited  User Runtime Directory /run/user/1000                             
user@1000.service                                           loaded active running User Manager for UID 1000                                         
wpa_supplicant.service                                      loaded active running WPA supplicant                                                    
xrdp-sesman.service                                         loaded active running xrdp session manager                                              
xrdp.service                                                loaded active running xrdp daemon 

and,

pi@p1125-f85e:~ $ dnsmasq --version
Dnsmasq version 2.80  Copyright (c) 2000-2018 Simon Kelley

in the system log,

Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.8343] dhcp4 (wlan0): activation: beginning transaction (timeout in 45 seconds)
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.8439] dhcp4 (wlan0): dhclient started with pid 1883
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.9893] dhcp4 (wlan0):   address 192.168.86.41
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.9894] dhcp4 (wlan0):   plen 24 (255.255.255.0)
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.9895] dhcp4 (wlan0):   gateway 192.168.86.1
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.9896] dhcp4 (wlan0):   lease time 86400
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.9900] dhcp4 (wlan0):   hostname 'p1125-f85e'
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.9903] dhcp4 (wlan0):   nameserver '192.168.86.1'
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.9903] dhcp4 (wlan0):   domain name 'lan'
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.9904] dhcp4 (wlan0): state changed unknown -> bound
Jul 10 18:49:38 p1125-f85e NetworkManager[380]: <info>  [1625957378.9993] device (wlan0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Jul 10 18:49:39 p1125-f85e NetworkManager[380]: <info>  [1625957379.0036] device (wlan0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Jul 10 18:49:39 p1125-f85e NetworkManager[380]: <info>  [1625957379.0051] device (wlan0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Jul 10 18:49:39 p1125-f85e NetworkManager[380]: <info>  [1625957379.0079] manager: NetworkManager state is now CONNECTED_LOCAL
Jul 10 18:49:39 p1125-f85e NetworkManager[380]: <info>  [1625957379.0322] manager: NetworkManager state is now CONNECTED_SITE
Jul 10 18:49:39 p1125-f85e NetworkManager[380]: <info>  [1625957379.0335] policy: set 'sistemi' (wlan0) as default for IPv4 routing and DNS
Jul 10 18:49:39 p1125-f85e NetworkManager[380]: <info>  [1625957379.0349] dns-mgr: Writing DNS information to /sbin/resolvconf
Jul 10 18:49:39 p1125-f85e NetworkManager[380]: <info>  [1625957379.1981] device (wlan0): Activation: successful, device activated.
Jul 10 18:49:39 p1125-f85e NetworkManager[380]: <info>  [1625957379.2019] manager: NetworkManager state is now CONNECTED_GLOBAL
Jul 10 18:49:41 p1125-f85e NetworkManager[380]: <info>  [1625957381.6877] audit: op="radio-control" arg="wireless-enabled" pid=2001 uid=0 result="success"
Jul 10 18:50:27 p1125-f85e NetworkManager[380]: <info>  [1625957427.2545] audit: op="connection-delete" uuid="2d0f7cc8-fd0a-4fff-8ab5-359e98ae8596" name="p1125-f85e-0000" pid=2107 uid=0 result="success"
Jul 10 18:50:27 p1125-f85e NetworkManager[380]: <info>  [1625957427.2789] keyfile: add connection /etc/NetworkManager/system-connections/p1125-f85e-0000.nmconnection (34538449-0359-47a3-89ea-8b8130d8126b,"p1125-f85e-0000")
Jul 10 18:50:27 p1125-f85e NetworkManager[380]: <info>  [1625957427.2811] audit: op="connection-add" uuid="34538449-0359-47a3-89ea-8b8130d8126b" name="p1125-f85e-0000" pid=2107 uid=0 result="success"
Jul 10 18:50:30 p1125-f85e NetworkManager[380]: <info>  [1625957430.8108] audit: op="radio-control" arg="wireless-enabled" pid=2175 uid=0 result="success"
Jul 10 18:51:56 p1125-f85e NetworkManager[380]: <info>  [1625957516.1301] audit: op="connection-delete" uuid="34538449-0359-47a3-89ea-8b8130d8126b" name="p1125-f85e-0000" pid=2399 uid=0 result="success"
Jul 10 18:51:56 p1125-f85e NetworkManager[380]: <info>  [1625957516.1552] keyfile: add connection /etc/NetworkManager/system-connections/p1125-f85e-0000.nmconnection (4a55428e-682e-4991-8375-f855d4076abe,"p1125-f85e-0000")
Jul 10 18:51:56 p1125-f85e NetworkManager[380]: <info>  [1625957516.1578] audit: op="connection-add" uuid="4a55428e-682e-4991-8375-f855d4076abe" name="p1125-f85e-0000" pid=2399 uid=0 result="success"
Jul 10 18:51:59 p1125-f85e NetworkManager[380]: <info>  [1625957519.7086] audit: op="radio-control" arg="wireless-enabled" pid=2467 uid=0 result="success"

I saw that dhcpcd.service is running and I was worried about that. if I disable that service, it doesn't fix the problem, and I lose eth0 function. There is a denyinterface option to dhcpcd.service in its conf file so that it doesn't service wlan0, but doing that didn't change anything either.

I turned on dnsmasq logging, but I get the same as I see in systemctl log,

pi@p1125-f85e:~ $ sudo cat dnsmasq.log 
Jul 10 22:42:22 dnsmasq[1402]: started, version 2.80 cachesize 150
Jul 10 22:42:22 dnsmasq[1402]: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP 
conntrack ipset auth DNSSEC loop-detect inotify dumpfile
Jul 10 22:42:22 dnsmasq-dhcp[1402]: DHCP, IP range 10.41.0.100 -- 10.41.0.150, lease time 10m
Jul 10 22:42:22 dnsmasq[1402]: reading /etc/resolv.conf
Jul 10 22:42:22 dnsmasq[1402]: using nameserver 192.168.86.1#53
Jul 10 22:42:22 dnsmasq[1402]: read /etc/hosts - 4 addresses
Jul 10 22:42:23 dnsmasq[1402]: exiting on receipt of SIGTERM

The arguments to dnsmaaq are

/usr/share/comitup/dns/dns-hotspot.conf wlan0

And finally,

pi@p1125-f85e:~ $ sudo cat /usr/share/comitup/dns/dns-hotspot.conf
address=/#/10.41.0.1
auth-ttl=0
no-dhcp-interface=eth0
dhcp-option=160,http://10.41.0.1/
dhcp-option=6,10.41.0.1
dhcp-range=10.41.0.100,10.41.0.150,10m
bogus-priv
domain-needed
dhcp-leasefile=/var/lib/comitup/dhcpleaseinfo
pid-file=/var/run/comitup-dns

The three odd observations, 1) When the WiFi is unknown, "it works" (after I use the hotspot config form and press connect), there is no SIGTERM message from dnsmasq and comitup calls the external callback. 2) On reboot, when it fails, the WiFi is connected to the network... but there is the SIGTERM message and the external callback is not called. 3) If I plug in eth0, which triggers stopping of comitup service (and wifi via nmcli radio wifi off), and then I unplug eth0, which triggers starting comitup "it works" (no SIGTERM, and external callback is called). But just restarting comitup doesn't work. Its like it needs the eth0 connection to happen.

davesteele commented 2 years ago

if I disable that service, it doesn't fix the problem, and I lose eth0 function.

I'm not seeing that problem. My eth0 is managed by NetworkManager.

If I plug in eth0, which triggers stopping of comitup service (and wifi via nmcli radio wifi off), and then I unplug eth0, which triggers starting comitup "it works" (no SIGTERM, and external callback is called).

That sounds like your modifications are involved with the issue.

sistemicorp commented 2 years ago

Indeed the issue was mine. The root problem was my app was missing external callback updates as it was busy doing something else, the callback status updates were lost on my side.

It seems the dnsmasq SIGTERM is an expected, or normal occurrence.