bellrichm / WeeWX-MQTTSubscribe

A WeeWX service and driver that receives data from MQTT.
GNU General Public License v3.0
52 stars 13 forks source link

WeeWX crashes because MQTTSubscribe fails network connect on cold start or reboot #177

Closed mbruski closed 1 year ago

mbruski commented 1 year ago

Issue with launching WeeWx (system cold or reboot) with MQTTSubcriber installed as a 'driver'. MQTT is on localhost and sensor data if populated by a remote RPi Zero W gathering data via rtl_433 and locally attached BME-280 sensor. The network is still initializing when WeeWX is launched causing the driver to fail getting a connection to the broker. My guess would be that the broker is waiting on the network and hasn't posted a listen on port 1883 while this is happening. The broker will eventually post a listen after network is available but WeeWX always needs to be launched manually. I experience the problem on two different machines. Is there a way to get WeeWX to defer startup up until after the network and broker are available? If not, maybe MQTTSubscribe can be tweaked to retry a reasonable amount of times for that connection?

System1: Zotac ZBOX-ID80 Intel Atom D2700 CPU @ 2.13GHz 4 GB ram

System2: Dell OptiPlex 9020 USFF Intel i7-4790S CPU @ 3.2GHz 16 GB ram

S/W: Lubuntu 20.04.5 LTS, WeeWX 4.9.1, MQTTSubscribe 2.2.2

<relavent segment from /var/log/syslog> Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.3711] device (wlp4s0): driver supports Access Point (AP) mode Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.3748] manager: (wlp4s0): new 802.11 Wi-Fi device (/org/freedesktop/NetworkManager/Devices/3) Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.3803] device (wlp4s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external') Jan 27 08:42:50 MY_HOSTNAME kernel: [ 14.381639] iwlwifi 0000:04:00.0: Radio type=0x2-0x2-0x1 Jan 27 08:42:50 MY_HOSTNAME kernel: [ 14.480852] iwlwifi 0000:04:00.0: Radio type=0x2-0x2-0x1 Jan 27 08:42:50 MY_HOSTNAME weewx[930] INFO main: Initializing weewx version 4.9.1 Jan 27 08:42:50 MY_HOSTNAME weewx[930] INFO main: Using Python 3.8.10 (default, Nov 14 2022, 12:59:47) #012[GCC 9.4.0] Jan 27 08:42:50 MY_HOSTNAME weewx[930] INFO main: Located at /bin/python3 Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.5720] Error: failed to open /run/network/ifstate Jan 27 08:42:50 MY_HOSTNAME weewx[930] INFO main: Platform Linux-5.15.0-58-generic-x86_64-with-glibc2.29 Jan 27 08:42:50 MY_HOSTNAME weewx[930] INFO main: Locale is 'en_US.UTF-8' Jan 27 08:42:50 MY_HOSTNAME weewx[930] INFO main: Using configuration file /etc/weewx/weewx.conf Jan 27 08:42:50 MY_HOSTNAME weewx[930] INFO main: Debug is 0 Jan 27 08:42:50 MY_HOSTNAME weewx[930] INFO main: PID file is /var/run/weewx.pid Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO weewx.engine: Loading station type MQTTSubscribeDriver (user.MQTTSubscribe) Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.6278] modem-manager: ModemManager available Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.6336] supplicant: wpa_supplicant running Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.6339] device (wlp4s0): supplicant interface state: init -> starting Jan 27 08:42:50 MY_HOSTNAME weewx[730]: ...done. Jan 27 08:42:50 MY_HOSTNAME systemd[1]: Started LSB: weewx weather system. Jan 27 08:42:50 MY_HOSTNAME wpa_supplicant[733]: dbus: fill_dict_with_properties dbus_interface=fi.w1.wpa_supplicant1.Interface.P2PDevice dbus_property=P2PDeviceConfig getter failed Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.7436] sup-iface[0x55f07bbdf110,wlp4s0]: supports 5 scan SSIDs Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.7497] device (wlp4s0): supplicant interface state: starting -> ready Jan 27 08:42:50 MY_HOSTNAME NetworkManager[689]: [1674826970.7500] device (wlp4s0): state change: unavailable -> disconnected (reason 'supplicant-available', sys-iface-state: 'managed') Jan 27 08:42:50 MY_HOSTNAME systemd[1]: Starting NVIDIA Persistence Daemon... Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) Version is 2.2.2 Jan 27 08:42:50 MY_HOSTNAME systemd[1]: Started NVIDIA Persistence Daemon. Jan 27 08:42:50 MY_HOSTNAME nvidia-persistenced: Verbose syslog connection opened Jan 27 08:42:50 MY_HOSTNAME systemd[1]: Stopping NVIDIA Persistence Daemon... Jan 27 08:42:50 MY_HOSTNAME nvidia-persistenced: Now running with user ID 126 and group ID 134 Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) Log level: 0 Jan 27 08:42:50 MY_HOSTNAME nvidia-persistenced: Started (951) Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) Log debug setting: 0 Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) Log console: False Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) Log file: None Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) Max loop interval is: 0 Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) message_callback_provider_name is user.MQTTSubscribe.MessageCallbackProvider Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) clientid is MQTTSubscribe-9254 Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) client_session is True Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) host is localhost Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) port is 1883 Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) keepalive is 60 Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) username is None Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) min_delay is 1 Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) max_delay is 120 Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) password is set Jan 27 08:42:50 MY_HOSTNAME weewx[944] INFO user.MQTTSubscribe: (Driver) Archive topic is None Jan 27 08:42:50 MY_HOSTNAME weewx[944] ERROR user.MQTTSubscribe: (Driver) Failed to connect to localhost at 1883. '[Errno 111] Connection refused' Jan 27 08:42:50 MY_HOSTNAME weewx[944] ERROR weewx.engine: Import of driver failed: [Errno 111] Connection refused (<class 'weewx.WeeWxIOError'>) Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: Traceback (most recent call last): Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: File "/usr/share/weewx/user/MQTTSubscribe.py", line 1655, in init Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: self.client.connect(host, port, keepalive) Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 937, in connect Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: return self.reconnect() Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 1071, in reconnect Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: sock = self._create_socket_connection() Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 3522, in _create_socket_connection Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: return socket.create_connection(addr, source_address=source, timeout=self._keepalive) Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: File "/usr/lib/python3.8/socket.py", line 808, in create_connection Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: raise err Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: File "/usr/lib/python3.8/socket.py", line 796, in create_connection Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: sock.connect(sa) Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: ConnectionRefusedError: [Errno 111] Connection refused Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine:
Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine:
During handling of the above exception, another exception occurred: Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine:
Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine:
Traceback (most recent call last): Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: File "/usr/share/weewx/weewx/engine.py", line 119, in setupStation Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: self.console = loader_function(config_dict, self) Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: File "/usr/share/weewx/user/MQTTSubscribe.py", line 1964, in loader Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: return MQTTSubscribeDriver(config_dict[DRIVER_NAME]) # pragma: no cover Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: ** File "/usr/share/weewx/user/MQTTSubscribe.py", line 1990, in init Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: self.subscriber = MQTTSubscriber(stn_dict, self.logger) Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: File "/usr/share/weewx/user/MQTTSubscribe.py", line 1658, in init Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: raise weewx.WeeWxIOError(exception) Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL weewx.engine: weewx.WeeWxIOError: [Errno 111] Connection refused Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL main: Unable to load driver: [Errno 111] Connection refused Jan 27 08:42:50 MY_HOSTNAME weewx[944] CRITICAL main: **** Exiting... Jan 27 08:42:51 MY_HOSTNAME fail2ban-server[815]: Server ready Jan 27 08:42:51 MY_HOSTNAME kernel: [ 15.188630] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than 0000:02:00.0 [mem 0x000c0000-0x000dffff] Jan 27 08:42:51 MY_HOSTNAME kernel: [ 15.188647] caller os_map_kernel_space+0x131/0x170 [nvidia] mapping multiple BARs Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3004] policy: auto-activating connection 'MY_NETWORK_SSID' (24e28be3-2343-4526-8a9d-31a028c9e7be) Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3034] device (wlp4s0): Activation: starting connection 'MY_NETWORK_SSID' (24e28be3-2343-4526-8a9d-31a028c9e7be) Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3044] device (wlp4s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3066] manager: NetworkManager state is now CONNECTING Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3090] device (wlp4s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3107] device (wlp4s0): Activation: (wifi) access point 'MY_NETWORK_SSID' has security, but secrets are required. Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3109] device (wlp4s0): state change: config -> need-auth (reason 'none', sys-iface-state: 'managed') Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3364] device (wlp4s0): state change: need-auth -> prepare (reason 'none', sys-iface-state: 'managed') Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3393] device (wlp4s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3411] device (wlp4s0): Activation: (wifi) connection 'MY_NETWORK_SSID' has security, and secrets exist. No new secrets needed. Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3420] Config: added 'ssid' value 'MY_NETWORK_SSID' Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3421] Config: added 'scan_ssid' value '1' Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3422] Config: added 'bgscan' value 'simple:30:-70:86400' Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3430] Config: added 'key_mgmt' value 'WPA-PSK WPA-PSK-SHA256 FT-PSK' Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3431] Config: added 'auth_alg' value 'OPEN' Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3432] Config: added 'psk' value '' Jan 27 08:42:51 MY_HOSTNAME wpa_supplicant[733]: wlp4s0: SME: Trying to authenticate with ROUTER_MAC (SSID='MY_NETWORK_SSID' freq=2462 MHz) Jan 27 08:42:51 MY_HOSTNAME kernel: [ 15.349910] wlp4s0: authenticate with ROUTER_MAC Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.3732] device (wlp4s0): supplicant interface state: ready -> authenticating Jan 27 08:42:51 MY_HOSTNAME kernel: [ 15.353131] wlp4s0: send auth to ROUTER_MAC (try 1/3) Jan 27 08:42:51 MY_HOSTNAME wpa_supplicant[733]: wlp4s0: Trying to associate with ROUTER_MAC (SSID='MY_NETWORK_SSID' freq=2462 MHz) Jan 27 08:42:51 MY_HOSTNAME kernel: [ 15.374546] wlp4s0: authenticated Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.4247] device (wlp4s0): supplicant interface state: authenticating -> associating Jan 27 08:42:51 MY_HOSTNAME kernel: [ 15.408093] wlp4s0: associate with ROUTER_MAC (try 1/3) Jan 27 08:42:51 MY_HOSTNAME kernel: [ 15.436309] wlp4s0: RX AssocResp from ROUTER_MAC (capab=0x411 status=0 aid=8) Jan 27 08:42:51 MY_HOSTNAME kernel: [ 15.454552] wlp4s0: associated Jan 27 08:42:51 MY_HOSTNAME wpa_supplicant[733]: wlp4s0: Associated with ROUTER_MAC Jan 27 08:42:51 MY_HOSTNAME wpa_supplicant[733]: wlp4s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0 Jan 27 08:42:51 MY_HOSTNAME NetworkManager[689]: [1674826971.4835] device (wlp4s0): supplicant interface state: associating -> 4-way handshake

bellrichm commented 1 year ago

Hi, I think your theory is correct. Ideally these depencies would be handled when launching WeeWX. Currently the default startup of WeeWX uses an init.d script. I have limited understanding of init.d scripts, but I believe this is not easy. I think that WeeWX v5 will be using systemd and it should be easier then. In the meantime I think the WeeWX loop_on_init option, https://weewx.com/docs/usersguide.htm#General, should satisfy your needs. rich

mbruski commented 1 year ago

Thank you for that tip! Both machines reboot just fine and Weewx restarts when the driver is restarted 60 secs after the first failed attempt.

bellrichm commented 1 year ago

Good to hear. Thanks for letting me know!