HASwitchPlate / openHASP

HomeAutomation Switchplate based on lvgl for ESP32
https://www.openhasp.com
MIT License
694 stars 177 forks source link

Continuous MQTT reconnect when connected to a locked down IOT VLAN (i.e. no internet access) #306

Closed sumnerboy12 closed 2 years ago

sumnerboy12 commented 2 years ago

Perform all steps below and tick them with [x]

Describe the bug

If I connect my device to a WiFi network with no internet access (i.e. locked down IOT VLAN) my device restarts the MQTT connection every 30s. The MQTT connection is successful, as my firewall permits connections out of the IOT VLAN to my MQTT broker, but it keeps dropping and reconnecting regardless.

To Reproduce

Join your openHASP device to a Wifi VLAN with no internet access and watch it continually reconnect to the MQTT broker.

Expected behavior

Should support running in a locked down VLAN, no reason to need internet access for a local-only IOT device like this.

Screenshots or video

hasp/plate01/LWT online
hasp/plate01/state/statusupdate {"node":"plate01","idle":"short","version":"0.6.2","uptime":127,"ssid":"hw-iot","rssi":-43,"ip":"192.168.40.81","mac":"1C:9D:C2:F8:E8:5C","heapFree":99044,"heapFrag":3,"core":"v3.3.5-1-g85c43024c","canUpdate":"false","page":1,"numPages":12,"tftDriver":"ST7796","tftWidth":320,"tftHeight":480}
hasp/plate01/state/idle short
hasp/plate01/state/page 1
hasp/plate01/state/sensors {"time":"1970-01-01T02:13:42","uptimeSec":127,"uptime":"0T00:02:07","ESP32":{"Temperature":72.77}}
hasp/discovery {"node":"plate01","mdl":"wt32-sc01","mf":"openHASP","hwid":"1c9dc2f8e85c","pages":12,"sw":"0.6.2","input":{},"power":[],"light":[],"dim":[]}
hasp/plate01/LWT offline
hasp/plate01/LWT online
hasp/plate01/state/statusupdate {"node":"plate01","idle":"long","version":"0.6.2","uptime":187,"ssid":"hw-iot","rssi":-41,"ip":"192.168.40.81","mac":"1C:9D:C2:F8:E8:5C","heapFree":99160,"heapFrag":3,"core":"v3.3.5-1-g85c43024c","canUpdate":"false","page":1,"numPages":12,"tftDriver":"ST7796","tftWidth":320,"tftHeight":480}
hasp/plate01/state/idle long
hasp/plate01/state/page 1
hasp/plate01/state/sensors {"time":"1970-01-01T02:14:42","uptimeSec":187,"uptime":"0T00:03:07","ESP32":{"Temperature":72.77}}
hasp/discovery {"node":"plate01","mdl":"wt32-sc01","mf":"openHASP","hwid":"1c9dc2f8e85c","pages":12,"sw":"0.6.2","input":{},"power":[],"light":[],"dim":[]}
hasp/plate01/LWT offline
hasp/plate01/LWT online
hasp/plate01/state/statusupdate {"node":"plate01","idle":"long","version":"0.6.2","uptime":247,"ssid":"hw-iot","rssi":-41,"ip":"192.168.40.81","mac":"1C:9D:C2:F8:E8:5C","heapFree":99064,"heapFrag":3,"core":"v3.3.5-1-g85c43024c","canUpdate":"false","page":1,"numPages":12,"tftDriver":"ST7796","tftWidth":320,"tftHeight":480}
hasp/plate01/state/idle long
hasp/plate01/state/page 1
hasp/plate01/state/sensors {"time":"1970-01-01T02:15:42","uptimeSec":247,"uptime":"0T00:04:07","ESP32":{"Temperature":72.77}}
hasp/discovery {"node":"plate01","mdl":"wt32-sc01","mf":"openHASP","hwid":"1c9dc2f8e85c","pages":12,"sw":"0.6.2","input":{},"power":[],"light":[],"dim":[]}
hasp/plate01/LWT offline
nagyrobi commented 2 years ago

Try movinng the broker in the same vlan.

fvanroie commented 2 years ago

There is no need for the MQTT client and broker to be on the same subnet. Also, there shouldn't be any need to have internet access either... so not sure what is causing this issue. Did you check the log-files of the broker?

nagyrobi commented 2 years ago

There is no need for the MQTT client and broker to be on the same subnet.

Indeed, but to debug the problem this could temporarily be a good step. Could be a misconfigured router causing the problem.

I use the same approach (only that the broker is in the same vlan as the IoT devices), and have no issues with stability. The main reason to keep all IoT stuff including the broker and the automation system in the same VLAN is to have the system operating even when the inter-vlan routing device fails for any reason. No internet access on this vlan.

sumnerboy12 commented 2 years ago

The broker shows the WT32-SC01 connecting successfully, and then 30s later closing the connection. No error. It appears to be a forced reconnect initiated by the device.

What is the logic around triggering a MQTT reconnect? I did notice that the status update published (before it reconnects) contained a date time of 1970-01-01 - is it trying to get the time from an NTP server? Could that be causing problems?

sumnerboy12 commented 2 years ago

Indeed, but to debug the problem this could temporarily be a good step. Could be a misconfigured router causing the problem.

I have many other IOT devices on this VLAN, all connecting to the MQTT broker (on my main VLAN) with no problems. I am confident the network and routing is ok.

I can see the logic behind your decision to have everything home automation related on that IOT VLAN, however I took a slightly different approach. I run openHAB, NodeRED and mosquitto on my main data VLAN, so I can access from phone/laptop etc. Everything else lives on the IOT VLAN. If something needs to receive commands or publish status then it is forced to use MQTT as 1883 is the only port open from that VLAN in/out.

You are right that I am susceptible to a failure in my routing which would isolate all those IOT devices, but I think the benefits and ease of accessing the main part of my automation systems outweighs that.

arovak commented 2 years ago

I have a similar setup, Openhab and mosquitto on main VLAN and IOT devices on separate VLAN without internet access and can't reproduce the issue you have.

Tested with 0.6.2 and 0.6.3-dev, both versions remain online when no access is allowed to internet or the NTP server. {"time":"1970-01-01T01:28:00","uptimeSec":608,"uptime":"0T00:10:08","ESP32":{"Temperature":61.11}}

It might be your broker thinking 2 devices are using the same identifier causing the mqtt connection to be disconnected.

sumnerboy12 commented 2 years ago

Appreciate the feedback and suggestions.

I tried again today, connecting to my IOT VLAN. You can see the mosquitto logs below. The device connects successfully and then exactly 30s later, closes the connection. There is nothing else connecting at the same time. Then 28s after the connection is closed, it successfully reconnects.

It looks like this is the device closing the connection, given it is exactly 30s after connecting every time. And given it works ok when the device is on my main VLAN, I don't think it is the broker causing this.

I was hoping someone (a dev) might recognise the 30s delay between connect/disconnect and know where in the code that is being used. I will try and have a dig around and see if I can see what might be causing it.

mosquitto_app.1.tu0uezo36in2@nuc03    | 2022-03-03T14:48:38: New connection from 172.16.200.99:56358 on port 1883.
mosquitto_app.1.tu0uezo36in2@nuc03    | 2022-03-03T14:48:38: New client connected from 172.16.200.99:56358 as plate_f8e85c (p2, c1, k15, u'openhasp').
mosquitto_app.1.tu0uezo36in2@nuc03    | 2022-03-03T14:49:08: Client plate_f8e85c closed its connection.
mosquitto_app.1.tu0uezo36in2@nuc03    | 2022-03-03T14:49:35: New connection from 172.16.200.99:56582 on port 1883.
mosquitto_app.1.tu0uezo36in2@nuc03    | 2022-03-03T14:49:35: New client connected from 172.16.200.99:56582 as plate_f8e85c (p2, c1, k15, u'openhasp').
mosquitto_app.1.tu0uezo36in2@nuc03    | 2022-03-03T14:50:05: Client plate_f8e85c closed its connection.
mosquitto_app.1.tu0uezo36in2@nuc03    | 2022-03-03T14:50:33: New connection from 172.16.200.99:56820 on port 1883.
mosquitto_app.1.tu0uezo36in2@nuc03    | 2022-03-03T14:50:33: New client connected from 172.16.200.99:56820 as plate_f8e85c (p2, c1, k15, u'openhasp').
sumnerboy12 commented 2 years ago

Hmm - here is the serial debugging - maybe it is something to do with my broker - it looks like the connection is being dropped (no idea why as nothing in the mosquitto logs) and openHASP is attempting to reconnect on the next 5s loop.

The first attempt to reconnect usually fails, but the next succeeds.

Rinse repeat endlessly.

[1000DPrompt > Prompt > [21:40:02.949][107936/110092  1][41284/41880  2] MQTT: Disconnected from broker, reconnection...
Prompt > [21:40:02.963][107936/110028  1][41284/41880  2] MQTT: plate_f8e85c
Prompt > [21:40:06.936][104772/108064  3][41284/41880  2] MQTT: Connected to broker mqtt.home as clientID plate_f8e85c
Prompt > [21:40:06.954][104772/106244  1][41284/41880  2] MQTT:     * Subscribed to hasp/plates/command/#
Prompt > [21:40:06.971][101348/104488  3][41284/41880  2] MQTT:     * Subscribed to hasp/plate01/command/#
Prompt > [21:40:06.987][99636/102736  3][41284/41880  2] MQTT:     * Subscribed to hasp/plates/config/#
Prompt > [21:40:07.004][97924/100980  3][41284/41880  2] MQTT:     * Subscribed to hasp/plate01/config/#
Prompt > [21:40:07.021][97924/99228  1][41284/41880  2] MQTT:     * Subscribed to hasp/broadcast/command/#
Prompt > [21:40:07.143][92948/94412  1][41284/41884  2] MQTT PUB: statusupdate => {"redacted":480}
Prompt > [21:40:07.185][92948/94344  1][41284/41884  2] MQTT PUB: idle => long
Prompt > [21:40:07.201][92948/94416  1][41284/41884  2] MQTT PUB: page => 1
Prompt > [21:40:07.217][94660/95896  3][41284/41884  2] MQTT PUB: sensors => {"time":"2022-03-02T21:40:07","uptimeSec":1624,"uptime":"0T00:27:04","ESP32":{"Temperature":72.77}}
Prompt > [21:40:07.241][92948/94416  1][41284/41884  2] MQTT PUB: discovery => {"node":"plate01","mdl":"wt32-sc01","mf":"openHASP","hwid":"1c9dc2f8e85c","pages":12,"sw":"0.6.2","input":{},"power":[],"light":[],"dim":[]}
Prompt > Prompt > [21:40:56.018][107936/110096  1][41284/41884  2] MQTT: Disconnected from broker, reconnection...
Prompt > [21:40:56.032][107936/110032  1][41284/41884  2] MQTT: plate_f8e85c
Prompt > [21:41:02.900][107936/110096  1][41284/41884  2] MQTT: Connection failed
Prompt > Prompt > [21:41:02.956][107936/110096  1][41284/41884  2] MQTT: Disconnected from broker, reconnection...
Prompt > [21:41:02.970][107936/110032  1][41284/41884  2] MQTT: plate_f8e85c
Prompt > [21:41:06.929][104772/108056  3][41284/41880  2] MQTT: Connected to broker mqtt.home as clientID plate_f8e85c
Prompt > [21:41:06.946][104772/106236  1][41284/41880  2] MQTT:     * Subscribed to hasp/plates/command/#
Prompt > [21:41:06.963][101348/104476  2][41284/41880  2] MQTT:     * Subscribed to hasp/plate01/command/#
Prompt > [21:41:06.980][99636/102724  3][41284/41880  2] MQTT:     * Subscribed to hasp/plates/config/#
Prompt > [21:41:06.996][99636/100976  1][41284/41880  2] MQTT:     * Subscribed to hasp/plate01/config/#
Prompt > [21:41:07.014][97924/99220  1][41284/41880  2] MQTT:     * Subscribed to hasp/broadcast/command/#
Prompt > [21:41:07.138][92948/95920  3][41284/41884  2] MQTT PUB: statusupdate => {"redacted":480}
Prompt > [21:41:07.180][92948/94336  1][41284/41884  2] MQTT PUB: idle => long
Prompt > [21:41:07.196][92948/94336  3][41284/41884  2] MQTT PUB: page => 1
Prompt > [21:41:07.213][92948/94404  1][41284/41884  2] MQTT PUB: sensors => {"time":"2022-03-02T21:41:07","uptimeSec":1684,"uptime":"0T00:28:04","ESP32":{"Temperature":72.77}}
Prompt > [21:41:07.238][92948/95884  3][41284/41884  2] MQTT PUB: discovery => {"node":"plate01","mdl":"wt32-sc01","mf":"openHASP","hwid":"1c9dc2f8e85c","pages":12,"sw":"0.6.2","input":{},"power":[],"light":[],"dim":[]}
Prompt > Prompt > [21:41:56.032][107936/110096  1][41284/41884  2] MQTT: Disconnected from broker, reconnection...
Prompt > [21:41:56.046][107936/110032  1][41284/41884  2] MQTT: plate_f8e85c
Prompt > [21:42:02.907][107936/110096  1][41284/41880  2] MQTT: Connection failed
Prompt > Prompt > [21:42:02.963][107936/110096  1][41284/41880  2] MQTT: Disconnected from broker, reconnection...
Prompt > [21:42:02.977][107936/110032  1][41284/41880  2] MQTT: plate_f8e85c
Prompt > [21:42:06.933][106484/108060  1][41284/41880  2] MQTT: Connected to broker mqtt.home as clientID plate_f8e85c
Prompt > [21:42:06.950][104932/106240  1][41284/41880  2] MQTT:     * Subscribed to hasp/plates/command/#
Prompt > [21:42:06.967][101508/104480  2][41284/41880  2] MQTT:     * Subscribed to hasp/plate01/command/#
Prompt > [21:42:06.984][99796/102728  2][41284/41880  2] MQTT:     * Subscribed to hasp/plates/config/#
Prompt > [21:42:07.000][98084/100976  2][41284/41880  2] MQTT:     * Subscribed to hasp/plate01/config/#
Prompt > [21:42:07.017][98084/99224  1][41284/41880  2] MQTT:     * Subscribed to hasp/broadcast/command/#
Prompt > [21:42:07.141][93108/95920  2][41284/41884  2] MQTT PUB: statusupdate => {"redacted":480}
Prompt > [21:42:07.183][93108/94340  1][41284/41884  2] MQTT PUB: idle => long
Prompt > [21:42:07.198][93108/94412  1][41284/41884  2] MQTT PUB: page => 1
Prompt > [21:42:07.215][93108/95920  2][41284/41884  2] MQTT PUB: sensors => {"time":"2022-03-02T21:42:07","uptimeSec":1744,"uptime":"0T00:29:04","ESP32":{"Temperature":72.77}}
Prompt > [21:42:07.240][93108/94412  1][41284/41884  2] MQTT PUB: discovery => {"node":"plate01","mdl":"wt32-sc01","mf":"openHASP","hwid":"1c9dc2f8e85c","pages":12,"sw":"0.6.2","input":{},"power":[],"light":[],"dim":[]}
arovak commented 2 years ago

Does the device have a good wifi connection? When connected to your main VLAN it stays connected to the mqtt server?

sumnerboy12 commented 2 years ago

Yep excellent, right next to a Unifi AC Lite AP. When on the main VLAN it is rock solid.

On Fri, 4 Mar 2022, 6:25 am arovak, @.***> wrote:

Does the device have a good wifi connection? When connected to your main VLAN it stays connected to the mqtt server?

— Reply to this email directly, view it on GitHub https://github.com/HASwitchPlate/openHASP/issues/306#issuecomment-1058296134, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGTR2UNPWD5567QIQEBCILU6DYXPANCNFSM5PTJZNRQ . You are receiving this because you authored the thread.Message ID: @.***>

fvanroie commented 2 years ago

I was hoping someone (a dev) might recognise the 30s delay between connect/disconnect and know where in the code that is being used.

There is no 30 second delay implemented in the code. The plate checks every 5 seconds if the MQTT client is still connected.

Are you using WPA3 on your APs? I've heard about WPA3/WPA2 Compat Mode causing all sorts of Disconnects. Also GTK Rekeying seems to be causing disconnects.

sumnerboy12 commented 2 years ago

No WPA3 (AFAIK). And none of my other IOT devices on that VLAN have any issues. I have had a good look around your code and can't see anything obvious. But will keep digging. Appreciate your help so far!

fvanroie commented 2 years ago

Did you find a root cause or solution to this issue? Without being able to replicate it will be very hard to do troubleshooting...

sumnerboy12 commented 2 years ago

I couldn't figure it out sorry - ended up putting the device on my main/data VLAN.

fvanroie commented 2 years ago

Thanks! I will close this issue for now. If anything comes up or someone else can provide addition feedback feel free to re-open.