UtilitechAS / amsreader-firmware

ESP8266 and ESP32 compatible firmware to read, interpret and publish data to MQTT from smart electrical meters, both DLMS and DSMR is supported
Other
381 stars 73 forks source link

AMs reader fails to connect to MQTT #709

Closed shadowano closed 6 months ago

shadowano commented 9 months ago

Describe the bug When my WIFI/Internet goes down and comes up again minutes later, the AMS reader struggles to recover from the outage. Today, I did nothing to the AMS reader hardware, like restart or resetting it to AP mode, and it partly comes up after some time. It now shows red on MQTT and shows

MQTT unspecified error -9

Some times I also see only ESP having a red color, but seconds later it shows up with green on the three first fields, and red on MQTT again.

I have tried to restart internet modem, my own WIFI router (which AMS reader is connected to), Raspberry Pi running Home Assistant and AMS reader. Still it fails to connect to MQTT server which runs in Home Assistant.

I also tried setting AMs reader back into AP mode, checked all settings, and saved. After reboot, I see MQTT is green. I checked HA and saw that data came in again. shortly after MQTT is red in AMS reader UI again, with same error.

I'm also struggling to access the web page when not in AP mode. Page is loading until browser gives timeout, or shows partly UI without any information. Settings and other pages does not load data. When in AP mode the page loads fast and without problems.

I had WIFI power saving set to Maximum (and unchecked "Allow 802.11b legacy rates"), and this has worked for a long time. I changed power saving to Default in AP mode, and after a few minutes after reboot it started working again, and this time it's been up 37 minutes, and data is sent over MQTT.

This is the second time it has happened to me when my internet fall out. Last time I think it was solved by restarting my modem and WIFI router as this was the last thing I tried out, but I also restarted the AMS reader, went in and out of AP modem, and suddenly it worked.

To be honest, I don't know where the problem is, but this time it was solved after fiddling in and out of AP mode (although I also restarted everything else first). I didn't think about enabling debug logging, but will try this next time (if I remember :P)

One thought, two red blinks are MQTT error, and three red blinks are WIFI not connected. On the Wiki it says no problem if combined with 5GHz. I think I saw two red blinks a couple of times. I have both 2,4 and 5 GHz enabled on my mesh WIFI network. If it can't connect to WIFI, will it alternate between three and two red blinks if MQTT is enabled? Would it be good to test turning off 5GHz to see if it manage to connect then?

Hardware information:

Relevant firmware information:

ArnieO commented 9 months ago

Thank you for your feedback!

As a general statement we see that we have a very limited number of users that struggle in similar ways to what you describe, and these are cases that we have spent quite a bit of time on - and these cases are complex, as the potential cause of the issues could be in so many places: The mesh system, its configuration (but some of them are so "user friendly" that the configuration options are very limited), the MQTT broker and its configuration, the HW on which the broker runs, overall load on that HW, the configuration of Home Assistance... and of course our firmware, which is the only point we really can control 100%.

I have a feeling that there are less issues of this kind after an update of the Arduino Framework for ESP32 where the device is locks to strongest BSSID (mesh node), not just SSID (which includes all nodes), but in my TP-link network I still see indications that devices do not consequently connect to the closest node. Maybe for good reasons, for all I know.

I do not think disabling 5GHz will change anything, unless it in some way makes the mesh system more responsive to what goes on in the 2.4 GHz band. But it could be interesting if you could try it and report back!

Concerning your red blink question I am not 100% sure, so I prefer @gskjold answer that.

gskjold commented 9 months ago

will it alternate between three and two red blinks if MQTT is enabled

Yes, it should. Also, two and three blinks are really hard to tell apart... I should adjust this, but have yet to do so

shadowano commented 9 months ago

Thanks for feedback, and I totally understand that this issue might be a needle in the haystack as there are a lot of unknowns that can have issues.

I do have another device using 2GHz connected to a second mesh node, and the amsleser is at least now connected to the mesh node in the basement (less then 50 centimeter apart). Will check both units next time. An maybe call the internet provider to check if something went down.

Will enabling debug logging on the AMS reader help in my case?

Also, two and three blinks are really hard to tell apart... I should adjust this, but have yet to do so

Yes, it is really hard to tell these apart:) If WIFI have problems, there is no need to also show that MQTT is erroring?

ArnieO commented 9 months ago

The internal traffic on your Wifi network should in principle not be affected if internet is down - but I admit I am not sure whether our devices will still be working correctly. I believe some features rely on having a correct watch - which is synchronized from internet NTP. (@gskjold ?)

To easier count the blinks, you can make a video - then play it at low speed (Windows 11 mediaplayer has the option of playing at 0,25x speed).

If WIFI have problems, there is no need to also show that MQTT is erroring?

I agree.

gskjold commented 9 months ago

HTTPS (Prices, but only with API key), MQTT/SSL, Realtime calculations, graphs are all dependent on a clock, but as long as it gets the time once, it should work even without internet connection. Except for prices, which would obiously need internet :)

shadowano commented 9 months ago

I don't have prices. In my initial post I had big problems getting the UI up in wifi mode, but not in AP mode. Sometimes the page didn't load, sometimes page loaded partly with empty top header and no content. What could cause it to be so unresponsive in wifi mode?

gskjold commented 9 months ago

Could it be that it actually is rebooting very often? If you try having USB power on it at all times, is it then stable?

ArnieO commented 9 months ago

@shadowano : Can you post a screenclip of the settings you have your config page, and also what you see on the info-page?

shadowano commented 9 months ago

Could it be that it actually is rebooting very often? If you try having USB power on it at all times, is it then stable?

I already have on USB power at all times, due to earlier issue with reboots every now and then.

image

image

image

image

ArnieO commented 9 months ago

Could you (as a test) try to change the DNS server setting you are using now, and in stead use 1.1.1.1 as primary DNS?

shadowano commented 9 months ago

The reported problem happened again yesterday, and I have still not got it up and running longer then for a few seconds. I saw that my mesh system showed red light, which means it does not have internet connection. It got back up online, but the amleser where struggling connecting.

We have two laptops using wifi on the mesh system and they are working fine, same with phones. Home assistant is connected with cable to mesh system and is working fine. I tried restarting mesh system, internet router, and ams leser. Changed amsleser to AP mode, adding DNS settings (static ip given by mesh router, gateway to mesh router and the dns of 1.1.1.1). At first it sems like it was a bit better in regard to response of the amsleser, but I'm not 100% sure. It also looks like the amsleser is rebooting several times over time (not 100% sure about this either).

the wifi symbol is green when the UI comes up (e.g. in AP mode) and shows either the text "wifi" or -27dBm. MQTT is red. I can also see that it manages to connect to MQTT shortly, before it goes back to error state. I receive MQTT data during this time, except as you can see below, not since yesterday evening (haven't looked into it

image

I have tried turning off 5 gHz but this did not change anything. I'm not sure what else to do to get the amsleser up again?

What happens if I turn off power and disconnect it from the AMS for a long time until lights turns off? will I lose config, might this help?

shadowano commented 8 months ago

I got it up again after turning off power for both mesh router and internet modem for a few minutes, and after they got online I rebooted amsleser, and after this it has been up since.

My internet or mesh dropped again today for a minute or so, but I see amsleser is still working fine. The amsleser rebooted this morning probably because of the dropout (it says Software reset (3/0).

ArnieO commented 8 months ago

@gskjold : Should it reboot (Software reset) due to a Wi-Fi blackout?

gskjold commented 8 months ago

@gskjold : Should it reboot (Software reset) due to a Wi-Fi blackout?

Ideally not, but I have seen this happening under certain conditions.

shadowano commented 8 months ago

Internet was shortly down again just now, and ams leser reports a software reset reboot in the same time period. But it managed to recover again when Internet was back up.

shadowano commented 7 months ago

This is a recurring issue for me, I have a hard time finding the cause. but as I have reported earlier in this thread the amsleser is non-responsive when trying to access during the time it reports MQTT errors. Could this be a bug that if MQTT is timing out the UI becomes unresponsive?

I'm using Mosquitto broker in Home Assistant as the MQTT server. In the logs I can see that it connects but either exceeds the timeout or for unknown reasons disconnects. The source port of amsreader is changing for each log line, is this due to reboot?

2024-02-20 08:12:29: New connection from 192.168.68.101:63488 on port 1883.
2024-02-20 08:12:29: New client connected from 192.168.68.101:63488 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:12:44: Client amsreader has exceeded timeout, disconnecting.
2024-02-20 08:18:32: New connection from 192.168.68.101:52506 on port 1883.
2024-02-20 08:18:32: New client connected from 192.168.68.101:52506 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:18:32: Client amsreader closed its connection.
2024-02-20 08:18:59: New connection from 192.168.68.101:52507 on port 1883.
2024-02-20 08:18:59: New client connected from 192.168.68.101:52507 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:19:14: Client amsreader has exceeded timeout, disconnecting.
2024-02-20 08:19:30: New connection from 192.168.68.101:52508 on port 1883.
2024-02-20 08:19:30: New client connected from 192.168.68.101:52508 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:20:20: Client amsreader has exceeded timeout, disconnecting.
2024-02-20 08:20:25: New connection from 192.168.68.101:52509 on port 1883.
2024-02-20 08:20:25: New client connected from 192.168.68.101:52509 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:20:25: Client amsreader closed its connection.
2024-02-20 08:20:51: New connection from 192.168.68.101:52510 on port 1883.
2024-02-20 08:20:51: New client connected from 192.168.68.101:52510 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:22:02: Client amsreader has exceeded timeout, disconnecting.
2024-02-20 08:22:07: New connection from 192.168.68.101:52511 on port 1883.
2024-02-20 08:22:07: New client connected from 192.168.68.101:52511 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:22:07: Client amsreader closed its connection.
2024-02-20 08:22:34: New connection from 192.168.68.101:52512 on port 1883.
2024-02-20 08:22:35: New client connected from 192.168.68.101:52512 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:22:35: Client amsreader closed its connection.
2024-02-20 08:23:08: New connection from 192.168.68.101:52513 on port 1883.
2024-02-20 08:23:08: New client connected from 192.168.68.101:52513 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:23:08: Client amsreader closed its connection.
2024-02-20 08:24:05: New connection from 192.168.68.101:52515 on port 1883.
2024-02-20 08:24:05: New client connected from 192.168.68.101:52515 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:24:05: Client amsreader closed its connection.
2024-02-20 08:24:35: New connection from 192.168.68.101:52516 on port 1883.
2024-02-20 08:24:35: New client connected from 192.168.68.101:52516 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:24:38: Client amsreader closed its connection.
2024-02-20 08:25:03: New connection from 192.168.68.101:52517 on port 1883.
2024-02-20 08:25:04: New client connected from 192.168.68.101:52517 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:26:20: Client amsreader has exceeded timeout, disconnecting.
2024-02-20 08:26:52: New connection from 192.168.68.101:52519 on port 1883.
2024-02-20 08:26:57: New client connected from 192.168.68.101:52519 as amsreader (p2, c1, k10, u'mqtt_user').
2024-02-20 08:26:57: Client amsreader closed its connection.
ArnieO commented 7 months ago

This is a recurring issue for me, I have a hard time finding the cause.

Strange issue. We have a lot of users on MQTT / Home Assistant. @gskjold will have to look into this when he is back from vacation (next week).

shadowano commented 7 months ago

I put the amsreader into AP mode again, and removed the MQTT ip address, saved and rebooted. Now the UI is responsive again. It has been up for 20 minutes.

shadowano commented 7 months ago

Any particular reason you did that in AP mode? You should be able to do that directly on the Config page.

I know that I can do that, but as I have stated several times in this thread the AMSreader UI is nonresponsive. The page just loads, like the amsreader is not responding to requests. Every now and then I can see the UI partly. It's obvious that the amsreader is struggling with something.

After I removed the MQTT and it ran without problem for a while I manually entered MQTT details again. Amsleser rebooted once, not sure if this was due to saving the MQTT settings, but amsreader was up and running for 20 minutes after this. After this the amsreader UI turned red on ESP and no other data is shown. See screenshot below

image

shadowano commented 7 months ago

I rebooted the amsleser manually from the HW button, after a few seconds it came up and started sending data to Home Assistant over MQTT. Less then 60 seconds after it showed like this:

image

A few seconds later it comes up like this image

Edit: and this happens several times over time. goes from one state to the other, and then back again...

ArnieO commented 7 months ago

It's obvious that the amsreader is struggling with something.

Yes, several strange things going on here. Either in the firmware or your MQTT broker, it is hard to say. We have a lot of users running MQTT on v2.2.28 now, with very few issues.

We need @gskjold back from vacation to look into this.

shadowano commented 7 months ago

If I stop the MQTT broker I have in home Assistant the amsreader is totally unresponsive. The browser cannot reach amsreader interface at all.

@ArnieO could you try to do the same, set MQTT to any local ip that does not respond to anything and see if the AMS reader behaves as normal?

ArnieO commented 7 months ago

@ArnieO could you try to do the same, set MQTT to any local ip that does not respond to anything and see if the AMS reader behaves as normal?

I am running a test version of v2.3, but AFAIK there are no changes on this part of the code. Reader is still responsive, behaves as it should, signaling the MQTT issue like this: image

Rasmuslauenstein commented 7 months ago

Exactly the same issue here getting worse and worse. Running version 2.2.28 and several times a day home assistant does not receive data. Amsreader webinterface not available but responding to ping, no wifi disconnects or connection problems with wifi. But mqtt connection error or non responding web/config page. Then after some time it becomes responsive again, for maybe one message and then no data again.

ArnieO commented 7 months ago

Running version 2.2.28 and several times a day home assistant does not receive data.

OK - so now we have two reports on the samme issue, which is an indication there might be something here. What other similarities are there between you two?

shadowano commented 7 months ago

What other similarities are there between you two?

  • Are both on a mesh network?
  • Using HA built-in MQTT broker or separate (Mosquitto?) ?
  • Other things?

Then after some time it becomes responsive again, for maybe one message and then no data again.

I have seen this too. For me it seems to be a bit random if it sends a few messages now and then, or if it doesn't send at all.

Also see the Mosquitto log from earlier in this thread. This shows that the amsreader is connected, but something causes it to disconnect or timeout

avandorp commented 7 months ago

I had similar issues until I've tried to save CPU time and power. As I use MQTT over TLS I don't really use the web interface and deactivated most graphs. I've also set WiFi Power Saving to maximum even though the reader is in a basement far away from the WiFi hotspot (and double check that you don't have legacy rates enabled). Additionally it has a fixed IP address. If I could I would also disable the historical values, because I track them elsewhere. Since those adjustments I'm very happy with the stability.

gskjold commented 7 months ago

I have made some changes for 2.3 that I hope will work. If you are interested in testing, please check out the latest build in #738 . Please consider the warning in the top of the post, only flash if you are able to re-flash by USB.

shadowano commented 7 months ago

Thank you, @gskjold! Very much appreciated:) I don't have the possibility to flash, so I have to wait until the final release. Looking forward to it.

Do you know if there have been done changes to the mqtt handling in a recent earlier version that could explain the issue we have reported here? It's so strange that this issue started recently, and still continue to happen. Yesterday I installed new hardware update on my mesh system, and amsreader did not manage to connect by itself after mesh system came up again. It connected fine after first restarting mesh node, and then restarting amsleser manually

ArnieO commented 6 months ago

I have made some changes for 2.3 that I hope will work. If you are interested in testing, please check out the latest build in #738 . Please consider the warning in the top of the post, only flash if you are able to re-flash by USB.

@Rasmuslauenstein : Can you check if your issue seems to be resolved with the latest version?