UtilitechAS / amsreader-firmware

ESP8266 and ESP32 compatible firmware to read, interpret and publish data to MQTT from smart electrical meters, both DLMS and DSMR is supported

Version >=2.0.1 is crashing on ESP8266 #174

Closed Jendem closed 2 years ago

Jendem commented 2 years ago

The device crashes at startup, both after flashing the released bin file and after compiling and flashing the latest source.

Relevant firmware information:

Same log for both versions:

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3460, room 16
tail 4
chksum 0xcc
load 0x3fff20b8, len 40, room 4
tail 4
chksum 0xc9
csum 0xc9
v000a86b0
~ld
Sensors: 0

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Exception (28):
epc1=0x4020a268 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

>>>stack>>>

ctx: cont
sp: 3ffffa20 end: 3fffffc0 offset: 0190
3ffffbb0:  3fff26c8 00000001 3fff2aa0 3fff2d1c
3ffffbc0:  3fff26c8 3fff29d8 3fff2918 4020b6cb
3ffffbd0:  effefeef effefeef effefeef effefeef
3ffffbe0:  effefeef effefeef effefeef effefeef
3ffffbf0:  effefeef effefeef effefeef effefeef
3ffffc00:  ef00feef effefeef effefeef effefeef
3ffffc10:  effefeef effefeef effefeef effefeef
3ffffc20:  effefeef effefeef effefeef effefeef
3ffffc30:  effefeef effefeef effefeef effefeef
3ffffc40:  7365feef effe0070 effefeef effefeef
3ffffc50:  effefeef effefeef effefeef effefeef
3ffffc60:  effefeef effefeef effefeef effefeef
3ffffc70:  effefeef effefeef effefeef effefeef
3ffffc80:  effefeef effefeef effefeef effefeef
3ffffc90:  effefeef effefeef effefeef effefeef
3ffffca0:  effefeef effefeef effefeef effefeef
3ffffcb0:  effefeef effefeef effefeef effefeef
3ffffcc0:  6675feef 376f7277 73653434 effe0070
3ffffcd0:  effefeef effefeef effefeef effefeef
3ffffce0:  effefeef effefeef effefeef effefeef
3ffffcf0:  effefeef effefeef effefeef effefeef
3ffffd00:  effefeef effefeef effefeef effefeef
3ffffd10:  effefeef effefeef effefeef effefeef
3ffffd20:  effefeef effefeef effefeef effefeef
3ffffd30:  effefeef effefeef effefeef effefeef
3ffffd40:  effefeef effefeef effefeef effefeef
3ffffd50:  effefeef effefeef effefeef effefeef
3ffffd60:  effefeef effefeef effefeef effefeef
3ffffd70:  effefeef effefeef effefeef effefeef
3ffffd80:  effefeef effefeef effefeef effefeef
3ffffd90:  effefeef effefeef effefeef effefeef
3ffffda0:  effefeef effefeef effefeef effefeef
3ffffdb0:  effefeef effefeef effefeef effefeef
3ffffdc0:  0000feef 7070612f 6163696c 6e6f6974
3ffffdd0:  2e32762d 2e322e30 0000736a 40234690
3ffffde0:  00000000 3fff403c 00000000 40211dd0
3ffffdf0:  40233840 40217584 00000000 3fff2728
3ffffe00:  00000000 3fff3e8c 3fff15d8 3fff2728
3ffffe10:  4021a0f4 00000000 3fff2728 3fffff4c
3ffffe20:  3ffeb6a2 3ffe9429 00000020 401012b4
3ffffe30:  3ffffe60 00000001 00000040 3fffff4c
3ffffe40:  3fffff60 3fff29d8 3fff2910 4020be48
3ffffe50:  00445453 01000000 0000030a 0000003c
3ffffe60:  01680101 6f700168 6e2e6c6f 6f2e7074
3ffffe70:  00006772 00000000 00000000 00000000
3ffffe80:  00000000 00000000 00000000 00000000
3ffffe90:  00000000 00000000 00000000 00000000
3ffffea0:  00000000 feef0000 feefeffe feefeffe
3ffffeb0:  feefeffe feefeffe feefeffe feefeffe
3ffffec0:  feefeffe feefeffe feefeffe feefeffe
3ffffed0:  feefeffe feefeffe feefeffe feefeffe
3ffffee0:  00000000 00000000 00000000 00000000
3ffffef0:  00000000 00000000 00000000 00000000
3fffff00:  00000000 00000000 00000000 00000000
3fffff10:  00000000 00000000 00000000 000003e8
3fffff20:  00445453 01000000 0000030a 0000003c
3fffff30:  feefeffe feefeffe feefeffe feefeffe
3fffff40:  feefeffe feefeffe feefeffe 00545344
3fffff50:  01000000 00000203 00000078 feefeffe
3fffff60:  01000000 00000203 feefeffe feefeffe
3fffff70:  01000000 0000030a feefeffe fe050000
3fffff80:  000001f7 feefeffe 3fff3d44 00000000
3fffff90:  feefeffe feefeffe feefeffe 3fff2d1c
3fffffa0:  3fffdad0 00000000 3fff2d08 40230d98
3fffffb0:  feefeffe feefeffe 3ffe8654 40100781
<<<stack<<<

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

ArnieO commented 2 years ago

To be sure I understand: You can go back to v2.0.0 and it does not crash? I am running an old Kamstrup module (like yours), and do not have this issue.

Jendem commented 2 years ago

Yes, v2.0.0 is OK; the newer ones crash at startup. Maybe related: I cannot set a static IP in v2.0.0. When it restarts, it goes into AP mode and all settings are lost. (I haven't looked into this with serial debugging.)

NicolaiPetri commented 2 years ago

I successfully upgraded to 2.0.1, but now I have lost access to the device and a reset doesn't seem to get it back online. So I guess I might have the same issue with crashing. I did have a static IP configured. It doesn't look like it connects to my WiFi at all, and it doesn't look like it is in AP mode.

ArnieO commented 2 years ago

Strange... I use static IP, and have not seen any issue with the upgrade. (I skipped 2.0.1, went from 2.0.0 to 2.0.2 but cannot see how that should impact this). @NicolaiPetri : Do you have what is needed to reflash it by cable (see user manual chapter 3)? I'm afraid that is the only option if the ESP is bricked.

gskjold commented 2 years ago

Unable to reproduce this. Considering how many Kamstrup users we have, I think this must be related to configuration. Erase flash, reflash the latest version, configure one thing at a time and see when it breaks.

Jendem commented 2 years ago

Everything is working after erasing flash first!

python esptool.py --port "COM8" write_flash --erase-all 0x0 firmware.bin
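
For anyone who prefers to run the erase as its own step, here is a minimal sketch that builds the two equivalent esptool invocations (a full `erase_flash`, then a plain `write_flash`). The port and file name are the ones from the command above. Running esptool as `esptool.py` through the current Python interpreter is an assumption about the local setup, and the helper names are made up for this example.

```python
import subprocess
import sys


def build_flash_commands(port, firmware, offset="0x0"):
    """Return two esptool command lines: erase the whole chip, then write."""
    # Assumption: esptool.py is reachable from the current directory.
    base = [sys.executable, "esptool.py", "--port", port]
    erase = base + ["erase_flash"]
    write = base + ["write_flash", offset, firmware]
    return [erase, write]


def flash(port, firmware):
    """Run both steps in order and stop at the first failure."""
    for cmd in build_flash_commands(port, firmware):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands; call flash("COM8", "firmware.bin") to run them.
    for cmd in build_flash_commands("COM8", "firmware.bin"):
        print(" ".join(cmd))
```

Either way the old configuration is wiped, so the device has to be set up again afterwards.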

ArnieO commented 2 years ago

Everything is working after erasing flash first!

python esptool.py --port "COM8" write_flash --erase-all 0x0 firmware.bin

@Jendem

@gskjold Maybe the flash command examples in the Wiki should be updated by adding the Erasing Flash Before Write option? It could make the reflashing process more robust.

Jendem commented 2 years ago

Never tried OTA, only by serial.

ArnieO commented 2 years ago

Never tried OTA, only by serial.

Thank you for that information. Most users will update OTA, so it was important to clarify that your problems were not linked to that.

bardahlm commented 2 years ago

I think I have the same issue. My module crashes now and then. It works again for some time if I remove the module from the meter and wait some time before reinserting. Do I have to connect to the serial port to find out why it crashes?

I have a POW-K, using 2.0.2 from Github.

ArnieO commented 2 years ago

Do I have to connect to the serial port to find out why it crashes?

Maybe difficult to catch it while it happens, but you can activate telnet debugging in menu System/debugging: image

Then open a command window on PC and use command telnet <IP address>
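
Since the crash may happen while nobody is watching, it can help to leave a small script running that writes the telnet output to a file with timestamps. This is only a sketch: it assumes the debug output is plain text on the standard telnet port 23, and the function names and log file name are made up for the example.

```python
import socket
import time


def stamp(line, now=None):
    """Prefix one log line with a local timestamp."""
    t = time.localtime(now)
    return time.strftime("%Y-%m-%d %H:%M:%S", t) + " " + line.rstrip("\r\n")


def capture(host, logfile, port=23):
    """Append everything the device prints to logfile until the connection closes."""
    with socket.create_connection((host, port), timeout=30) as sock, \
            open(logfile, "a", encoding="utf-8") as out:
        sock.settimeout(None)  # after connecting, block until data arrives
        reader = sock.makefile("r", encoding="utf-8", errors="replace", newline="\n")
        for line in reader:
            out.write(stamp(line) + "\n")
            out.flush()  # keep the file current in case the PC is interrupted


if __name__ == "__main__":
    import sys
    # Usage: python capture.py <IP address>
    capture(sys.argv[1], "ams-debug.log")
```

The last lines in the file before the connection drops are then the ones closest to the restart. If the device dies without closing the connection, the script may simply stop receiving data, so the time of the last entry is the thing to look at.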

bardahlm commented 2 years ago

My issue seems to be that it drops off the network, as there is hourly data from the period while it was offline. So maybe not a crash but some WiFi issue. Does the ESP have space for logging, or will that burn out the flash?

ArnieO commented 2 years ago

My issue seems to be that it drops off the network, as there is hourly data from the period while it was offline. So maybe not a crash but some WiFi issue. Does the ESP have space for logging, or will that burn out the flash?

You can see if it has crashed and restarted by the uptime counter. I can see now that I too have a restart issue: image

It is not a big problem for me, but @gskjold will surely look into this.

The data points for the graphs are calculated from the whole-hour List 3 datagrams, in which the meter reports accumulated consumption (kWh), and are stored in flash memory. So a reboot will not cause it to lose graph data points unless it happens at the moment the meter sends List 3.

bardahlm commented 2 years ago

My unit was offline the entire night, there are big gaps in my graphs. When I disconnected and reconnected it, it came back online. As the hourly data was recorded one can assume that it was up and running in the period with missing data.

Jendem commented 2 years ago

Mine has also restarted now, I believe it was up for 4 days. Looks like a very short restart, cannot see any gap in my database. (Kamstrup 10 sec interval)

Jendem commented 2 years ago

And now it crashed again, at uptime = 313603 seconds. Logged data:

image

Edit: I'm running 8751b6325d09a5a8a204149b2688d05d78a70319

gskjold commented 2 years ago

Very interesting. Are you all on Kamstrup?

ArnieO commented 2 years ago

Good observation! Yes, all (@Jendem, @bardahlm and myself) that have reported the issue here so far are on Kamstrup, using some version of Pow-K.

I recently moved from one Kamstrup meter to another, in parallel with upgrading to v2.x.x. So I cannot say whether it is the upgrade or the move to a different Kamstrup meter that is the reason for this. I never had this issue at my previous location (with earlier firmware versions). So there is a possibility that this could be linked to some issue on individual meters (like Vout dropping out for a short period), causing a restart.

If this is the case, it should be visible on the Vcc reading just before the restart, as the supercap in Pow-K will hold the voltage up for a while (but dropping) even if Vout from the meter has dropped to zero. However, the above logging by @Jendem confirms that Vcc is stable during the restart - so this hypothesis seems incorrect.
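
The check described in the paragraph above can be sketched roughly as follows, for anyone who logs uptime and Vcc to a database. It assumes the log is available as a time-ordered list of (uptime, Vcc) pairs. The window length and the 0.1 V threshold are arbitrary illustration values, not Pow-K specifications.

```python
def vcc_before_restarts(samples, window=6, sag=0.1):
    """samples: list of (uptime_seconds, vcc_volts) in time order.

    A restart is assumed wherever uptime goes down between two samples.
    For each one, report how far Vcc fell within the preceding `window`
    samples and whether that drop exceeds `sag` volts.
    """
    findings = []
    for i in range(1, len(samples)):
        if samples[i][0] < samples[i - 1][0]:
            before = [v for _, v in samples[max(0, i - window):i]]
            drop = before[0] - min(before)
            findings.append((i, round(drop, 3), drop > sag))
    return findings


if __name__ == "__main__":
    stable = [(10, 3.3), (20, 3.3), (30, 3.3), (5, 3.3)]
    sagging = [(10, 3.3), (20, 3.1), (30, 2.8), (5, 3.3)]
    print(vcc_before_restarts(stable))   # restart with flat Vcc
    print(vcc_before_restarts(sagging))  # restart preceded by a Vcc drop
```

A restart flagged with a flat Vcc, as in the first list, would point away from the supply and towards the firmware.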

I really don't see any Pow-K HW related phenomenon other than loss of input voltage that could explain a reboot.

Are there any users on Aidon or Kaifa that have seen this?

Ideas on where to look are welcome!

gskjold commented 2 years ago

Could be the newly added data parser in the v2.x series firmware. Will have a look when I have time.

ThomasEdvardsen commented 2 years ago

I am having severe rebooting issues on Kaifa, running AMS reader 220103.7.

I would like to downgrade to v2.0.0 to find out whether it stabilizes on that version. Do I have to completely erase the entire flash chip and reconfigure to avoid problems with the existing config files?

Screenshot from 2022-01-05 10-59-52

ThomasEdvardsen commented 2 years ago

Reflashed with the same version as before (220103.7), but with complete erasing of chip. Configured with the same values, and awaiting uptime logging to see if it helps.

gskjold commented 2 years ago

If it doesn't work, try 220105.2: esp32.zip esp8266.zip

ThomasEdvardsen commented 2 years ago

It doesn't work, so I am installing 220105.2 now.

ThomasEdvardsen commented 2 years ago

Still the same with 220105.2

gskjold commented 2 years ago

Just to recap this thread:

esp8266.zip esp32.zip

gskjold commented 2 years ago

I found one thing that may cause reboots, new firmware file below. esp8266.zip esp32.zip

Jendem commented 2 years ago

Are the zip files you upload based on master or on uncommitted/unpushed stuff?

gskjold commented 2 years ago

Uncommitted

ThomasEdvardsen commented 2 years ago

Still getting reboots with version 2.0.3. I am not sure if 2.0.0 works better, as I have never tried that version.

ThomasEdvardsen commented 2 years ago

Installed 2.0.0, and it runs without rebooting. All newer versions I have tried have random reboots.

Hardware information:

Relevant firmware information:

gskjold commented 2 years ago

Can you also confirm that the problem starts with v2.0.1? I'm trying to narrow my search... Which MQTT payload are you using?

ThomasEdvardsen commented 2 years ago

Sure, will install it now. I am running the JSON payload.

Jendem commented 2 years ago

I was running 8751b6325d09a5a8a204149b2688d05d78a70319 until I updated to 9897ccc56390da4429b66e498f69a15af5a74287 at the vertical blue line (last 7 days shown): image

I also use JSON payload

ThomasEdvardsen commented 2 years ago

I can confirm that both 2.0.0 and 2.0.1 are running without reboots.

Jendem commented 2 years ago

Are you sure about that? My reboot problems occurred after >2 days of uptime on 2.0.1.

gskjold commented 2 years ago

@Jendem JSON payload? Any temp sensors?

Jendem commented 2 years ago

Yes, JSON payload and no temp. Would it be an idea to test deactivating MQTT and instead try to poll http://ams/data.json?
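
A rough sketch of what such a polling test could look like. The URL is the one above. The 10 second interval, the function names and the choice to record failed requests as `None` are assumptions for this example, and no field names from the JSON are assumed.

```python
import json
import time
import urllib.request


def fetch(url="http://ams/data.json", timeout=5):
    """Fetch and decode one JSON document from the device."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll(fetch_fn, count, interval=10, sleep=time.sleep):
    """Call fetch_fn `count` times; record None when a request fails."""
    results = []
    for n in range(count):
        try:
            results.append(fetch_fn())
        except (OSError, ValueError):
            # Network errors are OSError subclasses, bad JSON is a ValueError.
            results.append(None)
        if n < count - 1:
            sleep(interval)
    return results


if __name__ == "__main__":
    for sample in poll(fetch, count=6):
        print(sample)
```

A run of `None` entries would show that the device was unreachable for a while, which is useful to compare with the uptime counter after the test.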

gskjold commented 2 years ago

Maybe. I am at a loss here, so any tests you can do are useful. You say 2.0.1 reboots after 2 days; does 2.0.2 reboot more often?

ThomasEdvardsen commented 2 years ago

@Jendem Not sure, but it went from multiple reboots each hour to running 13 hours (so far) without rebooting. So it is at least more stable. I will continue on 2.0.1 for a while.

v2.0.1 is older than https://github.com/gskjold/AmsToMqttBridge/commit/8751b6325d09a5a8a204149b2688d05d78a70319

EDIT: I am running with debug level ERROR.

ArnieO commented 2 years ago

Mine is so far stable with 2.0.3 (Pow-K, UART0, no MQTT yet).

Question to @gskjold: Is deactivating Telnet enough, or must the debug mode also be set to "Error"? I believe you suggested somewhere doing both. I have had debug disabled for a while, but changed mode from "Debug" to "Error" on Sunday (when I changed to 220109.1). Before setting mode to "Error" I had unexpected restarts. Nothing since changing it to "Error".

Might be a coincidence, might be a clue...

gskjold commented 2 years ago

Not coincidence, setting level to error makes a difference.

ArnieO commented 2 years ago

Not coincidence, setting level to error makes a difference.

OK; then this could potentially be important for those having reset issues, as many of us have run debug and deactivated it later.

Just to be sure I understand: Even if both Serial and Telnet debug are deactivated, the "Debug level" setting can make a difference?

If so, it could maybe be useful in a future update to implement the following: When both Telnet and Serial are disabled: Set "Debug level" to "Error".
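
To make the suggested rule concrete, a tiny sketch follows. It is written in Python purely as an illustration, it is not the firmware's code, and the function and parameter names are made up.

```python
def effective_debug_level(level, telnet_enabled, serial_enabled):
    """Return the level to store: force "Error" when no debug output is active."""
    if not telnet_enabled and not serial_enabled:
        return "Error"
    return level


if __name__ == "__main__":
    print(effective_debug_level("Debug", telnet_enabled=False, serial_enabled=False))
    print(effective_debug_level("Debug", telnet_enabled=True, serial_enabled=False))
```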

gskjold commented 2 years ago

Correct. And agree

ThomasEdvardsen commented 2 years ago

I have successfully used 2.0.1 for 3 days without rebooting. The first 2 days with debug level ERROR, the last day with debug level DEBUG. Both work as expected.

Will try to upgrade to 2.0.4 now.

ThomasEdvardsen commented 2 years ago

2.0.4 started rebooting from the beginning. Debug level is ERROR, and it reboots many times each hour.

image

gskjold commented 2 years ago

A summary of this thread for me: v2.0.0 and v2.0.1 are OK, v2.0.2 and above are not. Which means that the change must be between v2.0.1 and v2.0.2. There is not much change between those two versions that could only affect a handful of people... https://github.com/gskjold/AmsToMqttBridge/compare/v2.0.1...v2.0.2

Attaching a firmware where I have downgraded Timelib from 1.6.1 to 1.6.0. I doubt it will change anything, but it is worth a try.

esp8266.zip esp32.zip

ThomasEdvardsen commented 2 years ago

I really appreciate your efforts to help me find the problem @gskjold!

I have managed to build and upload myself now.

In order not to waste your time, I will try to step up commits until I find out where the error was introduced.

At the moment I am running ff02dd4.
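
As an alternative to stepping through the commits one at a time, the search can be halved at each step, the same idea as `git bisect`. A minimal sketch, assuming the commits are listed oldest to newest, the first is known good, the last is known bad, and every commit after the faulty one also reboots. The names are made up for the example.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    commits: oldest to newest; commits[0] is known good, commits[-1] known bad.
    is_bad: callable that reports whether a given build reboots.
    """
    lo, hi = 0, len(commits) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
    rebooting = {"c5", "c6", "c7"}
    print(first_bad(history, lambda c: c in rebooting))
```

The hard part here is the "good" verdict: since some reboots in this thread only showed up after more than two days, each step needs a long enough soak before a build is called good.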

ArnieO commented 2 years ago

v2.0.4 running for two days with no issues.

gskjold commented 2 years ago

I have found a possible problem, attaching new firmware.

EDIT: Sorry, constantly attaching wrong file, adding new one! esp8266.zip

erlandp commented 2 years ago

I've been using this software with a NodeMCU for about a week now. I've had frequent issues with 2.0.4 and 2.0.5. So far, fix 220122.2 has been running without hiccups.

Edit: Unfortunately 220122.2 crashed as well, it just took a little longer. I'm now running 2.0.0.