gemu2015 / Sonoff-Tasmota

Tasmota Fork TCS34725,PN532_i2,ccc1101 Moritz support,m5stack 4,7 epaper, hotplug drivers
GNU General Public License v3.0
24 stars 19 forks source link

very frequent reboots due to exceptions while processing script #24

Closed bs-github closed 3 years ago

bs-github commented 3 years ago

PROBLEM DESCRIPTION

I'm using a script to read the ORNO OR-WE-516 3-phase energy meter with tasmota on a D1-mini via MODBUS RS485. It works in principle, but I have reboots every few seconds.

REQUESTED INFORMATION

00:00:00 CFG: Loaded from flash at F6, Count 271 00:00:00 QPC: Flag 7E 00:00:00 Script: nv=0, tv=0, vns=0, ram=16 00:00:00 SRC: Restart 00:00:00 Project tasmota Tasmota Version 8.3.1(sensors)-2_7_1 00:00:00 Script: performs "sensor53 r" 00:00:00 SRC: Rule 00:00:00 CMD: Group 0, Index 53, Command "SENSOR", Data "r" 00:00:00 SNS: Hardware Serial 00:00:00 RSL: stat/orno-smartmeter/RESULT = {"Time":"1970-01-01T00:00:00","SML":{"CMD":"restart"}} 00:00:00 SNS: Hardware Serial 00:00:00 WIF: Checking connection... 00:00:00 WIF: Attempting connection... 00:00:00 WIF: Connecting to AP1 adrion.netgaroo.com Channel 11 BSSId 98:9B:CB:A2:75:3A in mode 11N as smart-meter... 00:00:01 WIF: Checking connection... 00:00:01 WIF: Connected 00:00:01 HTP: Web server active on smart-meter with IP address 192.168.188.27 15:39:03 NTP: UTC 2020-07-28T14:39:02, DST 2020-03-29T02:00:00, STD 2020-10-25T03:00:00 15:39:04 RSL: tele/orno-smartmeter/INFO1 = {"Module":"Generic","Version":"8.3.1(sensors)","FallbackTopic":"cmnd/DVES_5A4696_fb/","GroupTopic":"cmnd/tasmotas/"} 15:39:04 RSL: tele/orno-smartmeter/INFO2 = {"WebServerMode":"Admin","Hostname":"smart-meter","IPAddress":"192.168.188.27"} 15:39:04 RSL: tele/orno-smartmeter/INFO3 = {"RestartReason":{"Exception":9,"Reason":"Exception","EPC":["4022e2bb","00000000","00000000"],"EXCVADDR":"3fff5533","DEPC":"00000000","CallChain":["4022e2b9","4023d5a3","4023d03c","4021439e","40238f2b","4024f8a4","4010106d"]}} 15:39:04 HTP: Information 15:39:06 HTP: Information 15:39:07 QPC: Reset 15:39:08 HTP: Information 15:39:08 RSL: tele/orno-smartmeter/STATE = {"Time":"2020-07-28T15:39:08","Uptime":"0T00:00:09","UptimeSec":9,"Heap":22,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":0,"Wifi":{"AP":1,"SSId":"adrion.netgaroo.com","BSSId":"98:9B:CB:A2:75:3A","Channel":11,"RSSI":76,"Signal":-62,"LinkCount":1,"Downtime":"0T00:00:03"}} 15:39:08 RSL: tele/orno-smartmeter/SENSOR = {"Time":"2020-07-28T15:39:08","ORNO516":{"GridFrequency":49.98,"Voltage_P1":226.30,"Voltage_P2":227.80,"Voltage_P3":227.00}} 15:39:09 HTP: Information 15:39:09 APP: Boot Count 202 15:39:09 CFG: Saved to flash at F5, Count 272, Bytes 4096 15:39:12 HTP: Main Menu 15:39:14 HTP: Console

Console output here:

00:00:00 CFG: Loaded from flash at F9, Count 268 00:00:00 QPC: Flag 7E 00:00:00 Script: nv=0, tv=0, vns=0, ram=16 00:00:00 SRC: Restart 00:00:00 Project tasmota Tasmota Version 8.3.1(sensors)-2_7_1 00:00:00 Script: performs "sensor53 r" 00:00:00 SRC: Rule 00:00:00 CMD: Group 0, Index 53, Command "SENSOR", Data "r" 00:00:00 SNS: Hardware Serial 00:00:00 RSL: stat/orno-smartmeter/RESULT = {"Time":"1970-01-01T00:00:00","SML":{"CMD":"restart"}} 00:00:00 SNS: Hardware Serial 00:00:00 WIF: Checking connection... 00:00:00 WIF: Attempting connection... 00:00:00 WIF: Connecting to AP1 adrion.netgaroo.com Channel 11 BSSId 98:9B:CB:A2:75:3A in mode 11N as smart-meter... 00:00:01 WIF: Checking connection... 00:00:01 WIF: Connected 00:00:01 HTP: Web server active on smart-meter with IP address 192.168.188.27 15:25:11 NTP: UTC 2020-07-28T14:25:10, DST 2020-03-29T02:00:00, STD 2020-10-25T03:00:00 15:25:12 MQT: Attempting connection... 15:25:12 MQT: Connected 15:25:12 MQT: tele/orno-smartmeter/LWT = Online (retained) 15:25:12 MQT: cmnd/orno-smartmeter/POWER = 15:25:12 MQT: Subscribe to cmnd/orno-smartmeter/# 15:25:12 MQT: Subscribe to cmnd/tasmotas/# 15:25:12 MQT: Subscribe to cmnd/DVES_5A4696_fb/# 15:25:12 MQT: tele/orno-smartmeter/INFO1 = {"Module":"Generic","Version":"8.3.1(sensors)","FallbackTopic":"cmnd/DVES_5A4696_fb/","GroupTopic":"cmnd/tasmotas/"} 15:25:12 MQT: tele/orno-smartmeter/INFO2 = {"WebServerMode":"Admin","Hostname":"smart-meter","IPAddress":"192.168.188.27"} 15:25:12 MQT: tele/orno-smartmeter/INFO3 = {"RestartReason":{"Exception":9,"Reason":"Exception","EPC":["4022e2bd","00000000","00000000"],"EXCVADDR":"33502071","DEPC":"00000000","CallChain":["4022e2b9","4023d5a3","4023d03c","4021439e","40238f2b","4024f8a4","4010106d"]}} 15:25:15 QPC: Reset 15:25:16 MQT: tele/orno-smartmeter/STATE = {"Time":"2020-07-28T15:25:16","Uptime":"0T00:00:09","UptimeSec":9,"Heap":22,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"Wifi":{"AP":1,"SSId":"adrion.netgaroo.com","BSSId":"98:9B:CB:A2:75:3A","Channel":11,"RSSI":76,"Signal":-62,"LinkCount":1,"Downtime":"0T00:00:03"}} 15:25:16 MQT: tele/orno-smartmeter/SENSOR = {"Time":"2020-07-28T15:25:16","ORNO516":{"GridFrequency":49.99,"Voltage_P1":224.50,"Voltage_P2":226.20,"Voltage_P3":225.50}} 15:25:17 APP: Boot Count 200 15:25:17 CFG: Saved to flash at F8, Count 269, Bytes 4096


### TO REPRODUCE
This might be difficult for others without the hardware, but I assume the issue might not even be the hardware.

### EXPECTED BEHAVIOUR
No exceptions, no reboots, just meter readings.

### SCREENSHOTS
good case

ORNO516 Grid Frequency: 50.00 Hz ORNO516 Voltage P1: 226.00 V ORNO516 Voltage P2: 227.00 V ORNO516 Voltage P3: 227.00 V


### DATA DUMP ###
I think this is a good case (not the data that causes the reboot)

14:34:26 : 01 03 04 42 48 14 7b 21 7e 14:34:26 : 01 03 04 43 61 00 00 be 69 14:34:26 : 01 03 04 43 62 00 00 4e 69 14:34:27 : 01 03 04 43 61 e6 66 74 14:34:27 : 23 14:34:27 : 01 03 04 42 48 14 7b 14:34:27 : 21 7e 14:34:27 : 01 03 04 43 61 00 00 be 14:34:27 : 69 14:34:27 : 01 03 04 43 61 19 9a 35 14:34:27 : 92 14:34:27 : 01 03 04 43 62 19 9a c5 92 14:34:27 : 01 03 04 43 61 e6 66 74 23 14:34:27 : 01 03 04 42 48 1e b8 67 8f 14:34:28 : 01 03 04 43 61 19 9a 35 92 14:34:28 : 01 03 04 43 62 19 9a c5 92 14:34:28 : 01 03 04 43 61 e6 66 14:34:28 : 74 23 14:34:28 : 01 03 04 43 62 00 00 14:34:28 : 4e 69 14:34:28 : 01 03 04 42 48 1e b8 67 8f



### ADDITIONAL CONTEXT
I think the problem might be in the code that processes the data from the meter/modbus/serial.

I do see valid results every time. But I see data missing often as well (zero values for where should be higher meter readings). 
It might be that the rs485 line is not very clear and there might come in wrong values? Or is that ruled out with a checksum (not to my knowledge)?
Bad data that could not be parsed correctly would explain that there are missing values.

The reboot problem vanishes when I turn on the debug dump (CMD: sensor53 d1), but then I do not get the desired data obviously. The data as well as the reboots immediately come back when I turn the dump off again (CMD: sensor53 d0). 
So as long as the data is not processed (but fetched) the WeMo D1-mini and tasmota work reliable, but as soon as the data is processed (for web display?) the exception occurs.

Side note: sometimes the watchdog initiate the restart.

What can I do to debug this?
gemu2015 commented 3 years ago

your descriptor looks OK you should start with a minimal version to trace down the bug. use a single value at first

the only thing in doubt you use only 100ms for value transmissions. For some meters this is too short. try with larger values eg 300 ms = 3 +1,3,M,0,9600,ORNO516,1,3,01030014,0103000E,01030010,01030012

however still strange that this should cause the failure

transmissions and received data are checksummed so only valid values are processed.

there are many users using this driver with modbus devices and no issues so i doubt there is a general software issue.

gemu2015 commented 3 years ago

did you solve the issue ?

bs-github commented 3 years ago

I tried with larger values 3,5,7 and even 12 for value transmissions and the restarts/exceptions do happen less frequent with higher values, and do not occur at 12 anymore. I still do think there is something that can be done in the code to prevent the restarts. How would you explain that there are no exceptions while just dumping the received values instead of processing them?

gemu2015 commented 3 years ago

i checked the code and found that with a very faulty serial interface a buffer overrun could occur. this is now fixed. with working serial connection i never saw an exception in my own sdm530 for month.

you should check your hardware serial to rs485 interface.

bs-github commented 3 years ago

How do I check that hardware? And how can I be sure that the data is faulty in the first place? I assume I do not see any faulty data (just no data) due to the way the checksum thingy works.

bs-github commented 3 years ago

I do have quite some values read now by the script. But I'm not finished yet. I'll just put that here for completeness/reference, albeit my scripting issues should be discussed somewhere else (new issue / chat?).

>D
oldFL1=0
;Fill vars with content on teleperiod
>T
oldFL1=#L1ForwardActiveEnergy
>B
=>sensor53 r
;Set teleperiod to 10sec
tper=10
>S
>M 1
+1,3,M,0,9600,ORNO516,1,12,0103010A,0103010C,0103010E,01030112,01030114,01030116,01030014,0103000E,01030010,01030012,0103001C,0103001E,01030020,01030022,01030102,01030104,01030106,01030100
1,010304ffffffff@i0:1,L1 Forward Active Energy,kWh,L1ForwardActiveEnergy,2
1,010304ffffffff@i1:1,L2 Forward Active Energy,kWh,L2ForwardActiveEnergy,2
1,010304ffffffff@i2:1,L3 Forward Active Energy,kWh,L3ForwardActiveEnergy,2
1,010304ffffffff@i3:1,L1 Reverse Active Energy,kWh,L1ReverseActiveEnergy,2
1,010304ffffffff@i4:1,L2 Reverse Active Energy,kWh,L2ReverseActiveEnergy,2
1,010304ffffffff@i5:1,L3 Reverse Active Energy,kWh,L3ReverseActiveEnergy,2
1,=h==================
1,=d 1 60 @0.01,Delta FAEP1,W,DFAEP1,2
1,=d 2 60 @0.01,Delta FAEP2,W,DFAEP2,2
1,=d 3 60 @0.01,Delta FAEP3,W,DFAEP3,2
1,=h==================
1,=d 4 60 @0.01,Delta RAEP1,W,DRAEP1,2
1,=d 5 60 @0.01,Delta RAEP2,W,DRAEP2,2
1,=d 6 60 @0.01,Delta RAEP3,W,DRAEP3,2
1,=h==================
1,010304ffffffff@i6:1,Grid Frequency,Hz,GridFrequency,2
1,010304ffffffff@i7:1,L1 Voltage,V,L1Voltage,2
1,010304ffffffff@i8:1,L2 Voltage,V,L2Voltage,2
1,010304ffffffff@i9:1,L3 Voltage,V,L3Voltage,2
1,010304ffffffff@i10:0.01,Total Active Power,W,TotalActivePower,2
1,010304ffffffff@i11:0.01,L1 Active Power,W,L1ActivePower,2
1,010304ffffffff@i12:0.01,L2 Active Power,W,L2ActivePower,2
1,010304ffffffff@i13:0.01,L3 Active Power,W,L3ActivePower,2
1,010304ffffffff@i14:1,Total Energy,kWh,TotalEnergy,2
1,010304ffffffff@i15:1,L1 Total Energy,kWh,L1TotalEnergy,2
1,010304ffffffff@i16:1,L2 Total Energy,kWh,L2TotalEnergy,2
1,010304ffffffff@i17:1,L3 Total Energy,kWh,L3TotalEnergy,2
1,=h==================
1,=d 15 60 @1,Total Active Energy d,W,DTAE,2
#

The values for the delta calculations doesn't seem right as of now, but I have to look into this a bit more to understand it.

I think oldFL1=#L1ForwardActiveEnergy does not work as intended (so I'm not sure how to access the values read from the meter within the script) and my delta definitions (like 1,=d 15 60 @1,Total Active Energy d,W,DTAE,2) are probably wrong as well.

I'd like to play around with the delta thing and probably with calculating some values in >T or >S to finally get to the point where I can tell for sure if the energy is actually consumed or generated. The meter itself can count forward and reverse energy, but Active Power is always positive, even when the energy is flowing out.

gemu2015 commented 3 years ago
  1. do you still get exceptions ?
  2. did you set SML_MAX_VARS to a higher value ? default is 20. you use more then 20 lines in your descriptor (html only lines dont count) this also leads to failure.
  3. you dont need to get vars via JSON. you may use sml[x] to access sml descriptor lines anyhow your syntax is wrong, this is the right one oldFL1=ORNO516#L1ForwardActiveEnergy

i can recommend this rs485 interface. it works perfect https://www.ebay.de/itm/TTL-RS485-Adapter-485-UART-Seriell-3-3V-5-Volt-Level-Konverter-Modul-Arduino/252852473229?hash=item3adf2e8d8d:g:DQwAAOSw5UFb4H25

bs-github commented 3 years ago
  1. do you still get exceptions ?

No, not with +1,3,M,0,9600,ORNO516,1,12,...

  1. did you set SML_MAX_VARS to a higher value ? default is 20. you use more then 20 lines in your descriptor (html only lines dont count) this also leads to failure.

yes, I did #define SML_MAX_VARS 25 in tasmota/user_config_override.h

  1. you dont need to get vars via JSON. you may use sml[x] to access sml descriptor lines anyhow your syntax is wrong, this is the right one oldFL1=ORNO516#L1ForwardActiveEnergy

Thanks, I've got values in the script now.

i can recommend this rs485 interface. it works perfect https://www.ebay.de/itm/TTL-RS485-Adapter-485-UART-Seriell-3-3V-5-Volt-Level-Konverter-Modul-Arduino/252852473229?hash=item3adf2e8d8d:g:DQwAAOSw5UFb4H25

That's what I have (from a different seller).

bs-github commented 3 years ago

Hello,

  1. do you still get exceptions ?

No, not with +1,3,M,0,9600,ORNO516,1,12,...

I would be able to get back to smaller values if you want me to test something. But I do not have to solve this issue because I get the data that I need reliably now.

  1. you dont need to get vars via JSON. you may use sml[x] to access sml descriptor lines anyhow your syntax is wrong, this is the right one oldFL1=ORNO516#L1ForwardActiveEnergy

Thanks, I've got values in the script now.

I could not get the values with sml[x] and the delta definitions did not work for me either, but those drawbacks are no issue for me and I'm quite happy with the following version of my script for the ORNO or-we-516 smart meter:

>D
;now
F1=0
F2=0
F3=0
R1=0
R2=0
R3=0
;before
f1=0
f2=0
f3=0
r1=0
r2=0
r3=0
;in or out
io1=1
io2=1
io3=1
;active power
A1=0
A2=0
A3=0
a1=0
a2=0
a3=0
;Fill vars with content on teleperiod
>T
F1=ORNO516#L1ForwardActiveEnergy
F2=ORNO516#L2ForwardActiveEnergy
F3=ORNO516#L3ForwardActiveEnergy
R1=ORNO516#L1ReverseActiveEnergy
R2=ORNO516#L2ReverseActiveEnergy
R3=ORNO516#L3ReverseActiveEnergy
A1=ORNO516#L1ActivePower
A2=ORNO516#L2ActivePower
A3=ORNO516#L3ActivePower
if R1>r1
then
r1=R1
io1=-1
endif
if R2>r2
then
r2=R2
io2=-1
endif
if R3>r3
then
r3=R3
io3=-1
endif
if F1>f1
then
f1=F1
io1=1
endif
if F2>f2
then
f2=F2
io2=1
endif
if F3>f3
then
f3=F3
io3=1
endif
;active power with sign
a1=io1*A1
a2=io2*A2
a3=io3*A3
;print F %F1% %F2% %F3% f %f1% %f2% %f3% R %R1% %R2% %R3% r %r1% %r2% %r3% a %a1% %a2% %a3% A %A1% %A2% %A3% i %io1% %io2% %io3%

>B
=>sensor53 r
;Set teleperiod to 10sec
tper=10

>J
,"CALCULATED":{"io_L1":%io1%,"io_L2":%io2%,"io_L3":%io3%
,"L1ActivePowerSigned":%a1%,"L2ActivePowerSigned":%a2%,"L3ActivePowerSigned":%a3%}

>W
CALCULATED
L1 Active Energy: {m} %a1% W
L2 Active Energy: {m} %a2% W
L3 Active Energy: {m} %a3% W

>M 1
+1,3,M,0,9600,ORNO516,1,12,01030014,0103000E,01030010,01030012,0103010A,01030112,01030102,0103010C,01030114,01030104,0103010E,01030116,01030106,01030100,0103001E,01030020,01030022,0103001C
1,010304ffffffff@i0:1,GridFrequency,Hz,GridFrequency,2
1,010304ffffffff@i1:1,L1Voltage,V,L1Voltage,2
1,010304ffffffff@i2:1,L2Voltage,V,L2Voltage,2
1,010304ffffffff@i3:1,L3Voltage,V,L3Voltage,2
1,010304ffffffff@i4:1,L1ForwardActiveEnergy,kWh,L1ForwardActiveEnergy,2
1,010304ffffffff@i5:1,L1ReverseActiveEnergy,kWh,L1ReverseActiveEnergy,2
1,010304ffffffff@i6:1,L1TotalEnergy,kWh,L1TotalEnergy,2
1,010304ffffffff@i7:1,L2ForwardActiveEnergy,kWh,L2ForwardActiveEnergy,2
1,010304ffffffff@i8:1,L2ReverseActiveEnergy,kWh,L2ReverseActiveEnergy,2
1,010304ffffffff@i9:1,L2TotalEnergy,kWh,L2TotalEnergy,2
1,010304ffffffff@i10:1,L3ForwardActiveEnergy,kWh,L3ForwardActiveEnergy,2
1,010304ffffffff@i11:1,L3ReverseActiveEnergy,kWh,L3ReverseActiveEnergy,2
1,010304ffffffff@i12:1,L3TotalEnergy,kWh,L3TotalEnergy,2
1,010304ffffffff@i13:1,TotalEnergy,kWh,TotalEnergy,2
1,010304ffffffff@i14:0.001,L1ActivePower,W,L1ActivePower,2
1,010304ffffffff@i15:0.001,L2ActivePower,W,L2ActivePower,2
1,010304ffffffff@i16:0.001,L3ActivePower,W,L3ActivePower,2
1,010304ffffffff@i17:0.001,TotalActivePower,W,TotalActivePower,2
1,=m 15+16+17 @1,Power L1+L2+L3,W,CalcTotActPow,2
#

There are things in there that are not really needed, but if someone wants to play around with this, I thought deleting things is more easy than coming up with more ;-).

For someone who wants to use this, I've compiled my tasmota firmware with the following in tasmota/user_config_override.h:

#define USE_PROMETHEUS
#define USE_ADC_VCC

#ifndef USE_SCRIPT
#define USE_SCRIPT  # adds about 17k flash size, variable ram size
#endif
#ifndef USE_SML_M
#define USE_SML_M
#endif
#ifndef USE_SCRIPT_WEB_DISPLAY
#define USE_SCRIPT_WEB_DISPLAY
#endif
#ifndef USE_SCRIPT_JSON_EXPORT
#define USE_SCRIPT_JSON_EXPORT
#endif
#ifdef USE_RULES
#undef USE_RULES
#endif
// see bellow instructions to set the value N accordingly
//SML_MAX_VARS default 20
#define SML_MAX_VARS 35
//SML_BSIZ serial buffers default 48
#define SML_BSIZ 48

Thanks for the help and all the work done!

bs-github commented 3 years ago

In case someone stumbles over this, I'm using https://github.com/hikhvar/mqtt2prometheus/ to get the data into prometheus and grafana.

gemu2015 commented 3 years ago

i forgot to mention, to use sml[x] you must

define USE_SML_SCRIPT_CMD

mdiazgoncalves commented 2 years ago

Hi @bs-github

In your script, 1,=m 15+16+17 @1,Power L1+L2+L3,W,CalcTotActPow,2 should be 1,=m 14+15+16 @1,Power L1+L2+L3,W,CalcTotActPow,2 for the sum of all active powers right?