arendst / Tasmota

Alternative firmware for ESP8266 and ESP32 based devices with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full documentation at
https://tasmota.github.io/docs
GNU General Public License v3.0
21.97k stars 4.77k forks source link

Domoticz: Since v10.0.0 devices randomly get corrupted timezone (+104) and MQTT gets disabled #13700

Closed sammyke007 closed 2 years ago

sammyke007 commented 2 years ago

PROBLEM DESCRIPTION

My devices (10 Sonoff Mini, MiniR2 and BasicR3) randomly get their MQTT disabled. The console spams

04:13:27.880 RSL: STATE = {"Time":"2021-11-21T04:13:27","Uptime":"5T02:48:59","UptimeSec":442139,"Heap":16,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":8,"POWER":"ON","Wifi":{"AP":1,"SSId":"Wifi13","BSSId":"04:D4:C4:B4:A2:98","Channel":1,"Mode":"11n","RSSI":62,"Signal":-69,"LinkCount":6,"Downtime":"0T00:01:40"}}
04:13:27.885 RSL: SENSOR = {"Time":"2021-11-21T04:13:27","Switch1":"ON"}

every second. Timezone99 was my setting, but upon checking, the time is off and timezone gives result +104 instead of 99. As you can see below in the requested information, time says 04:14 instead of 21:14 (UTC+1 in Belgium)...

The following commands work OK, but after some time, the problem acts up again...

Reset 2

Backlog template {"NAME":"Sonoff Basic R3","GPIO":[32,1,1,1,1,0,0,0,224,320,1,0,160,0],"FLAG":0,"BASE":1}; module 0

Backlog Timezone 99; Latitude 51.212935; Longitude 4.284476; DzIdx1 43; ipaddress1 192.16.1.143; FriendlyName Tasmota badkamer spiegel; DeviceName Tasmota badkamer spiegel .143; MqttHost 192.168.1.*****; MqttUser ********; MqttPassword *****

REQUESTED INFORMATION

Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!

04:14:38.327 RSL: RESULT = {"NAME":"Sonoff Basic R3","GPIO":[57344,57599,57599,57599,255,0,0,57344,16384,57345,255,40960,0,0],"FLAG":0,"BASE":1} 04:14:38.338 RSL: RESULT = {"Module":{"0":"Sonoff Basic R3"}} 04:14:38.403 RSL: RESULT = {"GPIO0":{"57344":""},"GPIO1":{"57599":""},"GPIO2":{"57599":""},"GPIO3":{"57599":""},"GPIO4":{"255":"Relay32"},"GPIO5":{"0":"None"},"GPIO9":{"0":"None"},"GPIO10":{"57344":""},"GPIO12":{"16384":""},"GPIO13":{"57345":""},"GPIO14":{"255":"Relay32"},"GPIO15":{"40960":""},"GPIO16":{"0":"None"},"GPIO17":{"0":"None"}}

- [ ] If using rules, provide the output of this command: `Backlog Rule1; Rule2; Rule3`:
```lua
  Rules output here:

04:17:08.185 RSL: RESULT = {"Rule1":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":105,"Free":406,"Rules":"on button1#state==3 do publish domoticz/in {\"command\":\"switchlight\",\"idx\":146,\"switchcmd\":\"Toggle\"} endon"}}
04:17:08.192 RSL: RESULT = {"Rule2":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":0,"Free":511,"Rules":""}}
04:17:08.243 RSL: RESULT = {"Rule3":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":0,"Free":511,"Rules":""}}

04:17:38.741 RSL: STATUS = {"Status":{"Module":0,"DeviceName":"Tasmota badkamer spiegel .143","FriendlyName":["Tasmota badkamer spiegel"],"Topic":"tasmota_BadkamerSpiegel","ButtonTopic":"0","Power":1,"PowerOnState":255,"LedState":20,"LedMask":"FFFF","SaveData":25344,"SaveState":0,"SwitchTopic":"0","SwitchMode":[5,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":1,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0}} 04:17:38.747 RSL: STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"ttp://ota.tasmota.com/tasmota/release/tasmota.bin.gz","RestartReason":"Software/System restart","Uptime":"5T02:53:10","StartupUTC":"2021-11-11T17:24:28","Sleep":50,"CfgHolder":18,"BootCount":20992,"BCResetTime":"2020-08-13T23:47:04","SaveCount":8,"SaveAddress":"F7000"}} 04:17:38.752 RSL: STATUS2 = {"StatusFWR":{"Version":"10.0.0(tasmota)","BuildDateTime":"2021-10-19T08:23:49","Boot":31,"Core":"2_7_4_9","SDK":"2.2.2-dev(38a443e)","CpuFrequency":80,"Hardware":"ESP8285","CR":"410/699"}} 04:17:38.757 RSL: STATUS3 = {"StatusLOG":{"SerialLog":2,"WebLog":2,"MqttLog":0,"SysLog":0,"LogHost":"","LogPort":514,"SSId":["Wifi13",""],"TelePeriod":1,"Resolution":"558180C0","SetOption":["01000080","05C8000100060000005A0A0000000000003C","05000000","00006000","00004000"]}} 04:17:38.766 RSL: STATUS4 = {"StatusMEM":{"ProgramSize":612,"Free":388,"Heap":16,"ProgramFlashSize":1024,"FlashSize":1024,"FlashChipId":"144051","FlashFrequency":40,"FlashMode":3,"Features":["00000809","8FDAC787","04368001","000000CF","010013C0","C000F981","00004004","00001000","00000020"],"Drivers":"1,2,3,4,5,6,7,8,9,10,12,16,18,19,20,21,22,24,26,27,29,30,35,37,45","Sensors":"1,2,3,4,5,6"}} 04:17:38.772 RSL: STATUS5 = {"StatusNET":{"Hostname":"tasmota-BadkamerSpiegel-2441","IPAddress":"192.168.1.143","Gateway":"192.168.1.1","Subnetmask":"255.255.255.0","DNSServer1":"192.168.1.1","DNSServer2":"0.0.0.0","Mac":"DC:4F:22:AB:C9:89","Webserver":2,"HTTP_API":1,"WifiConfig":4,"WifiPower":17.0}} 04:17:38.777 RSL: STATUS7 = {"StatusTIM":{"UTC":"2021-11-16T20:17:38","Local":"2021-11-21T04:17:38","StartDST":"2021-01-30T14:00:00","EndDST":"2021-01-23T06:00:00","Timezone":"+104:0","Sunrise":"15:09","Sunset":"23:46"}} 04:17:38.782 RSL: STATUS10 = {"StatusSNS":{"Time":"2021-11-21T04:17:38","Switch1":"ON"}} 04:17:38.787 RSL: STATUS11 = {"StatusSTS":{"Time":"2021-11-21T04:17:38","Uptime":"5T02:53:10","UptimeSec":442390,"Heap":16,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":8,"POWER":"ON","Wifi":{"AP":1,"SSId":"Wifi13","BSSId":"04:D4:C4:B4:A2:98","Channel":1,"Mode":"11n","RSSI":58,"Signal":-71,"LinkCount":6,"Downtime":"0T00:01:40"}}}

- [ ] Set `weblog` to 4 and then, when you experience your issue, provide the output of the Console log:
```lua
  Console output here:

TO REPRODUCE

Happens randomly after half a day, 2 days, ...

EXPECTED BEHAVIOUR

Keep MQTT enabled and keep correct timezone. (I guess time difference gets the MQTT disabled?)

SCREENSHOTS

/

ADDITIONAL CONTEXT

These 10 devices have been running rock solid on 9.5.0 and before. Never had this problem before and it started giving this problem right after updating to v10.0.0. I already tried a complete "Reset 2" and configuring my device as new, but after some days it start's again (MQTT suddenly disabled, timezone off and console message spam every second).

(Please, remember to close the issue when the problem has been addressed)

sfromis commented 2 years ago

I've got many devices running version 10 without any resetting of Timezone from my standard setting of 99. I'd be surprised if time difference would be a reason for MQTT getting disabled.

It looks like the settings of your device is broken, not just the weird "Timezone":"+104:0" but especially also the template with very strange component numbers. Not sure what could cause this. Losing MQTT is probably just a result of broken settings structure.

sammyke007 commented 2 years ago

I tried a full reset offcourse before creating an issue. The "Reset 2" + the 2 above backlogs is how I configure the device after reset. Nothing else. After some time, boom, bootcount totally off, timezone wrong, MQTT disabled and spam in the console. MQTT disabled is how I get alerted, because Domoticz can't control it anymore. It happens on all of these 10 devices as long as they are on V10. No issues on 9.5.0. That's the weird thing.

Jason2866 commented 2 years ago

Can you erase the flash from a device and flash v.10. via serial?

barbudor commented 2 years ago

You should track this bootcount & possible reboots It may means that your devices are rebooting a lot. This plus strange values could be a problem with the flash chips. If possible, I would revert a few units back to 9.5.0 as a control group

My ESP8266 devices were still on 9.5.0. (only ESP32 were on 10.0.0.x). I just move a few of them to 10.0.0.3 to see that makes any difference

sammyke007 commented 2 years ago

Can you erase the flash from a device and flash v.10. via serial?

Most of them are build behind wall switches or behind a closet. I'll flash a new basic R3 with 10.0.0 and see what it gives. They were all flashed initially with the DiY tool, from there on always OTA upgrades. As said, all of these 10 devices worked rock solid on 9.5.0. MQTT never got disabled by itself.

sammyke007 commented 2 years ago

You should track this bootcount & possible reboots It may means that your devices are rebooting a lot. This plus strange values could be a problem with the flash chips. If possible, I would revert a few units back to 9.5.0 as a control group

My ESP8266 devices were still on 9.5.0. (only ESP32 were on 10.0.0.x). I just move a few of them to 10.0.0.3 to see that makes any difference

The very high boot count was also corrupt above. After a restart it's back normal and it was 31 or so. The device was online for 5 days btw.

sammyke007 commented 2 years ago

And another one of my devices got this crazy corruption. Bootcount crazy, MQTT disabled, console message spam every second and other corrupted settings. I flashed 5 out of 10 devices back to 9.5.0 and they seem to be stable again (except some Exception 9 reboots once in a while, but they reboot and work fine, nothing gets corrupted).

I installed 10.0.0.3 on a test sonoffMiniR2 and I'll check what it gives.

sfromis commented 2 years ago

Speculatively, I'm wondering if your issue is somehow related to using Domoticz, which AFAIK is not a very commonly used backend for Tasmota.

sammyke007 commented 2 years ago

Speculatively, I'm wondering if your issue is somehow related to using Domoticz, which AFAIK is not a very commonly used backend for Tasmota.

My testdevice isn't configured with domoticz parameters. I'll see what it gives (also against the exception 9 error they all give once a day).

sfromis commented 2 years ago

Your posted config includes Domoticz: DzIdx1 43

sammyke007 commented 2 years ago

I know, but on my testdevice I installed this morning, I left that out

ascillato2 commented 2 years ago

Hi, any news on this?

Have you tried to erase and flash again with Tasmotizer?

sammyke007 commented 2 years ago

Test device is still OK and MQTT still enabled, but just an hour ago, a Wemos D1 with 10.0.0 got MQTT disabled. Timezone 104, bootcount 55k +, ...

ascillato commented 2 years ago

how are you flashing these devices? are you using Tasmotizer?

Can you try with latest version 10.0.0.3? ( http://ota.tasmota.com/tasmota/tasmota.bin.gz )

sammyke007 commented 2 years ago

For the easy accessible ones (serial) (Wemos, NodeMCU, ...), I indeed always use the latest Tasmotizer. For the Sonoff's, I use the DIY tool. But both seem to be affected. My test device (MiniR2) is already on 10.0.0.3 and it's been running fine for +- 2 days. I'll update the Wemos that got corrupt too with 10.0.0.3 too, just to make sure. As this happens on one of my 10 (+1 test) devices once every +- 24h, it's not always easy to pinpoint any other exact reason. Only thing that I'm sure off, is that it hasn't been doing this on 9.5.0.

Edit: my test device now runs on 10.0.0.3 instead of 10.0.0. Syslog set to 4 and being logged to my NAS. I hope it corrupts soon :-D

Jason2866 commented 2 years ago

Do you have new devices to test too? Looking at your BootCount there is maybe the flash chip defect. If new devices does work without a problem you have a hardware issue. Have you googled with the Flash Chip Id from the Information page which manufactorer your flash chip is? There are some with very bad lifetimes.

`

sammyke007 commented 2 years ago

The bootcount is perfectly fine on all devices. It only gets corrupt / very high when it goes crazy. Like I said, the following things occur:

Restarting the module from console fixes it immediately.

I've got Sonoff MiniR2's that are max. 3 months old, but also Wemos D1, NodeMCU and other Sonoff's that are older (1 year or older). They all seem to show this very weird crash / behavior.

barbudor commented 2 years ago

Just to confirm that none of the units I moved to 10.0.0.3 earlier this week have shown the problem. Still connected on MQTT and expected Timezone 99

ascillato commented 2 years ago

They all seem to show this very weird crash / behavior.

This is super weird. Do you have any magnetic interference close to your devices like antennae or similar?

sammyke007 commented 2 years ago

I know...

I have nothing special in my home. I do run however a ZigBee network of +- 30 devices, but my 2,4 WiFi is on channel 1 and ZigBee is on the highest band. Like I said, never had this behaviour before on previous firmwares. Nothing changed in my house at the time of upgrading to 10.0.0

I'll keep logging as much as I can. I have 4 devices on 9.5.0, 3 on 10.0.0 and 2 + 1 test device on 10.0.0.3

ascillato commented 2 years ago

you said:

Weblog spams RSL messages every second

what messages do you have? can you paste here a log of several of those? Can you tried using WEBLOG 4 to catch debug messages too?

and what is the Flash Chip Id in the Information menu in Tasmota WebUI ?

sammyke007 commented 2 years ago

you said:

Weblog spams RSL messages every second

what messages do you have?

and what is the Flash Chip Id in the Information menu in Tasmota WebUI ?

The RSL and STATE message in the first message above. (04:13:27....)

Do you want all 11 IDs? My Wemos D1 that crashed today is Flash Chip Id0x164020

ascillato commented 2 years ago

can you paste here a log of several of those? Can you tried using WEBLOG 4 to catch debug messages too?

and what is the Flash Chip Id in the Information menu in Tasmota WebUI ?

sammyke007 commented 2 years ago

Syslog 4 is now active on 3 devices. Waiting for the error / crash to show up...

ascillato2 commented 2 years ago

What is the FLASH CHIP ID from the Information page on the failing devices?

this value:

image

sammyke007 commented 2 years ago

Wemos d1 on 10.0.0 that went crazy today:

ESP Chip Id6837606 (ESP8266EX) Flash Chip Id0x164020

Sonoff Basic R3 on 10.0.0 that went crazy 2 days ago:

ESP Chip Id11258249 (ESP8285) Flash Chip Id0x144051

Jason2866 commented 2 years ago

The 0x164020 is a XMC flash chip. In our experience it is a troublemaker and do fail more often than other flash chips very early. The ESP8285 has embedded flash and until now we do not had users with higher fail rates. Have you clicked erase flash in Tasmotizer when you are flashing? Which USB - serial adapter do you use?

sammyke007 commented 2 years ago

No, I don't think I've erased it before. But like I said, everything was "rock solid" before jumping to 10.0.0. I use a USB to TTL CP2102 module to flash the Wemos and NodeMCU devices. Itead DIY tool for the Sonoff devices.

barbudor commented 2 years ago

What is definitively strange is that you get the same weird behavior on devices that are totally different which, IMHO, would clear a flash self-corruption. The only common point I seem to see is usage of Domoticz. It could be a problem in Domoticz driver which corrupts settings, however there is no obvious item in the changelog related to Domoticz.

sammyke007 commented 2 years ago

That might be a possible cause. Most of the device crash once in a while with an exception 9. I don't think anything changed related to Domoticz between 9.5.0 and 10.0.0

Jason2866 commented 2 years ago

Mhh, Theo uses Domoticz. If there would be a problem, it would be solved already.

barbudor commented 2 years ago

@sammyke007 Can you provide us the minimum we would need to reproduce your setup ? any override for custom build, settings & config ... etc ? (except credentials of course) I don't see any other way.

Am I correct to assume I don't need an actual Domonitcz server to test as everything goes through MQTT ?

sammyke007 commented 2 years ago

@sammyke007 Can you provide us the minimum we would need to reproduce your setup ? any override for custom build, settings & config ... etc ? (except credentials of course) I don't see any other way.

Am I correct to assume I don't need an actual Domonitcz server to test as everything goes through MQTT ?

I don't think you'll need it. Do you have a Sonoff Mini R2 to test on? This is one of my example/crashing devices:

Backlog template {"NAME":"Sonoff MINIR2","GPIO":[17,0,0,0,9,0,0,0,21,56,0,0,0],"FLAG":0,"BASE":1}; module 0

Backlog Timezone 99; Latitude 51.212935; Longitude 4.284476; DzIdx1 385; Switchmode1 8; ipaddress1 192.16.1.221; FriendlyName Tasmota SLK Ouders; DeviceName Tasmota SLK Ouders .221; MqttHost 192.168.1.19; MqttUser *****; MqttPassword *****
barbudor commented 2 years ago

I can probably simulate on a nodemcu. Standard Tasmota.bin build ?

sammyke007 commented 2 years ago

Yes, latest 10.0.0.3. My exception 9 errors still occur sometimes on 10.0.0.3, but I THINK that MQTT wasn't disabled anymore on 10.0.0.3. Nothing stopped or crashed today (yet). I'll keep logging. Tnx everyone for all the feedback and followup.

sammyke007 commented 2 years ago

FYI:

image

image

Jason2866 commented 2 years ago

It is not related to v.10 you have exceptions with v.9.5 too.

sammyke007 commented 2 years ago

Yeah, the exceptions might not be related to the MQTT disabled error. On 9.5.0 I wasn't watching my devices as much as I'm doing now. MQTT being disabled is something I get alerted by rather fast as my rules are not working anymore or my connection to Domoticz stops. The exceptions were probably there before on 9.5.0 too indeed. MQTT being disabled "by itself" was 100% sure not.

Jason2866 commented 2 years ago

Please provide the mqtt setup of a device which fails. Maybe you have choosen a not valid config.

sammyke007 commented 2 years ago

What do you mean? They all connect to the same broker (latest mosquitto, was 1.5.7 before, I upgraded to exclude the broker). They all use the same username and password.

image

I only configurate MqttHost, MqttUser and MqttPassword

Jason2866 commented 2 years ago

Your mqtt topic. You can post a screenshot of the info webpage. See example Screenshot_20211120-224946

sammyke007 commented 2 years ago

This is the Wemos D1 that corrupted / MQTT got disabled yesterday. I upgraded it after that from 10.0.0 to 10.0.0.3:

image

Jason2866 commented 2 years ago

Correct too. Out of ideas.

sammyke007 commented 2 years ago

Yeah... It's not that I'm a complete noob. I checked and tried a lot before posting the issue here. I'll see if it keeps corrupting for now. I'm starting to feel sorry for not finding a stupid mistake made by myself ;-)

Jason2866 commented 2 years ago

It does not look like a easy issue. No reason to excuse.

barbudor commented 2 years ago

So I have a nodemcu running under monitoring with your configuration for the last 16 hours. I just added a rule to toggle power every 2 minutes to insure there is some Domoticz activity.

sammyke007 commented 2 years ago

Ok update:

My test Sonoff Mini R2 has been running without a single reboot for 2 days and 22 hours. I now added the command dzidx1 385 to "connect it" to domoticz. I'll see what this gives.

EDIT: already 1 reboot after 10 minutes. Settings preserved however.

One of my devices that really toggles idx 385 in domoticz, constantly logs every Domoticz idx that gets updated with weblog or syslog set to "4". This is normal behavior I guess?

image

barbudor commented 2 years ago

My Nodemcu has been running 36 H now without a problem including you DzIdx 385 Typical 2 minute log:

12:55:18.427 MQT: tele/nodemcu/STATE = {"Time":"2021-11-22T12:55:18","Uptime":"1T16:05:11","UptimeSec":144311,"Heap":26,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"POWER":"OFF","Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":82,"Signal":-59,"LinkCount":1,"Downtime":"0T00:00:03"}}
12:55:18.453 MQT: tele/nodemcu/SENSOR = {"Time":"2021-11-22T12:55:18","Switch1":"ON"}
12:56:00.435 RUL: TIME#MINUTE|2 performs "Power Toggle"
12:56:00.437 SRC: Rule
12:56:00.440 CMD: Grp 0, Cmnd 'POWER', Idx 1, Len 6, Data 'Toggle'
12:56:00.449 MQT: domoticz/in = {"idx":385,"nvalue":1,"svalue":"","Battery":100,"RSSI":8}
12:56:00.454 MQT: stat/nodemcu/RESULT = {"POWER":"ON"}
12:56:00.459 MQT: stat/nodemcu/POWER = ON
12:56:01.417 CFG: Saved to flash at F7, Count 781, Bytes 4096
12:58:00.431 RUL: TIME#MINUTE|2 performs "Power Toggle"
12:58:00.433 SRC: Rule
12:58:00.435 CMD: Grp 0, Cmnd 'POWER', Idx 1, Len 6, Data 'Toggle'
12:58:00.441 MQT: domoticz/in = {"idx":385,"nvalue":0,"svalue":"","Battery":100,"RSSI":7}
12:58:00.444 MQT: stat/nodemcu/RESULT = {"POWER":"OFF"}
12:58:00.448 MQT: stat/nodemcu/POWER = OFF
sammyke007 commented 2 years ago

I'm starting to get really desperate by now... One device just crashed again. MQTT was disabled, console spammed, ... I typed "restart 1" (not reset, 100% sure) and it lost all config. Back to Sonoff Basic template and WifiManager as everything seemed to be wiped.

I flashed a Wemos D1, a NodeMCU and a Sonoff Basic R3 with serial + Tasmotizer (with erase flash first). they are online now, but one already rebooted after 6hours with an Exception 3. I really don't know what went wrong except the possible FW updates after 9.5.0. I don't think this will get solved as it's hard to pinpoint:-(

Jason2866 commented 2 years ago

Do you have the possibility to setup a second very small wifi network? Just wifi to connect a Tasmota device. Maybe a other device in the network make "something" which makes Tasmota going crazy.