Closed sammyke007 closed 2 years ago
I've got many devices running version 10 without any resetting of Timezone from my standard setting of 99. I'd be surprised if time difference would be a reason for MQTT getting disabled.
It looks like the settings of your device is broken, not just the weird "Timezone":"+104:0"
but especially also the template with very strange component numbers. Not sure what could cause this. Losing MQTT is probably just a result of broken settings structure.
I tried a full reset offcourse before creating an issue. The "Reset 2" + the 2 above backlogs is how I configure the device after reset. Nothing else. After some time, boom, bootcount totally off, timezone wrong, MQTT disabled and spam in the console. MQTT disabled is how I get alerted, because Domoticz can't control it anymore. It happens on all of these 10 devices as long as they are on V10. No issues on 9.5.0. That's the weird thing.
Can you erase the flash from a device and flash v.10. via serial?
You should track this bootcount & possible reboots It may means that your devices are rebooting a lot. This plus strange values could be a problem with the flash chips. If possible, I would revert a few units back to 9.5.0 as a control group
My ESP8266 devices were still on 9.5.0. (only ESP32 were on 10.0.0.x). I just move a few of them to 10.0.0.3 to see that makes any difference
Can you erase the flash from a device and flash v.10. via serial?
Most of them are build behind wall switches or behind a closet. I'll flash a new basic R3 with 10.0.0 and see what it gives. They were all flashed initially with the DiY tool, from there on always OTA upgrades. As said, all of these 10 devices worked rock solid on 9.5.0. MQTT never got disabled by itself.
You should track this bootcount & possible reboots It may means that your devices are rebooting a lot. This plus strange values could be a problem with the flash chips. If possible, I would revert a few units back to 9.5.0 as a control group
My ESP8266 devices were still on 9.5.0. (only ESP32 were on 10.0.0.x). I just move a few of them to 10.0.0.3 to see that makes any difference
The very high boot count was also corrupt above. After a restart it's back normal and it was 31 or so. The device was online for 5 days btw.
And another one of my devices got this crazy corruption. Bootcount crazy, MQTT disabled, console message spam every second and other corrupted settings. I flashed 5 out of 10 devices back to 9.5.0 and they seem to be stable again (except some Exception 9 reboots once in a while, but they reboot and work fine, nothing gets corrupted).
I installed 10.0.0.3 on a test sonoffMiniR2 and I'll check what it gives.
Speculatively, I'm wondering if your issue is somehow related to using Domoticz, which AFAIK is not a very commonly used backend for Tasmota.
Speculatively, I'm wondering if your issue is somehow related to using Domoticz, which AFAIK is not a very commonly used backend for Tasmota.
My testdevice isn't configured with domoticz parameters. I'll see what it gives (also against the exception 9 error they all give once a day).
Your posted config includes Domoticz: DzIdx1 43
I know, but on my testdevice I installed this morning, I left that out
Hi, any news on this?
Have you tried to erase and flash again with Tasmotizer?
Test device is still OK and MQTT still enabled, but just an hour ago, a Wemos D1 with 10.0.0 got MQTT disabled. Timezone 104, bootcount 55k +, ...
how are you flashing these devices? are you using Tasmotizer?
Can you try with latest version 10.0.0.3? ( http://ota.tasmota.com/tasmota/tasmota.bin.gz )
For the easy accessible ones (serial) (Wemos, NodeMCU, ...), I indeed always use the latest Tasmotizer. For the Sonoff's, I use the DIY tool. But both seem to be affected. My test device (MiniR2) is already on 10.0.0.3 and it's been running fine for +- 2 days. I'll update the Wemos that got corrupt too with 10.0.0.3 too, just to make sure. As this happens on one of my 10 (+1 test) devices once every +- 24h, it's not always easy to pinpoint any other exact reason. Only thing that I'm sure off, is that it hasn't been doing this on 9.5.0.
Edit: my test device now runs on 10.0.0.3 instead of 10.0.0. Syslog set to 4 and being logged to my NAS. I hope it corrupts soon :-D
Do you have new devices to test too? Looking at your BootCount there is maybe the flash chip defect. If new devices does work without a problem you have a hardware issue. Have you googled with the Flash Chip Id
from the Information page which manufactorer your flash chip is? There are some with very bad lifetimes.
`
The bootcount is perfectly fine on all devices. It only gets corrupt / very high when it goes crazy. Like I said, the following things occur:
Restarting the module from console fixes it immediately.
I've got Sonoff MiniR2's that are max. 3 months old, but also Wemos D1, NodeMCU and other Sonoff's that are older (1 year or older). They all seem to show this very weird crash / behavior.
Just to confirm that none of the units I moved to 10.0.0.3 earlier this week have shown the problem. Still connected on MQTT and expected Timezone 99
They all seem to show this very weird crash / behavior.
This is super weird. Do you have any magnetic interference close to your devices like antennae or similar?
I know...
I have nothing special in my home. I do run however a ZigBee network of +- 30 devices, but my 2,4 WiFi is on channel 1 and ZigBee is on the highest band. Like I said, never had this behaviour before on previous firmwares. Nothing changed in my house at the time of upgrading to 10.0.0
I'll keep logging as much as I can. I have 4 devices on 9.5.0, 3 on 10.0.0 and 2 + 1 test device on 10.0.0.3
you said:
Weblog spams RSL messages every second
what messages do you have? can you paste here a log of several of those? Can you tried using WEBLOG 4
to catch debug messages too?
and what is the Flash Chip Id
in the Information menu in Tasmota WebUI ?
you said:
Weblog spams RSL messages every second
what messages do you have?
and what is the
Flash Chip Id
in the Information menu in Tasmota WebUI ?
The RSL and STATE message in the first message above. (04:13:27....)
Do you want all 11 IDs? My Wemos D1 that crashed today is Flash Chip Id0x164020
can you paste here a log of several of those? Can you tried using WEBLOG 4 to catch debug messages too?
and what is the Flash Chip Id in the Information menu in Tasmota WebUI ?
Syslog 4 is now active on 3 devices. Waiting for the error / crash to show up...
What is the FLASH CHIP ID
from the Information page on the failing devices?
this value:
Wemos d1 on 10.0.0 that went crazy today:
ESP Chip Id6837606 (ESP8266EX) Flash Chip Id0x164020
Sonoff Basic R3 on 10.0.0 that went crazy 2 days ago:
ESP Chip Id11258249 (ESP8285) Flash Chip Id0x144051
The 0x164020 is a XMC flash chip. In our experience it is a troublemaker and do fail more often than other flash chips very early. The ESP8285 has embedded flash and until now we do not had users with higher fail rates. Have you clicked erase flash in Tasmotizer when you are flashing? Which USB - serial adapter do you use?
No, I don't think I've erased it before. But like I said, everything was "rock solid" before jumping to 10.0.0. I use a USB to TTL CP2102 module to flash the Wemos and NodeMCU devices. Itead DIY tool for the Sonoff devices.
What is definitively strange is that you get the same weird behavior on devices that are totally different which, IMHO, would clear a flash self-corruption. The only common point I seem to see is usage of Domoticz. It could be a problem in Domoticz driver which corrupts settings, however there is no obvious item in the changelog related to Domoticz.
That might be a possible cause. Most of the device crash once in a while with an exception 9. I don't think anything changed related to Domoticz between 9.5.0 and 10.0.0
Mhh, Theo uses Domoticz. If there would be a problem, it would be solved already.
@sammyke007 Can you provide us the minimum we would need to reproduce your setup ? any override for custom build, settings & config ... etc ? (except credentials of course) I don't see any other way.
Am I correct to assume I don't need an actual Domonitcz server to test as everything goes through MQTT ?
@sammyke007 Can you provide us the minimum we would need to reproduce your setup ? any override for custom build, settings & config ... etc ? (except credentials of course) I don't see any other way.
Am I correct to assume I don't need an actual Domonitcz server to test as everything goes through MQTT ?
I don't think you'll need it. Do you have a Sonoff Mini R2 to test on? This is one of my example/crashing devices:
Backlog template {"NAME":"Sonoff MINIR2","GPIO":[17,0,0,0,9,0,0,0,21,56,0,0,0],"FLAG":0,"BASE":1}; module 0
Backlog Timezone 99; Latitude 51.212935; Longitude 4.284476; DzIdx1 385; Switchmode1 8; ipaddress1 192.16.1.221; FriendlyName Tasmota SLK Ouders; DeviceName Tasmota SLK Ouders .221; MqttHost 192.168.1.19; MqttUser *****; MqttPassword *****
I can probably simulate on a nodemcu. Standard Tasmota.bin build ?
Yes, latest 10.0.0.3. My exception 9 errors still occur sometimes on 10.0.0.3, but I THINK that MQTT wasn't disabled anymore on 10.0.0.3. Nothing stopped or crashed today (yet). I'll keep logging. Tnx everyone for all the feedback and followup.
FYI:
It is not related to v.10 you have exceptions with v.9.5 too.
Yeah, the exceptions might not be related to the MQTT disabled error. On 9.5.0 I wasn't watching my devices as much as I'm doing now. MQTT being disabled is something I get alerted by rather fast as my rules are not working anymore or my connection to Domoticz stops. The exceptions were probably there before on 9.5.0 too indeed. MQTT being disabled "by itself" was 100% sure not.
Please provide the mqtt setup of a device which fails. Maybe you have choosen a not valid config.
What do you mean? They all connect to the same broker (latest mosquitto, was 1.5.7 before, I upgraded to exclude the broker). They all use the same username and password.
I only configurate MqttHost, MqttUser and MqttPassword
Your mqtt topic. You can post a screenshot of the info webpage. See example
This is the Wemos D1 that corrupted / MQTT got disabled yesterday. I upgraded it after that from 10.0.0 to 10.0.0.3:
Correct too. Out of ideas.
Yeah... It's not that I'm a complete noob. I checked and tried a lot before posting the issue here. I'll see if it keeps corrupting for now. I'm starting to feel sorry for not finding a stupid mistake made by myself ;-)
It does not look like a easy issue. No reason to excuse.
So I have a nodemcu running under monitoring with your configuration for the last 16 hours. I just added a rule to toggle power every 2 minutes to insure there is some Domoticz activity.
Ok update:
My test Sonoff Mini R2 has been running without a single reboot for 2 days and 22 hours. I now added the command dzidx1 385 to "connect it" to domoticz. I'll see what this gives.
EDIT: already 1 reboot after 10 minutes. Settings preserved however.
One of my devices that really toggles idx 385 in domoticz, constantly logs every Domoticz idx that gets updated with weblog or syslog set to "4". This is normal behavior I guess?
My Nodemcu has been running 36 H now without a problem including you DzIdx 385 Typical 2 minute log:
12:55:18.427 MQT: tele/nodemcu/STATE = {"Time":"2021-11-22T12:55:18","Uptime":"1T16:05:11","UptimeSec":144311,"Heap":26,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"POWER":"OFF","Wifi":{"AP":1,"SSId":"BNETWLAN","BSSId":"88:C3:97:F2:25:EC","Channel":1,"Mode":"11n","RSSI":82,"Signal":-59,"LinkCount":1,"Downtime":"0T00:00:03"}}
12:55:18.453 MQT: tele/nodemcu/SENSOR = {"Time":"2021-11-22T12:55:18","Switch1":"ON"}
12:56:00.435 RUL: TIME#MINUTE|2 performs "Power Toggle"
12:56:00.437 SRC: Rule
12:56:00.440 CMD: Grp 0, Cmnd 'POWER', Idx 1, Len 6, Data 'Toggle'
12:56:00.449 MQT: domoticz/in = {"idx":385,"nvalue":1,"svalue":"","Battery":100,"RSSI":8}
12:56:00.454 MQT: stat/nodemcu/RESULT = {"POWER":"ON"}
12:56:00.459 MQT: stat/nodemcu/POWER = ON
12:56:01.417 CFG: Saved to flash at F7, Count 781, Bytes 4096
12:58:00.431 RUL: TIME#MINUTE|2 performs "Power Toggle"
12:58:00.433 SRC: Rule
12:58:00.435 CMD: Grp 0, Cmnd 'POWER', Idx 1, Len 6, Data 'Toggle'
12:58:00.441 MQT: domoticz/in = {"idx":385,"nvalue":0,"svalue":"","Battery":100,"RSSI":7}
12:58:00.444 MQT: stat/nodemcu/RESULT = {"POWER":"OFF"}
12:58:00.448 MQT: stat/nodemcu/POWER = OFF
I'm starting to get really desperate by now... One device just crashed again. MQTT was disabled, console spammed, ... I typed "restart 1" (not reset, 100% sure) and it lost all config. Back to Sonoff Basic template and WifiManager as everything seemed to be wiped.
I flashed a Wemos D1, a NodeMCU and a Sonoff Basic R3 with serial + Tasmotizer (with erase flash first). they are online now, but one already rebooted after 6hours with an Exception 3. I really don't know what went wrong except the possible FW updates after 9.5.0. I don't think this will get solved as it's hard to pinpoint:-(
Do you have the possibility to setup a second very small wifi network? Just wifi to connect a Tasmota device. Maybe a other device in the network make "something" which makes Tasmota going crazy.
PROBLEM DESCRIPTION
My devices (10 Sonoff Mini, MiniR2 and BasicR3) randomly get their MQTT disabled. The console spams
every second. Timezone99 was my setting, but upon checking, the time is off and timezone gives result +104 instead of 99. As you can see below in the requested information, time says 04:14 instead of 21:14 (UTC+1 in Belgium)...
The following commands work OK, but after some time, the problem acts up again...
REQUESTED INFORMATION
Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!
Backlog Template; Module; GPIO 255
:04:14:38.327 RSL: RESULT = {"NAME":"Sonoff Basic R3","GPIO":[57344,57599,57599,57599,255,0,0,57344,16384,57345,255,40960,0,0],"FLAG":0,"BASE":1} 04:14:38.338 RSL: RESULT = {"Module":{"0":"Sonoff Basic R3"}} 04:14:38.403 RSL: RESULT = {"GPIO0":{"57344":""},"GPIO1":{"57599":""},"GPIO2":{"57599":""},"GPIO3":{"57599":""},"GPIO4":{"255":"Relay32"},"GPIO5":{"0":"None"},"GPIO9":{"0":"None"},"GPIO10":{"57344":""},"GPIO12":{"16384":""},"GPIO13":{"57345":""},"GPIO14":{"255":"Relay32"},"GPIO15":{"40960":""},"GPIO16":{"0":"None"},"GPIO17":{"0":"None"}}
Status 0
:04:17:38.741 RSL: STATUS = {"Status":{"Module":0,"DeviceName":"Tasmota badkamer spiegel .143","FriendlyName":["Tasmota badkamer spiegel"],"Topic":"tasmota_BadkamerSpiegel","ButtonTopic":"0","Power":1,"PowerOnState":255,"LedState":20,"LedMask":"FFFF","SaveData":25344,"SaveState":0,"SwitchTopic":"0","SwitchMode":[5,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":1,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0}} 04:17:38.747 RSL: STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"ttp://ota.tasmota.com/tasmota/release/tasmota.bin.gz","RestartReason":"Software/System restart","Uptime":"5T02:53:10","StartupUTC":"2021-11-11T17:24:28","Sleep":50,"CfgHolder":18,"BootCount":20992,"BCResetTime":"2020-08-13T23:47:04","SaveCount":8,"SaveAddress":"F7000"}} 04:17:38.752 RSL: STATUS2 = {"StatusFWR":{"Version":"10.0.0(tasmota)","BuildDateTime":"2021-10-19T08:23:49","Boot":31,"Core":"2_7_4_9","SDK":"2.2.2-dev(38a443e)","CpuFrequency":80,"Hardware":"ESP8285","CR":"410/699"}} 04:17:38.757 RSL: STATUS3 = {"StatusLOG":{"SerialLog":2,"WebLog":2,"MqttLog":0,"SysLog":0,"LogHost":"","LogPort":514,"SSId":["Wifi13",""],"TelePeriod":1,"Resolution":"558180C0","SetOption":["01000080","05C8000100060000005A0A0000000000003C","05000000","00006000","00004000"]}} 04:17:38.766 RSL: STATUS4 = {"StatusMEM":{"ProgramSize":612,"Free":388,"Heap":16,"ProgramFlashSize":1024,"FlashSize":1024,"FlashChipId":"144051","FlashFrequency":40,"FlashMode":3,"Features":["00000809","8FDAC787","04368001","000000CF","010013C0","C000F981","00004004","00001000","00000020"],"Drivers":"1,2,3,4,5,6,7,8,9,10,12,16,18,19,20,21,22,24,26,27,29,30,35,37,45","Sensors":"1,2,3,4,5,6"}} 04:17:38.772 RSL: STATUS5 = {"StatusNET":{"Hostname":"tasmota-BadkamerSpiegel-2441","IPAddress":"192.168.1.143","Gateway":"192.168.1.1","Subnetmask":"255.255.255.0","DNSServer1":"192.168.1.1","DNSServer2":"0.0.0.0","Mac":"DC:4F:22:AB:C9:89","Webserver":2,"HTTP_API":1,"WifiConfig":4,"WifiPower":17.0}} 04:17:38.777 RSL: STATUS7 = {"StatusTIM":{"UTC":"2021-11-16T20:17:38","Local":"2021-11-21T04:17:38","StartDST":"2021-01-30T14:00:00","EndDST":"2021-01-23T06:00:00","Timezone":"+104:0","Sunrise":"15:09","Sunset":"23:46"}} 04:17:38.782 RSL: STATUS10 = {"StatusSNS":{"Time":"2021-11-21T04:17:38","Switch1":"ON"}} 04:17:38.787 RSL: STATUS11 = {"StatusSTS":{"Time":"2021-11-21T04:17:38","Uptime":"5T02:53:10","UptimeSec":442390,"Heap":16,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":8,"POWER":"ON","Wifi":{"AP":1,"SSId":"Wifi13","BSSId":"04:D4:C4:B4:A2:98","Channel":1,"Mode":"11n","RSSI":58,"Signal":-71,"LinkCount":6,"Downtime":"0T00:01:40"}}}
TO REPRODUCE
Happens randomly after half a day, 2 days, ...
EXPECTED BEHAVIOUR
Keep MQTT enabled and keep correct timezone. (I guess time difference gets the MQTT disabled?)
SCREENSHOTS
/
ADDITIONAL CONTEXT
These 10 devices have been running rock solid on 9.5.0 and before. Never had this problem before and it started giving this problem right after updating to v10.0.0. I already tried a complete "Reset 2" and configuring my device as new, but after some days it start's again (MQTT suddenly disabled, timezone off and console message spam every second).
(Please, remember to close the issue when the problem has been addressed)