arendst / Tasmota

Alternative firmware for ESP8266 and ESP32 based devices with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full documentation at
https://tasmota.github.io/docs
GNU General Public License v3.0

Too many writes to flash #772

Closed mkh595 closed 6 years ago

mkh595 commented 7 years ago

I had two cases where tens of thousands of writes to flash occurred.

Case 1: The WiFi on the router was broken; only the wired LAN worked. WifiConfig = 4, so there is no reboot, but what is so important that it has to be written to flash when the device is only trying to reconnect and nothing has changed?

[screenshot: pow3]

Case 2: After replacing the router with one using the same SSID/password, one POW is permanently rebooting (Exception 0).

[screenshot: pow2]

This is the log that appears in the console before it reboots again:

[screenshot: pow5]

arendst commented 7 years ago

Both cases are solved with latest releases. #752

mkh595 commented 7 years ago

Thank you for the information. I set the High thresholds to 0 as a temporary solution until the new version is uploaded. Now it is not rebooting.

mkh595 commented 7 years ago

I upgraded to 5.6.1c but Exception 0 still remains. The device no longer reboots permanently, but it looks like the problem is related to the WiFi connection. I restarted the router to apply new settings and the POW rebooted due to Exception 0.

[screenshot: pow6]

arendst commented 7 years ago

Pls try version 5.6.1d containing more divide by zero tests to solve Exception 0 and let me know the results.

mkh595 commented 7 years ago

Tested with version 5.6.1f. There is no Exception 0, but there is still an exception in a specific case. I am using a router and two repeaters, all of them with the same SSID/password. I restarted one repeater, so the unit should connect to the other one or to the router (the signal strength is sufficient). But there is Exception 28 and a reboot.

[screenshot: pow13]

[screenshot: pow14]

This is probably the reason why I had so many reboots: the router was randomly switching WiFi on and off before it died completely.

arendst commented 7 years ago

Decoded exception:

Exception 28: LoadProhibited: A load referenced a page mapped with an attribute that does not permit loads
Decoding 5 results
0x4020d6b8: switch_handler() at R:\Arduino\Work-ESP8266\Theo\sonoff\Sonoff-Tasmota\sonoff-5\Development\sonoff/sonoff.ino line 2198
0x4020912f: i2c_read(unsigned char, unsigned char, unsigned char) at R:\Arduino\Work-ESP8266\Theo\sonoff\Sonoff-Tasmota\sonoff-5\Development\sonoff/support.ino line 678
0x40209137: i2c_read(unsigned char, unsigned char, unsigned char) at R:\Arduino\Work-ESP8266\Theo\sonoff\Sonoff-Tasmota\sonoff-5\Development\sonoff/support.ino line 678
0x402090d2: i2c_read(unsigned char, unsigned char, unsigned char) at R:\Arduino\Work-ESP8266\Theo\sonoff\Sonoff-Tasmota\sonoff-5\Development\sonoff/support.ino line 678
0x4022aaee: pm_post at ?? line ?

You do not happen to have connected I2C to your Pow?

Pls provide output of command status 0.

Does it only happen on Pow or also on other types of sonoff?

mkh595 commented 7 years ago

18:09:48 CMND: cmnd/Status 0
18:09:48 MQTT: home/stat/home214/STATUS = {"Status":{"Module":6, "FriendlyName":"UV lampa", "Topic":"home214", "ButtonTopic":"0", "Power":1, "PowerOnState":3, "LedState":7, "SaveData":1, "SaveState":1, "ButtonRetain":0, "PowerRetain":0}}
18:09:48 MQTT: home/stat/home214/STATUS1 = {"StatusPRM":{"Baudrate":115200, "GroupTopic":"powxxx", "OtaUrl":"http://192.168.1.15:80/tasmota/powxxx.bin", "Uptime":9, "Sleep":0, "BootCount":40, "SaveCount":126, "SaveAddress":"F5000"}}
18:09:48 MQTT: home/stat/home214/STATUS2 = {"StatusFWR":{"Program":"5.6.1f", "BuildDateTime":"2017-08-28T08:25:23", "Boot":31, "Core":"2_3_0", "SDK":"1.5.3(aec24ac9)"}}
18:09:48 MQTT: home/stat/home214/STATUS3 = {"StatusLOG":{"Seriallog":2, "Weblog":2, "Syslog":0, "LogHost":"192.168.1.15", "SSId1":"mkh", "SSId2":"mkh-samsung", "TelePeriod":10, "Option":"55800009"}}
18:09:48 MQTT: home/stat/home214/STATUS4 = {"StatusMEM":{"ProgramSize":466, "Free":536, "Heap":25, "ProgramFlashSize":1024, "FlashSize":4096, "FlashMode":3}}
18:09:48 MQTT: home/stat/home214/STATUS5 = {"StatusNET":{"Hostname":"home214", "IPaddress":"192.168.1.214", "Gateway":"192.168.1.1", "Subnetmask":"255.255.255.0", "DNSServer":"192.168.25.1", "Mac":"2C:3A:E8:07:59:E9", "Webserver":2, "WifiConfig":2}}
18:09:48 MQTT: home/stat/home214/STATUS6 = {"StatusMQT":{"Host":"192.168.1.15", "Port":1883, "ClientMask":"ESP_%06X", "Client":"ESP_0759E9", "User":"", "MAX_PACKET_SIZE":512, "KEEPALIVE":15}}
18:09:48 MQTT: home/stat/home214/STATUS7 = {"StatusTIM":{"UTC":"Mon Aug 28 16:09:48 2017", "Local":"Mon Aug 28 18:09:48 2017", "StartDST":"Sun Mar 26 02:00:00 2017", "EndDST":"Sun Oct 29 03:00:00 2017", "Timezone":99}}
18:09:48 MQTT: home/stat/home214/STATUS8 = {"StatusPWR":{"Total":27.840, "Yesterday":0.935, "Today":1.448, "Power":81, "Factor":0.99, "Voltage":221, "Current":0.371}}
18:09:48 MQTT: home/stat/home214/STATUS9 = {"StatusPTH":{"PowerLow":0, "PowerHigh":0, "VoltageLow":0, "VoltageHigh":0, "CurrentLow":0, "CurrentHigh":0}}
18:09:48 MQTT: home/stat/home214/STATUS10 = {"StatusSNS":{"Time":"2017-08-28T18:09:48"}}
18:09:48 MQTT: home/stat/home214/STATUS11 = {"StatusSTS":{"Time":"2017-08-28T18:09:48", "Uptime":9, "Vcc":3.230, "POWER":"ON", "Wifi":{"AP":1, "SSID":"mkh", "RSSI":68, "APMac":"84:16:F9:F3:D7:DC"}}}

It happens on POW and TH16, without any external sensor.

mkh595 commented 7 years ago

How can I add formatted output like in the console? 'Insert code' doesn't work.

arendst commented 7 years ago

Use three "`" characters (the key at the top left of your keyboard, just below Esc) before and after your formatted output.

arendst commented 7 years ago

I see your program size is 466k, whereas the "normal" released version is 479k.

To analyze exceptions, the tool needs the exact source configuration; otherwise you'll get a result like the one I replied with above.

Next quick steps would be:

jetema commented 7 years ago

I have the same problem, still with version 5.7.0: too many writes to flash when there is no wifi available (every night I switch off my wifi, from about 23h to 9h). I noticed about 4200 writes to flash in 10 hours. My Sonoff settings are "SetOption1 OFF" and "WifiConfig 4". Any idea how to solve this?

[screenshots: 1, 2, 3]

arendst commented 7 years ago

As you turn off your router, the Sonoff tries to find the other configured router, and every switch of router leads to a flash write.

The solution is to change your Sonoff's SaveData time to a larger value when you turn off your router and change it back when you turn your router on again.
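The SaveData change described above can also be sent remotely through Tasmota's HTTP command endpoint (`/cm?cmnd=...`). The following is only a sketch: it assumes the web server is enabled and not password-protected, the IP address is a placeholder borrowed from the status output earlier in this thread, and the helper names are made up for illustration. As far as the Tasmota command documentation describes it, `SaveData 1` saves every second (the default) and values from 2 to 3600 set the save interval in seconds.

```python
from urllib.parse import quote
from urllib.request import urlopen


def command_url(device_ip: str, command: str) -> str:
    """Build the URL for Tasmota's HTTP command endpoint (/cm?cmnd=...)."""
    return f"http://{device_ip}/cm?cmnd={quote(command)}"


def send_command(device_ip: str, command: str, timeout: float = 5.0) -> str:
    """Send one command to the device and return the raw JSON reply as text."""
    with urlopen(command_url(device_ip, command), timeout=timeout) as reply:
        return reply.read().decode("utf-8")


# Placeholder address; replace with the address of your own device.
DEVICE_IP = "192.168.1.214"

# Before switching the router off: save settings at most once per hour.
night_url = command_url(DEVICE_IP, "SaveData 3600")
# After switching it back on: restore the default of one save per second.
day_url = command_url(DEVICE_IP, "SaveData 1")
```

Calling `send_command(DEVICE_IP, "SaveData 3600")` from a scheduler shortly before the router goes down, and `send_command(DEVICE_IP, "SaveData 1")` after it comes back, would automate the workaround.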

Why do you want to turn off your router anyway?

jetema commented 7 years ago

Hello Theo, many thanks for your quick reply, and CONGRATULATIONS on your REALLY GOOD JOB with TASMOTA for Sonoff. I have a solar installation and I switch off my router every night to save energy and avoid discharging the batteries (my router's consumption is about 6 W x 10 hours = 60 Wh, not a very high amount, I know).

I will try "savedata time", or I will switch off the Sonoff at night. Anyway, as a suggestion for future versions, I think it would be a good idea to avoid writing to flash if there is no wifi available. Why is it necessary if the Sonoff has no stable connection?

amita1974 commented 3 years ago

@arendst I am using Tasmota for Sonoff Basic Module, Program Version: 9.3.1(tasmota), Build Date & Time: 2021-03-09T16:12:28, Core/SDK Version: 2_7_4_9/2.2.2-dev(38a443e)

My flash write properties are: Flash write Count: 32362 at 0xF7000; Boot Count: 31044; last restart reason: Software/System restart.

I don't understand why there are so many flash writes, whether I can limit them to fewer writes and, if so, how this would affect the functionality of the device.

I checked my SaveData value as you suggested, and it returns stat/tasmota_3770A1/RESULT = {"SaveData":"ON"}, so I do not know what to change the value to or what value my device is currently using.

Please see the console log below (if you can explain how I can see older data from the log, I will be happy to learn that as well). In the log you can see that the wifi connection is not fully stable. The RSSI reported by the device is (18%, -91 dBm), and sometimes (16%, -92 dBm) and (14%, -93 dBm). These values are after I added a range extender; it is not easy for me to improve the WiFi strength at this location.

{ Update: I know that my RPI (192.168.1.58), which runs Mosquitto and Domoticz, is not functioning well and does not respond reliably even to ping requests. I need to re-flash it to try to make it work again, and I plan to move to another IoT hub, probably Home Assistant or maybe openHAB. In the meantime I use the Tasmotized Sonoff through its web interface and its internal timers. }

I will be happy to get answers and hear your thoughts.

14:11:22.267 UPP: Multicast (re)joined
14:11:52.272 MQT: Attempting connection...
14:11:52.483 MQT: Connect failed to 192.168.1.58:1883, rc -2. Retry in 40 sec
14:11:53.240 UPP: Multicast (re)joined
14:12:33.259 MQT: Attempting connection...
14:12:33.470 MQT: Connect failed to 192.168.1.58:1883, rc -2. Retry in 50 sec
14:12:34.276 UPP: Multicast (re)joined
14:13:24.267 MQT: Attempting connection...
14:13:24.478 MQT: Connect failed to 192.168.1.58:1883, rc -2. Retry in 60 sec
14:13:25.284 UPP: Multicast (re)joined
14:14:25.277 MQT: Attempting connection...
14:14:27.681 MQT: Connected
14:14:27.686 MQT: tele/tasmota_3770A1/LWT = Online (retained)
14:14:27.689 MQT: cmnd/tasmota_3770A1/POWER =
14:14:28.505 UPP: Multicast (re)joined
14:15:46.884 MQT: tele/tasmota_3770A1/STATE = {"Time":"2021-05-29T14:15:46","Uptime":"1T02:57:46","UptimeSec":97066,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":61,"POWER":"OFF","Wifi":{"AP":1,"SSId":"aa_wnetwork_2GExt","BSSId":"D0:0E:D9:13:C8:EB","Channel":11,"RSSI":12,"Signal":-94,"LinkCount":2,"Downtime":"0T00:00:09"}}
14:20:46.873 MQT: tele/tasmota_3770A1/STATE = {"Time":"2021-05-29T14:20:46","Uptime":"1T03:02:46","UptimeSec":97366,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":61,"POWER":"OFF","Wifi":{"AP":1,"SSId":"aa_wnetwork_2GExt","BSSId":"D0:0E:D9:13:C8:EB","Channel":11,"RSSI":12,"Signal":-94,"LinkCount":2,"Downtime":"0T00:00:09"}}
14:25:46.882 MQT: tele/tasmota_3770A1/STATE = {"Time":"2021-05-29T14:25:46","Uptime":"1T03:07:46","UptimeSec":97666,"Heap":25,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":61,"POWER":"OFF","Wifi":{"AP":1,"SSId":"aa_wnetwork_2GExt","BSSId":"D0:0E:D9:13:C8:EB","Channel":11,"RSSI":16,"Signal":-92,"LinkCount":2,"Downtime":"0T00:00:09"}}
14:30:46.869 MQT: tele/tasmota_3770A1/STATE = {"Time":"2021-05-29T14:30:46","Uptime":"1T03:12:46","UptimeSec":97966,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":61,"POWER":"OFF","Wifi":{"AP":1,"SSId":"aa_wnetwork_2GExt","BSSId":"D0:0E:D9:13:C8:EB","Channel":11,"RSSI":16,"Signal":-92,"LinkCount":2,"Downtime":"0T00:00:09"}}
14:35:46.876 MQT: tele/tasmota_3770A1/STATE = {"Time":"2021-05-29T14:35:46","Uptime":"1T03:17:46","UptimeSec":98266,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":61,"POWER":"OFF","Wifi":{"AP":1,"SSId":"aa_wnetwork_2GExt","BSSId":"D0:0E:D9:13:C8:EB","Channel":11,"RSSI":12,"Signal":-94,"LinkCount":2,"Downtime":"0T00:00:09"}}
14:40:46.891 MQT: tele/tasmota_3770A1/STATE = {"Time":"2021-05-29T14:40:46","Uptime":"1T03:22:46","UptimeSec":98566,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":61,"POWER":"OFF","Wifi":{"AP":1,"SSId":"aa_wnetwork_2GExt","BSSId":"D0:0E:D9:13:C8:EB","Channel":11,"RSSI":12,"Signal":-94,"LinkCount":2,"Downtime":"0T00:00:09"}}
14:42:29.494 MQT: Attempting connection...
14:42:29.705 MQT: Connect failed to 192.168.1.58:1883, rc -2. Retry in 10 sec
14:42:30.460 UPP: Multicast (re)joined
14:42:40.472 MQT: Attempting connection...
14:42:41.183 MQT: Connected
14:42:41.189 MQT: tele/tasmota_3770A1/LWT = Online (retained)
14:42:41.193 MQT: cmnd/tasmota_3770A1/POWER =
14:42:41.958 UPP: Multicast (re)joined
14:45:46.906 MQT: tele/tasmota_3770A1/STATE = {"Time":"2021-05-29T14:45:46","Uptime":"1T03:27:46","UptimeSec":98866,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":62,"POWER":"OFF","Wifi":{"AP":1,"SSId":"aa_wnetwork_2GExt","BSSId":"D0:0E:D9:13:C8:EB","Channel":11,"RSSI":16,"Signal":-92,"LinkCount":2,"Downtime":"0T00:00:09"}}
14:50:46.910 MQT: tele/tasmota_3770A1/STATE = {"Time":"2021-05-29T14:50:46","Uptime":"1T03:32:46","UptimeSec":99166,"Heap":27,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":62,"POWER":"OFF","Wifi":{"AP":1,"SSId":"aa_wnetwork_2GExt","BSSId":"D0:0E:D9:13:C8:EB","Channel":11,"RSSI":18,"Signal":-91,"LinkCount":2,"Downtime":"0T00:00:09"}}
14:52:35.395 CMD: savedata
14:52:35.406 MQT: stat/tasmota_3770A1/RESULT = {"SaveData":"ON"}
14:55:46.865 MQT: tele/tasmota_3770A1/STATE = {"Time":"2021-05-29T14:55:46","Uptime":"1T03:37:46","UptimeSec":99466,"Heap":25,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":24,"MqttCount":62,"POWER":"OFF","Wifi":{"AP":1,"SSId":"aa_wnetwork_2GExt","BSSId":"D0:0E:D9:13:C8:EB","Channel":11,"RSSI":16,"Signal":-92,"LinkCount":2,"Downtime":"0T00:00:09"}}

arendst commented 3 years ago

Every boot writes at least its bootcount value to flash. From your bootcount value of 31044 you may conclude that you have way too many reboots.
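A quick piece of arithmetic on the two counters reported above (Flash write Count 32362, Boot Count 31044) illustrates the point. It is a sketch that assumes one flash write per boot, which is the minimum stated here; if a boot causes more than one write, even fewer writes are left over for other causes.

```python
def writes_not_explained_by_boots(flash_write_count: int, boot_count: int) -> int:
    """Upper bound on writes with another cause, assuming one write per boot."""
    return flash_write_count - boot_count


flash_writes = 32362  # "Flash write Count" reported in this thread
boots = 31044         # "Boot Count" reported in this thread

other_writes = writes_not_explained_by_boots(flash_writes, boots)
boot_share = boots / flash_writes

print(f"writes not explained by boots: at most {other_writes}")
print(f"share of writes attributable to boots: at least {boot_share:.1%}")
```

Under that assumption, nearly all recorded writes come from reboots, so reducing the reboots matters far more than tuning SaveData.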

To see more logging, you could set up a syslog service on your RPI and enable Tasmota syslogging with the command syslog 3. This may reveal why you have so many reboots.

amita1974 commented 3 years ago

Hi Arendst, Thanks for your feedback.

Per your advice, I set up a syslog server right after getting your reply and followed the boot and flash write counters. At first they were not increasing very fast, but in the last 10 days they increased a lot.

I attach the syslog file and would like to get your insights about it and ideas on how to proceed with debugging this issue.

My insights:

  1. Maybe the wireless connection was not so good: when grepping the log for "flash", I see large jumps (several hundred writes) between one logged write and the next, meaning the writes in between were probably not logged, maybe due to a connection issue (with the RPI that runs the syslog server) or some other reason. Or the problem is with the counters themselves (I would not bet on that, but it is still an option to consider...). In the attached log, see the difference between the count values of Jul- and Jul-15:

     Line 9: Jun 4 12:47:43 tasmota_3770A1-4257 ESP-CFG: Saved to flash at FA, Count 32391, Bytes 4096
     Line 11405: Jun 4 20:49:34 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F7, Count 32394, Bytes 4096
     Line 127277: Jul 3 17:46:51 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F4, Count 32741, Bytes 4096
     Line 132656: Jul 3 21:35:58 tasmota_3770A1-4257 ESP-CFG: Saved to flash at FB, Count 32742, Bytes 4096
     Line 134530: Jul 3 22:54:26 tasmota_3770A1-4257 ESP-CFG: Saved to flash at FA, Count 32743, Bytes 4096
     Line 134779: Jul 3 23:03:54 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F8, Count 32745, Bytes 4096
     Line 140791: Jul 5 13:39:30 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F5, Count 32748, Bytes 4096
     Line 143397: Jul 7 19:36:32 tasmota_3770A1-4257 ESP-CFG: Saved to flash at FA, Count 32751, Bytes 4096
     Line 145068: Jul 15 06:20:20 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F9, Count 33512, Bytes 4096
     Line 145074: Jul 15 06:20:21 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F8, Count 33513, Bytes 4096
     Line 146372: Jul 15 14:45:39 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F4, Count 33517, Bytes 4096
     Line 146384: Jul 15 14:45:42 tasmota_3770A1-4257 ESP-CFG: Saved to flash at FB, Count 33518, Bytes 4096

  2. There is a large gap in the log records between Jul-8 and Jul-15; it could also be the fault of the RPI. Please note that on Jul-15 we had 3 power failures that restarted both the Pi and the Tasmota device, which may have resolved the syslog reporting issue. The Tasmota device worked during these days: I know that at least some of the time it turned the Sonoff's relay on and off according to the configured Tasmota timers.
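The grep-and-compare step described in the first point can be automated. The sketch below scans syslog lines for the "Saved to flash" message and reports every place where the counter jumps by more than a chosen threshold, i.e. where writes happened without being logged. The sample lines are copied from the list above; the function name and the threshold of 100 are arbitrary choices for this illustration.

```python
import re

# Matches e.g. "ESP-CFG: Saved to flash at FA, Count 32391, Bytes 4096"
SAVE_RE = re.compile(r"Saved to flash at ([0-9A-F]+), Count (\d+), Bytes (\d+)")


def find_unlogged_gaps(lines, max_step=100):
    """Return (previous_count, next_count, missing_writes) for each large jump."""
    gaps = []
    previous = None
    for line in lines:
        match = SAVE_RE.search(line)
        if match is None:
            continue  # not a flash-save message
        count = int(match.group(2))
        if previous is not None and count - previous > max_step:
            gaps.append((previous, count, count - previous - 1))
        previous = count
    return gaps


sample = [
    "Jun 4 20:49:34 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F7, Count 32394, Bytes 4096",
    "Jul 3 17:46:51 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F4, Count 32741, Bytes 4096",
    "Jul 7 19:36:32 tasmota_3770A1-4257 ESP-CFG: Saved to flash at FA, Count 32751, Bytes 4096",
    "Jul 15 06:20:20 tasmota_3770A1-4257 ESP-CFG: Saved to flash at F9, Count 33512, Bytes 4096",
    "Jul 15 14:45:42 tasmota_3770A1-4257 ESP-CFG: Saved to flash at FB, Count 33518, Bytes 4096",
]

for before, after, missing in find_unlogged_gaps(sample):
    print(f"count jumped from {before} to {after}: {missing} writes were not logged")
```

Running the same function over the full log file (one line per list element) would show exactly when the unlogged writes accumulated.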

Thanks, Amit. tasmotaPoolPump.zip

Edit: adding more information. I am also attaching the syslog file 20210717tasmotaPoolPump.log, which contains the data between the last syslog and the time near the new reading from 17.7.2021. During this time you can see that the flash write counter increased, as did the boot counter. There was no power failure during this time. Information from the Tasmota web interface:

@ 17.7.2021 @ 20:03: Uptime 0T06:09:26, Flash write Count 33532 at 0xF5000, Boot Count 31978. The uptime is only 6 hours, but there was no power failure since the last reading on 16.7.2021.

@ 16.7.2021 (not sure of the exact hour I took this reading): Uptime 0T20:59:51, Flash write Count 33525 at 0xF4000, Boot Count 31974
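The two readings above can be compared mechanically. The sketch below parses Tasmota's `dTHH:MM:SS` uptime notation and computes the counter deltas between the readings; the parsing rule is inferred from the telemetry earlier in this thread, where "Uptime":"1T02:57:46" appears next to "UptimeSec":97066, and the function name is made up for this example.

```python
import re

UPTIME_RE = re.compile(r"^(\d+)T(\d{2}):(\d{2}):(\d{2})$")


def uptime_seconds(text: str) -> int:
    """Convert an uptime string such as '1T02:57:46' (days T hh:mm:ss) to seconds."""
    match = UPTIME_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected uptime format: {text!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Readings copied from the comment above.
reading_16 = {"uptime": "0T20:59:51", "writes": 33525, "boots": 31974}
reading_17 = {"uptime": "0T06:09:26", "writes": 33532, "boots": 31978}

extra_boots = reading_17["boots"] - reading_16["boots"]
extra_writes = reading_17["writes"] - reading_16["writes"]
hours_since_last_boot = uptime_seconds(reading_17["uptime"]) / 3600

print(f"{extra_boots} boots and {extra_writes} flash writes between the readings")
print(f"last boot was {hours_since_last_boot:.1f} hours before the second reading")
```

An uptime shorter than the interval between two readings, together with a rising boot count, confirms that the device restarted without a power failure.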

============

20210717tasmotaPoolPump.log

amita1974 commented 2 years ago

Hi Theo @arendst, you did not answer my question above. I would like to ask something else as a possible solution, since these many reboots are wearing out my flash memory with so many writes: assuming that the resets occur due to an unstable/weak WiFi signal, what is the recommended WifiConfig setting that will not leave me locked out of my Sonoff Tasmota-based device in case I need to switch AP, or if for any reason I cannot connect to the device and want to reconfigure it? Currently my WifiConfig setting is 2 (WifiManager):

03:16:49.383 CMD: WifiConfig
03:16:49.392 RSL: RESULT = {"WifiConfig":{"2":"WifiManager"}}

Do you think that changing this setting will prevent the increase in the reboots and undesired flash writes? To which value?

My current Value, after upgrading to 10.1.0 is:

Flash write Count   40247 at 0xF9000
Boot Count  38636

My full device info is

Sonoff Basic
Tasmota
Program Version 10.1.0(tasmota)
Build Date & Time   2021-12-08T14:47:33
Core/SDK Version    2_7_4_9/2.2.2-dev(38a443e)
Uptime  0T00:40:10
Flash write Count   40247 at 0xF9000
Boot Count  38636
Restart Reason  Software/System restart
Friendly Name 1 Pool Pump

AP1 SSId (RSSI) aa_wnetwork_2GExt (8%, -96 dBm) 11n
Hostname    tasmota-3770A1-4257
MAC Address DC:4F:22******
IP Address (wifi)   192.168.1.70
Gateway 192.168.1.1
Subnet Mask 255.255.255.0
DNS Server1 192.168.1.1
DNS Server2 0.0.0.0

HTTP API    Enabled

MQTT Host   192.168.1.58
MQTT Port   1883
MQTT User   aa_s******
MQTT Client DVES_3770A1
MQTT Topic  tasmota_%06X
MQTT Group Topic 1  cmnd/tasmotas/
MQTT Full Topic cmnd/tasmota_3770A1/
MQTT Fallback Topic cmnd/DVES_3770A1_fb/
MQTT No Retain  Disabled

Emulation   Hue Bridge

ESP Chip Id 3633313 (ESP8266EX)
Flash Chip Id   0x14405E
Flash Size  1024 kB
Program Flash Size  1024 kB
Program Size    616 kB
Free Program Space  384 kB
Free Memory 25.0 kB

Thanks, Amit.

P.S. You can still look at my previous reply. it included the syslog log file that you requested and may help you see the cause for the resets / re-write to flash problem that I have.

ascillato commented 2 years ago

The default is the recommended one: WifiConfig 4 (retry the connection without restarting). That won't lock you out, since pressing the button 6 times will make the AP show up. More info at https://tasmota.github.io/docs/Buttons-and-Switches/#multi-press-functions

amita1974 commented 2 years ago

@ascillato Thanks for the fast response. In case the button is not accessible, is there any other way? To get to the button I have to open up an air conditioning unit located outside my house, into which I added the Sonoff. This is not so easy to do, so I will use this option only if there is no other way.

BTW, does the 8% signal strength explain the many boots and the high flash write count?

Thanks, Amit.

ascillato commented 2 years ago

In case the button is not accessible, is there any other way? To get to the button I have to open up an air conditioning unit located outside my house, into which I added the Sonoff.

If, for example, your wifi goes down and you don't have anything physical to interact with the device, with WifiConfig 4 your Sonoff Basic will keep looking for your wifi. In that case, just make a hotspot with your phone using the credentials of your wifi; the device will connect to your phone and you can reconfigure it without issues.

BTW, does the 8% signal strength explain the many boots and the high flash write count?

RSSI below 50% is not recommended at all. You will have very poor performance and a lot of disconnections. You should improve your wifi coverage or move the device closer to your router.
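The percentages and dBm values quoted in this thread, (18%, -91 dBm), (16%, -92 dBm), (14%, -93 dBm) and (8%, -96 dBm), all fit a simple linear mapping: quality = 2 x (dBm + 100), clamped to the range 0..100. The sketch below encodes that mapping; it is inferred from the numbers posted here rather than quoted from the firmware source, and the function names are invented for illustration.

```python
def quality_from_dbm(dbm: int) -> int:
    """Map a signal level in dBm to a 0..100 quality percentage."""
    return max(0, min(100, 2 * (dbm + 100)))


def dbm_for_quality(quality: float) -> float:
    """Inverse mapping: the signal level that corresponds to a quality percentage."""
    return quality / 2 - 100


# Pairs reported by the device in this thread: (dBm, percent).
reported = [(-91, 18), (-92, 16), (-93, 14), (-94, 12), (-96, 8)]
for dbm, percent in reported:
    assert quality_from_dbm(dbm) == percent

print(f"the recommended 50% minimum corresponds to {dbm_for_quality(50):.0f} dBm")
```

Under this mapping the 50% recommendation corresponds to roughly -75 dBm, about 20 dB stronger than the signal this device receives.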

amita1974 commented 2 years ago

RSSI Below 50% is not recommended at all. You will have a very poor performance and a lot of disconnections. You should improve your wifi coverage or move the device closer to your Router.

Improving my WiFi signal at this specific location is challenging.

I don't need performance. I need reliability, meaning if the signal is not strong and a command is lost, it should be retransmitted until it will get done. Can this be achieved? What configuration should I select to assure that?

In addition, is this low signal strength the reason for me to get so many resets and re-writes to the flash?

ascillato commented 2 years ago

There is no reliability with that extremely low signal.

The WiFi protocol doesn't have that type of retransmission. That should be done in your home automation software.

If you want that type of reliability (where the router holds commands for a device that is not connected and transmits them when the device reconnects), you should move to ZigBee. ZigBee does that. There is the Sonoff Mini ZigBee, and you can control it from the Sonoff ZigBee bridge. The ZigBee bridge can be Tasmotized.

About your boot count, it is hard to tell without more information (status 0, etc.). If your WifiConfig is 0, the device will reboot every time it disconnects. Besides that, if your device can't sync with NTP, it will reboot too (because of a bug in the Arduino core).

On top of all that, having devices with such a poor signal in your wifi network introduces another issue, called the hidden transmitter, which greatly lowers wifi speed and reliability for ALL your devices. So a device like that will give you more problems than solutions. You should ensure that all devices have at least 50% RSSI.

FYI, the hidden transmitter problem is that the router can see both, for example, your phone and the device with the 8% signal, but those two cannot see each other. So the two devices will try to talk to the router at the same time. The router sees the crosstalk and sends a disconnect signal to all devices to resync the beacon, making all devices reconnect and sync again so that each transmits in its assigned time.

amita1974 commented 2 years ago

@ascillato, I think you are mistaken: WiFi is only the medium for transferring the data, and reliability should be handled in higher layers, e.g. TCP/IP and, in our case, the MQTT settings. MQTT has several configurable levels of reliability, and to the best of my knowledge one of them should make the data transfer reliable.

See "mqtt connection reliability":

Reliability of MQTT
MQTT can allow for messages to be stored at the broker until a device is ready to receive it.
Thanks to QoS (Quality of Service), MQTT has the ability to queue messages,
make sure they get where they are going and if required, ensure that they only get there once.

So my question is: how should I set the MQTT QoS setting to ensure commands are delivered to the device?

Also, what happens if the device is programmed (using the web server) to switch its state on/off at specific hours, but at the exact moment of the scheduled change the device is in the process of reconnecting to the WiFi after a reset? In that case, will it skip the state change, since it does not yet know the current time and may only learn the time a minute after the scheduled change?

Regarding the boot counter, what data is missing? I attached syslog server logs to one of my previous comments, as @arendst requested. Is there any other data that you want me to send? And if so, how can I extract that data?

Thanks.

barbudor commented 2 years ago

Tasmota is using the PubSubClient library from Nick O'Leary which only supports QoS 0. This is normally enough on a local network with proper coverage.

As you said, TCP performs retransmission of packets. However, every recovery mechanism has its limits. If the quality of the transmission network is bad or if a disconnection happens, the TCP connection may be dropped and the retransmission window cleared.

amita1974 commented 2 years ago

@barbudor Thanks for the answer. As for the TCP limitations, you are correct and I am aware of them. I was hoping that the QoS of the MQTT layer could solve this, but as you write, this is not supported now.

I am still waiting for comments on whether the internal on/off timers (configured from the web server) keep working while the device is not connected to the network. I changed the WifiConfig value from 2 to 4, hoping that this will reduce the number of resets and, as a result, the number of flash writes.

arendst commented 2 years ago

To lower the number of exceptions (and the resulting restarts) due to bad MQTT reception, upgrade to the latest development version, v2022.01.1 or later.

lalo-uy commented 2 years ago

I agree that 8% is way too low. I have a device with 20% and it disconnects from time to time. With MQTT you can use the retain flag, so when the device reconnects to the broker, it gets the pending messages.

amita1974 commented 2 years ago

@lalo-uy How do you set the retain flag on the MQTT for Tasmota?

amita1974 commented 2 years ago

@arendst Thanks. I will try that, but first I will give it some time to see whether the FW upgrade from 9.4.0 to 10.1.0 that I did today, together with the WifiConfig change from 2 to 4, helped. So far, during ~10 hours, there was no reset. I don't want to add too many variables to the equation at the same time.

If I still see resets, I will upgrade and update you.

lalo-uy commented 2 years ago

You have to set it on the system originating the message, which I assume is not Tasmota.
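At the protocol level the retain flag is a single bit in the PUBLISH packet that the sending side builds. The sketch below assembles a QoS 0 PUBLISH packet by hand, following the MQTT 3.1.1 packet layout, purely to show where that bit lives. In practice the MQTT client library or home automation software sets it through a `retain` option; the function names here are invented, and the topic is the command topic seen in the console log earlier in this thread.

```python
def encode_remaining_length(length: int) -> bytes:
    """Encode the 'remaining length' field as an MQTT variable-length integer."""
    encoded = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        encoded.append(byte)
        if length == 0:
            return bytes(encoded)


def publish_packet(topic: str, payload: bytes, retain: bool = False) -> bytes:
    """Build a QoS 0 PUBLISH packet; the retain flag is bit 0 of the first byte."""
    topic_bytes = topic.encode("utf-8")
    # Variable header (2-byte topic length + topic) followed by the payload.
    # QoS 0 packets carry no packet identifier.
    body = len(topic_bytes).to_bytes(2, "big") + topic_bytes + payload
    first_byte = 0x30 | (0x01 if retain else 0x00)  # 0x30 = PUBLISH, QoS 0
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


retained = publish_packet("cmnd/tasmota_3770A1/POWER", b"ON", retain=True)
plain = publish_packet("cmnd/tasmota_3770A1/POWER", b"ON")
```

Note that a broker keeps only the most recent retained message per topic and delivers it every time a client subscribes to that topic, so a retained command is applied again after each reconnect, not just once.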

amita1974 commented 2 years ago

Hello @arendst, even after the WifiConfig setting was changed from 2 to 4, I still get many resets. I tried to upgrade the FW to the latest development version, but I get a "Not enough space" message:

Sonoff Basic
Tasmota
Upload Failed
Not enough space

What I did was download the minimal version http://ota.tasmota.com/tasmota/tasmota-minimal.bin.gz from http://ota.tasmota.com/tasmota/, extract it, and select the file tasmota-minimal.bin on the firmware upgrade page of the Tasmota web server.

I planned to continue by updating to the latest tasmota.bin, but since the minimal-version step did not work, I could not continue. The version I currently have: Program Version 10.1.0(tasmota), Build Date & Time 2021-12-08T14:47:33, Core/SDK Version 2_7_4_9/2.2.2-dev(38a443e)

Please advise. Thanks, Amit.

barbudor commented 2 years ago

You are doing it all wrong.

GZ files are not intended to be unzipped but uploaded directly; Tasmota will unzip the file itself. That saves flash space and allows larger binaries to be flashed. Besides, you shouldn't download the file and upload it to Tasmota. On the upgrade page, in the "OTA Url" section, directly enter the HTTP URL you want to flash, such as http://ota.tasmota.com/tasmota/tasmota.bin.gz

Tasmota will automatically download the file. If needed, it will also download tasmota-minimal.bin.gz.
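The same URL-based upgrade can also be started from the Tasmota console instead of the web page, using the `OtaUrl` and `Upgrade 1` commands chained with `Backlog`. The helper below only composes that console line and rejects file names that are not firmware images; it is an illustrative sketch, and the function name and the file-extension check are assumptions made for this example.

```python
def ota_backlog(firmware_url: str) -> str:
    """Compose a console line that sets the OTA URL and starts the upgrade."""
    if not firmware_url.endswith((".bin", ".bin.gz")):
        raise ValueError(f"not a firmware image URL: {firmware_url!r}")
    return f"Backlog OtaUrl {firmware_url}; Upgrade 1"


console_line = ota_backlog("http://ota.tasmota.com/tasmota/tasmota.bin.gz")
print(console_line)
```

Pasting the resulting line into the device console makes the device fetch the compressed image itself, which is the approach recommended above.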

amita1974 commented 2 years ago

Thanks, @barbudor Updated with the latest dev FW now. I will report if resets are now ceasing to accumulate.

Current version and status:
Program Version 2022.01.3(tasmota)
Build Date & Time   2022-01-16T15:37:56
Core/SDK Version    2_7_4_9/2.2.2-dev(38a443e)
Uptime  0T00:00:18
Flash write Count   40555 at 0xFB000
Boot Count  38936

amita1974 commented 2 years ago

@arendst, a little under 2 days later, the updated version from the development branch shows an improvement in the number of resets: 2 resets in just under 2 days. But maybe it is too early to draw conclusions. I will keep following up and update.

Program Version 2022.01.3(tasmota)
Build Date & Time   2022-01-16T15:37:56
Core/SDK Version    2_7_4_9/2.2.2-dev(38a443e)
Uptime  1T11:16:02
Flash write Count   40557 at 0xF9000
Boot Count  38938
Restart Reason  Software/System restart

If this FW version really shows a big change in the number of resets, then this is a very important update to keep our devices' flash from getting destroyed.

amita1974 commented 2 years ago

@arendst, even with the latest FW from the dev branch, 9 hours after the previous post I got another 5 flash writes due to 5 more boot cycles.

Program Version 2022.01.3(tasmota)
Build Date & Time   2022-01-16T15:37:56
Core/SDK Version    2_7_4_9/2.2.2-dev(38a443e)
Uptime  0T01:05:38
Flash write Count   40562 at 0xF4000
Boot Count  38943
Restart Reason  Software/System restart

Please advise. The SONOFF will stop functioning if this continues, as its flash will reach its end of life. I attached syslog logs in one of my previous reports, per your request. Did you look at them?

I am now attaching another log file, tasmotaPoolPump.log.bak_2000.zip, from the last few days. Note that in case of a network disconnection, log messages may be lost, since the syslog server at 192.168.1.58 is not reachable at that time.

amita1974 commented 2 years ago

18 hours later => +5 flash writes due to 5 resets

Program Version 2022.01.3(tasmota)
Build Date & Time   2022-01-16T15:37:56
Core/SDK Version    2_7_4_9/2.2.2-dev(38a443e)
Uptime  0T10:06:13
Flash write Count   40562 at 0xF4000
Boot Count  38943
Restart Reason  Software/System restart

amita1974 commented 2 years ago

The latest part of the syslog, following the one I attached yesterday, is attached now as well. 192.168.1.58 is the IP of the RPI with the syslog server. In the new log file you can't see the word "flash", which appeared in the old log before I did the FW upgrade. tasmotaPoolPump.log

arendst commented 2 years ago

What are you trying to prove?

Your wifi signal is way too low (-95) for a stable connection as was said earlier. In your last log you see many connection errors for both multicast and MQTT. Gladly Tasmota stopped restarting on these errors so your worry about flash writes on restarts is solved.

Your solution is moving the pool pump to the router or moving the router to the pool pump (or be wise and install another router in between).

amita1974 commented 2 years ago

Hi Theo @arendst ,

I am not trying to prove anything. The situation is that it is difficult for me to improve the signal reception at the location where the Sonoff device is installed. The current signal level is what I get after adding a WiFi repeater only 4.5 meters from the Sonoff. The Sonoff sits inside a box (not made of metal) that is used to connect it to power and protect it from the rain, and there is a glass window and a wooden door between the repeater and the Sonoff. This reduces the WiFi signal, and the current setup is the best I could achieve.

I am raising an issue and want to see if there is a possible solution.

I thought that Tasmota would stop restarting on these errors, but I see that it still resets: during the last 18 hours it reset 5 times. This is a lot. You wrote that "Gladly Tasmota stopped restarting on these errors so your worry about flash writes on restarts is solved", but the flash write and reset counters show otherwise, so I am not sure what you meant, or maybe you missed the counter values that I put in my comments.

To be clear - I am not complaining. I just want to see if we can find some solution that will prevent the resets when the WiFi signal is not so good. I am willing to live with the disconnections but want to avoid the resets that cause flash writes. Alternatively, maybe we can have the resets but prevent the flash writes (?), since having so many flash writes is the actual problem...

Any ideas?

Regards, Amit.

barbudor commented 2 years ago

Did you check/change your WifiConfig as suggested a few days ago? You didn't confirm.

A Sonoff located 4 meters from the repeater shouldn't have such a low reception level, even within a non-metal box, even behind hard walls.

Did you ever consider that your Sonoff could be faulty and should be replaced? What exactly do you need? Just a relay control? If you have reception issues, you should consider a solution with an external antenna instead of a PCB antenna.

amita1974 commented 2 years ago

Hi @barbudor, Thanks for replying.

Yes, I changed WifiConfig to 4 ("Retry"), as suggested, and since this did not help I upgraded the FW to the development branch. I am still getting too many resets (and flash writes) even now.

Thinking about the current design of Tasmota: what happens if a house has Sonoff devices in the walls (or in any other location that is not easily accessible), and for some reason the WiFi router's credentials are changed, or the network connection is lost for any other reason (either to the MQTT server or to the router)? In this case, will these Sonoffs go into a loop of resets and flash writes, causing them to reach their end of life soon? One example: the Sonoff is installed in a rented house that nobody occupies for a few months, so no internet/WiFi connection is available for a period of time. Another: the Sonoff controls a socket that is used only in summer (pool pump), and during the whole winter the owner does not know that the Sonoff is trying to reach a WiFi network whose SSID/credentials were changed. Or a bad network signal, like in my case, or any other scenario that prevents a stable connection (or any connection at all) for a long time.

I think that this Tasmota behavior has to be reconsidered and a different solution should be selected in order to prevent shortening the device's life. I guess that I am not the only one who has devices located in a place without a strong signal.

Answering your other questions,

BTW, currently there are periods with no reset for a long time, and then there are periods with many resets. Are we sure that the logged resets are all happening due to bad network connections?