arendst / Tasmota

Alternative firmware for ESP8266 and ESP32 based devices with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full documentation at
https://tasmota.github.io/docs
GNU General Public License v3.0
21.84k stars 4.75k forks source link

MI32Option6 1 causes device to lockup requiring power off/on #14655

Closed kabongsteve closed 2 years ago

kabongsteve commented 2 years ago

PROBLEM DESCRIPTION

Running a SONOFF Dual R3 in ShutterMode (SetOption80) as well as Bluetooth and 3 MI LYWSD03MMC Bluetooth sensors (running custom firmware.

When enabling MI32Option6 1 the device runs between 6-12 hours before crashing completely with no response from device. Device must be powered off and on again to restart with no error logs as to reasons.

REQUESTED INFORMATION

Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!

- [ ] If using rules, provide the output of this command: `Backlog Rule1; Rule2; Rule3`:
```lua
  Rules output here:
- [ ] Set `weblog` to 4 and then, when you experience your issue, provide the output of the Console log:
```lua
  Console output here:

TO REPRODUCE

Steps to reproduce the behavior: Enable MI32Option6 1

EXPECTED BEHAVIOUR

A clear and concise description of what you expected to happen. Device will run for under 12 hours then fail completely

SCREENSHOTS

If applicable, add screenshots to help explain your problem.

ADDITIONAL CONTEXT

Add any other context about the problem here. When not enabled system runs fine without crashing

(Please, remember to close the issue when the problem has been addressed)

barbudor commented 2 years ago

Please upgrade to latest dev version from http://ota.tasmota.com/tasmota32/ There have been some changes in the NimBLE library which seems to make ESP32 Bluetooth more reliable. I also had some device lock-up with 10.1.0 in December but no problems since I upgraded early January

kabongsteve commented 2 years ago

Thanks, have updated to latest Development branch, (2022.01.3) and turned on MI32Option6 1

It hasn't crashed, yet, but I am now seeing the following in the console log 16:44:38.784 M32: Kick off readOneSensor 16:44:38.787 M32: Hit tele time, restarted but not finished last - lost from slot 1073562816

The second message above had not shown up previously.

Another question, partially separate. Is is it possible to have it only managing/handling the MI32 devices I want and not doing Bluetooth scans and returning all other bluetooth devices.

Since i'm running 4rd party firmware on the mi tempdevices there is no MI32Keys pairs, as they don't require a decryption key.

barbudor commented 2 years ago

I also have those message. Didn't care but I will look where they coming from. There should be options to disable scan

barbudor commented 2 years ago

Very strange, these messages should be logged in DEBUG but I can see the same even I'm in WebLog 2 I don't have active scan enabled From what I seem to understand in the code, those doesn't imply an active read of the sensor. It's more like "read in the sensor table what we have received since last BLEPeriod and send the mqtt message" I will keep on just ignoring them

arendst commented 2 years ago

The problem is library NimBLR-Arduino.

It redefines the standard Tasmota logging enums. Once I renumbered them but as the library has been updated this renumbering has been restored. I'll renumber waiting for another library updsate screwing up again.

BTW issue is in file lib\libesp32_div\NimBLE-Arduino\src\log_common\log_common.h

barbudor commented 2 years ago

ok. I remember having done also a huge PR on OpenThem library that had conflicting symbols with NimBLE. Not sure if we have a way to track those in order to avoid upgrading blindly a library in the future

Jason2866 commented 2 years ago

@barbudor @arendst Adding a readme_changes.md to the lib would help. This was the case in the audio lib. When i updated the audo lib i saw this file and checked all. Seeing the changes Theo just (again) made, it rung a bell... EDIT: Just did this.

barbudor commented 2 years ago

great idea @Jason2866 👍 I will do the same about my changes in OpenTherm

kabongsteve commented 2 years ago

Tried running the development branch but stiull end up with same issues and esp32 crashes with no recourse but to power off and on again

barbudor commented 2 years ago

@kabongsteve I started a stagging unit with latest dev Let's see

dcoder42 commented 2 years ago

I might have the same issue here. Sometimes the ESP32 is running for days until it happens, other times it happens in less than a day. I'm running a self compiled firmware from a few days ago. I have connected the ESP32 now to my server hopefully to get some logs.

I have also a Sonoff RF bridge and a LedeNet controller running Tasmota 10.0.0.1. After the ESP32 stopped responding i checked the OpenWrt logs and i have found thousands of messages like these:

Sat Feb  5 08:34:47 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: disassociated
Sat Feb  5 08:34:48 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Sat Feb  5 08:34:53 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: authenticated
Sat Feb  5 08:34:57 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: disassociated
Sat Feb  5 08:34:58 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Sat Feb  5 08:35:01 2022 daemon.notice hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: did not acknowledge authentication response
Sat Feb  5 08:35:02 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: associated (aid 3)
Sat Feb  5 08:35:08 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: disassociated
Sat Feb  5 08:35:08 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: disassociated
Sat Feb  5 08:35:08 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: disassociated
Sat Feb  5 08:35:08 2022 daemon.info hostapd: wlan1: STA e8:68:e7:XX:XX:XX IEEE 802.11: disassociated
Sat Feb  5 08:34:50 2022 daemon.info hostapd: wlan1: STA 8c:ce:4e:XX:XX:XX IEEE 802.11: disassociated
Sat Feb  5 08:34:50 2022 daemon.info hostapd: wlan1: STA 8c:ce:4e:XX:XX:XX IEEE 802.11: disassociated
Sat Feb  5 08:34:50 2022 daemon.info hostapd: wlan1: STA 8c:ce:4e:XX:XX:XX IEEE 802.11: disassociated
Sat Feb  5 08:34:51 2022 daemon.info hostapd: wlan1: STA 8c:ce:4e:XX:XX:XX IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Sat Feb  5 08:34:54 2022 daemon.notice hostapd: wlan1: STA 8c:ce:4e:XX:XX:XX IEEE 802.11: did not acknowledge authentication response
Sat Feb  5 08:35:04 2022 daemon.info hostapd: wlan1: STA 8c:ce:4e:XX:XX:XX IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Sat Feb  5 08:35:07 2022 daemon.info hostapd: wlan1: STA 8c:ce:4e:XX:XX:XX IEEE 802.11: authenticated

These messages are only present for these two devices. Not for the ESP32. I have updated the OpenWrt packages around the same time i have updated these two devices to 10.0.0.1. The updates for OpenWrt also included updates to the QCA WiFi firmware so this might be related to some OpenWrt issues.

dcoder42 commented 2 years ago

The device became unreachable again. From the logs i could see that it was still up and receiving BLE packets. It just wasn't able to connect to Wifi. This is visible in the logs for the last 6 hours

MQT: Attempting connection...
MQT: Connect failed to 192.168.X.X:x, rc -2. Retry in 10 sec
...
WIF: Connect failed as AP cannot be reached
WIF: Connecting to AP1 X in mode 11n as X...
...
RSL: STATE = {"Time":"2022-02-07T03:57:26","Uptime":"0T09:14:13","UptimeSec":33253,"Heap":87,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":482,"MqttCount":1,"Berry":{"HeapUsed":3,"Objects":39}}
RSL: SENSOR = {"Time":"2022-02-07T03:57:26","ESP32":{"Temperature":52.8},"TempUnit":"C"}
RSL: BLE = {"Time":"2022-02-07T03:57:26","BLEDevices":{"total":7,...}}}
RSL: BLE = {"Time":"2022-02-07T03:57:26","BLE":{"scans":1657,"adverts":153363,"devices":7,"resets":0}}
...
WIF: Connect failed as AP cannot be reached
WIF: Connect failed as AP cannot be reached

The Sonoff RF bridge and the LedeNet controller were both reachable and working fine. I started another ESP32 and this one was also able to connect to Wifi. I restarted the Wifi radio (just this one Wifi radio, not the router itself) and the faulty ESP32 still wasn't able to connect. Any other device connected fine. Replugging the power connection on the faulty ESP32 made it connect again.

In the logs i also found two reboots because of "StoreProhibited" exceptions and one because of "IntegerDivideByZero" but this one did not have any core dump. All of them ~8 hours before the Wifi issues started.

Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4009a8d1  PS      : 0x00060930  A0      : 0x8009a7b1  A1      : 0x3ffd24d0  
A2      : 0x22010003  A3      : 0x00000000  A4      : 0x3ffcb9e8  A5      : 0x00000000  
A6      : 0x00000000  A7      : 0x00060920  A8      : 0x0ec9d700  A9      : 0x0ec9d700  
A10     : 0x0ec9d700  A11     : 0x3ffd24f0  A12     : 0x3ffcb6b8  A13     : 0x0000cdcd  
A14     : 0x00060923  A15     : 0x00060920  SAR     : 0x00000000  EXCCAUSE: 0x0000001d  
EXCVADDR: 0x0ec9d708  LBEG    : 0x40092f74  LEND    : 0x40092f8a  LCOUNT  : 0x00000000  

Backtrace:0x4009a8ce:0x3ffd24d00x4009a7ae:0x3ffd24f0 
Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4009a8d1  PS      : 0x00060b30  A0      : 0x8009a7b1  A1      : 0x3ffd24d0  
A2      : 0x704f3237  A3      : 0x00000000  A4      : 0x3ffcb9e8  A5      : 0x00000000  
A6      : 0x00000000  A7      : 0x00060b20  A8      : 0xc5a2947b  A9      : 0xc5a2947b  
A10     : 0xc5a2947b  A11     : 0x3ffd24f0  A12     : 0x3ffcb6b8  A13     : 0x0000cdcd  
A14     : 0x00060b23  A15     : 0x00060b20  SAR     : 0x00000000  EXCCAUSE: 0x0000001d  
EXCVADDR: 0xc5a29483  LBEG    : 0x40092f74  LEND    : 0x40092f8a  LCOUNT  : 0x00000000  

Backtrace:0x4009a8ce:0x3ffd24d00x4009a7ae:0x3ffd24f0 
MQT: tele/X/INFO3 = {"Info3":{"RestartReason":{"Exception":6,"Reason":"IntegerDivideByZero","EPC":"401ded02","EXCVADDR":"00000000","CallChain":["401decff","4015fa41","40098c6c","7ffe31bd"]}}}
Protoncek commented 2 years ago

Hi guys! I have the same problem, except my esp32 goes offline in a matter of minutes after setting MI32option6 1. I run ESPHome on ESP32, reading xiaomi temp/humidity BLE sensors, tastmota has latest developer FW version. If i set Mi32Option6 on zero then module works fine and temperature shows in HA, but i get constant errors in HA: ERROR (MainThread) [homeassistant.helpers.template] Template variable error: 'dict object' has no attribute 'ATC37fa88' when rendering '{{value_json.ATC37fa88.Temperature }}' The problem can be in the fact that in case of above flag set to zero i get mulitple mqtt messages in one instead of separate for each thermometer (i have 7 of them). That's why i'd rather see Mi32option6 set to on. Any suggestions?

barbudor commented 2 years ago

@Protoncek you don't tell which version you are running

I have a production unit running on a build from Jan 8th I also have a stagging unit running on a test version for a new NimBLE library running since Feb 6th 8 Mi sensors using PVVX firmware of various versions. Both have been running fine

For both I'm using ESP32DevKit with a good reliable 5V1A power supply + a pair of additional 100µF cap on 5V and 3V3 soldered directly on the board

Protoncek commented 2 years ago

EDIT: sorry, my stupid mistake.. i run TASMOTA, not esphome…

If you mean FW in esp32, i currently run 10.1.0.8 dev version, before that i ran latest release v10.1.0. I installed dev version because i thought that problems will dissapear, but they remains. It’s “tasmota-bluetooth.bin” version, because i need BT for my xiaomi’s. Should i install a different one? Which on then? Interesting is that you say yóu run version from january, while latest was released in february 8th.

Jason2866 commented 2 years ago

Nothing is released in February. Changes in development builds are every day.

Protoncek commented 2 years ago

Ah…. Again my mistake. I’m not used to read american dates, we have them different (starting with day, not year), so i was looking to quick… Now i see that relesase version is 12. December 2021, while dev is 11. Feb. 2022. Am i looking at the wrong page? Where did you find january release ? I’m looking here: http://ota.tasmota.com/tasmota32/ And http://ota.tasmota.com/tasmota32/release

Jason2866 commented 2 years ago

The used build from 8. Jan is not a released version too. Just the version from that day of the development branch. The other one (new NimBLE version) is this one https://github.com/Jason2866/Tasmota/tree/NimBLE_S3 PR #14744 You can find older development builds here https://github.com/tasmota/install/tree/main/firmware/development

Protoncek commented 2 years ago

Thanks for your help and links! Yesterday i installed latest developer version, released just after eight.. could it be that this was fixed in this dev version? I have module overnight with this option on and still works.

barbudor commented 2 years ago

If I remember well, it is the upgrade of the Espressif SDK/Arduino Core to 2.0.2 which solved the problem (commit on December 23 which solved the problem). I just didn't noticed the freeze issue before early Jan because I was not at home for the season holliday. So the problem was fixed in dev branch 10.1.0.1 since Dec 23rd. Since that date, there have been a upgrade of NimBLE library to 1.3.5 on Jan 14th (but I'm not running this one)

Jason2866 commented 2 years ago

No issue anymore since core 2.0.2 is used

kabongsteve commented 2 years ago

@Jason2866 Its a shame that the instigator of the problem, who was still having this issue, has no say in whether the issue is still open/resolved or otherwise