puterboy closed this issue 2 months ago
Weird...
Now every time I reboot the ESP32 device, it seems to crash and reboot itself every 15-30 seconds, about 5-10 times, until it finally stabilizes. It reboots before publishing any messages except for a single 'offline' message on the "LWT" topic.
Then, when it finally reboots and stabilizes, it remains in the 'offline' state until I manually publish an 'online' message to the LWT topic.
Nothing really changed in the code, my network, or HA, other than my recompiling and flashing the ESP32 gateway a bunch of times...
The problem seems to be caused by the mutex semaphore I wrapped around MQTT.
Wrapping `mqtt->publish` in `pubMQTT` works fine, and indeed fixes the problem of corrupted discovery config messages:
```cpp
void pubMQTT(const char* topic, const char* payload, bool retainFlag) {
  if (SYSConfig.XtoMQTT && !SYSConfig.offline) {
#ifdef ESP32
    if (xSemaphoreTake(xMqttMutex, pdMS_TO_TICKS(QueueSemaphoreTimeOutTask)) == pdFALSE) {
      Log.error(F("xMqttMutex not taken" CR));
      return;
    }
#endif
    if (mqtt && mqtt->connected()) {
      SendReceiveIndicatorON();
      Log.trace(F("[ OMG->MQTT ] topic: %s msg: %s " CR), topic, payload);
      mqtt->publish(topic, payload, 0, retainFlag);
    } else {
      Log.warning(F("MQTT not connected, aborting the publication" CR));
    }
#ifdef ESP32
    xSemaphoreGive(xMqttMutex);
#endif
  } else {
    Log.notice(F("[ OMG->MQTT deactivated or offline] topic: %s msg: %s " CR), topic, payload);
  }
}
```
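Why serializing publishes helps can be modelled on a desktop with standard C++ instead of FreeRTOS. The sketch assumes (the thread does not confirm the exact cause) that the corrupted discovery messages came from two tasks writing to the MQTT client at the same time, each message going out in several pieces. `std::mutex` stands in for `xMqttMutex`; `pubMQTTGuarded`, `writeMessagePieces`, `runTwoPublishers` and the in-memory `g_wire` buffer are hypothetical names, not OpenMQTTGateway code.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-ins: g_wire plays the MQTT client's outgoing stream,
// g_mqttMutex plays xMqttMutex. Not OpenMQTTGateway code.
static std::string g_wire;
static std::mutex g_mqttMutex;

// Writes one message in several separate steps, so without a lock the
// pieces of two concurrent callers could interleave.
static void writeMessagePieces(const std::string& topic, const std::string& payload) {
  g_wire += "[";
  g_wire += topic;
  g_wire += "=";
  g_wire += payload;
  g_wire += "]";
}

// Guarded publish: the whole message is written while holding the mutex,
// mirroring the xSemaphoreTake/xSemaphoreGive pair around mqtt->publish.
void pubMQTTGuarded(const std::string& topic, const std::string& payload) {
  std::lock_guard<std::mutex> lock(g_mqttMutex);
  writeMessagePieces(topic, payload);
}

// Two "tasks" publishing concurrently; returns everything written.
std::string runTwoPublishers(int messagesEach) {
  g_wire.clear();
  auto task = [messagesEach](std::string topic) {
    for (int i = 0; i < messagesEach; ++i) pubMQTTGuarded(topic, "cfg");
  };
  std::thread a(task, std::string("A"));
  std::thread b(task, std::string("B"));
  a.join();
  b.join();
  return g_wire;
}

// True if the stream is a sequence of complete "[A=cfg]" / "[B=cfg]" frames.
bool allFramesIntact(const std::string& wire) {
  const std::size_t kFrame = 7;  // length of "[A=cfg]"
  if (wire.size() % kFrame != 0) return false;
  for (std::size_t pos = 0; pos < wire.size(); pos += kFrame) {
    if (wire.compare(pos, kFrame, "[A=cfg]") != 0 &&
        wire.compare(pos, kFrame, "[B=cfg]") != 0) {
      return false;
    }
  }
  return true;
}
```

Dropping the `lock_guard` would make the two threads race on `g_wire`, which in C++ is a data race (undefined behavior), the desktop analogue of the corrupted config messages.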
However, if I then also wrap `mqtt->loop()` in the same way:
```cpp
#ifdef ESP32
  if (xSemaphoreTake(xMqttMutex, pdMS_TO_TICKS(QueueSemaphoreTimeOutTask)) == pdTRUE) {
    mqtt->loop();
    xSemaphoreGive(xMqttMutex);
  } else {
    Log.error(F("xMqttMutex not taken" CR));
  }
#else
  mqtt->loop();
#endif
```
Then it frequently, but not always, crashes repeatedly on boot, as follows:
```
N: ************** Setup OpenMQTTGateway end **************
N: Reconfiguring MQTT client...
N: Connected to broker
assert failed: vTaskPriorityDisinheritAfterTimeout tasks.c:4922 (pxTCB != pxCurrentTCB[xPortGetCoreID()])
Backtrace: 0x40083c6d:0x3ffb2480 0x4008dd1d:0x3ffb24a0 0x40093ac9:0x3ffb24c0 0x400906b7:0x3ffb25f0 0x4008ee9a:0x3ffb2610 0x400d70e5:0x3ffb2650 0x400d7205:0x3ffb2680 0x400e3757:0x3ffb2700 0x400ee9e2:0x3ffb2720 0x400ee9f5:0x3ffb2740 0x400ef626:0x3ffb2760 0x400e9d2f:0x3ffb27d0 0x40137a41:0x3ffb2810
```
Not sure what is causing this, though presumably there is some race condition that sometimes lets it get through without crashing :) Once it gets past this, it connects to MQTT and is as rock-stable as it is without the mutex.
Any ideas what could be causing this error? If anything, I would have thought that adding a mutex would increase stability rather than decrease it...
Removing the mutex around `mqtt->loop()` fixes the problem, but it is not clear why wrapping it causes the crash...
You can add this to your PlatformIO environment (in `platformio.ini`) to decode the backtrace:
```ini
monitor_filters = esp32_exception_decoder
```
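I think I figured out the problem.
Basically, the `mqtt` object has a property that executes `handle_autodiscovery()`, which in turn calls `pubMQTT`.
So the main thread calls `mqtt->loop()` while holding the `xMqttMutex` token, and `mqtt->loop()` then calls `pubMQTT`, which also wants to take the same token. This creates an irresolvable conflict: the task ends up waiting on a mutex it already holds.
That also fits the crash log: `xMqttMutex` is taken with plain (non-recursive) `xSemaphoreTake`, and the FreeRTOS assertion `pxTCB != pxCurrentTCB` in `vTaskPriorityDisinheritAfterTimeout` exists to catch a task that times out waiting for a mutex it already holds.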
So I think I will stick to protecting only `pubMQTT`, which is necessary to avoid corrupting the discovery topic messages.
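Protecting only `pubMQTT` avoids the self-deadlock. Another option, not the one taken in this thread, would be to make the lock re-entrant. The sketch below models the `loop()` -> `handle_autodiscovery()` -> `pubMQTT` chain on a desktop, with `std::recursive_timed_mutex` in place of FreeRTOS. `mqttLoopLocked`, `handleAutodiscovery`, `pubMQTT` and the discovery topic here are hypothetical stand-ins, not OpenMQTTGateway's code. On the ESP32 the analogous FreeRTOS calls would be `xSemaphoreCreateRecursiveMutex()`, `xSemaphoreTakeRecursive()` and `xSemaphoreGiveRecursive()`; this has not been tested with the gateway.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <vector>

// Stand-in for xMqttMutex, created as a *recursive* mutex: the thread that
// already owns it may lock it again, and must unlock once per lock.
static std::recursive_timed_mutex g_mqttMutex;
static std::vector<std::string> g_published;  // topics "sent" to the broker

// Hypothetical pubMQTT: takes the lock with a timeout, like
// xSemaphoreTake(xMqttMutex, pdMS_TO_TICKS(...)).
bool pubMQTT(const std::string& topic) {
  if (!g_mqttMutex.try_lock_for(std::chrono::milliseconds(100))) {
    return false;  // corresponds to logging "xMqttMutex not taken"
  }
  g_published.push_back(topic);
  g_mqttMutex.unlock();
  return true;
}

// Hypothetical discovery handler that the client invokes from inside loop().
bool handleAutodiscovery() {
  return pubMQTT("homeassistant/sensor/example/config");
}

// Hypothetical wrapper around mqtt->loop(): holds the mutex for the whole
// call, so pubMQTT is re-entered on the same thread while the lock is held.
bool mqttLoopLocked() {
  if (!g_mqttMutex.try_lock_for(std::chrono::milliseconds(100))) {
    return false;
  }
  bool ok = handleAutodiscovery();  // nested lock succeeds: the mutex is recursive
  g_mqttMutex.unlock();
  return ok;
}
```

The trade-off is that `loop()` then holds the lock for its whole duration, so other tasks calling `pubMQTT` may wait up to the timeout; locking only the publish call keeps the critical section short.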
And I will consider this closed :)
Please see this PR for the fix: https://github.com/1technophile/OpenMQTTGateway/pull/2034
This may be more of an HA bug, but I wanted to describe the issue and post a solution in case anyone else encounters it...
After reflashing my ESP32 several times with OMG_lilygo_rtl_433_ESP, the gateway seemed to get stuck in an "offline" status in HA even though it was otherwise working perfectly fine.
Symptoms included:
- `home/OMG_lilygo_rtl_433_ESP/LWT` had the message 'offline'.
- HOWEVER, all the MQTT sensors not intrinsic to the Gateway continued to work perfectly, i.e., the Gateway was working and publishing MQTT messages as usual to my Mosquitto broker (both for the Gateway and for the sensors it monitors); only the Gateway's own sensors showed up as offline in HA.
This occurred even though:
I was finally able to make it show up as online again in HA by manually publishing an 'online' message to the topic `home/OMG_lilygo_rtl_433_ESP/LWT` using MQTT Explorer (one could also use `mosquitto_pub` or the MQTT publish service in HA).
I wanted to post my experience here in case anyone else has the problem, since it puzzled me for several hours...
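Why a single manual publish clears the stuck status can be sketched with a toy model of the broker's retained-message table. It assumes, as the symptoms suggest but the thread does not state, that both the gateway's LWT 'offline' message and its 'online' message are published with the retain flag, so a subscriber such as HA receives whichever was retained last. `RetainedStore` is a hypothetical class, not part of any MQTT library.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a broker's retained-message store: one value per topic,
// overwritten by every publish that has the retain flag set.
class RetainedStore {
 public:
  void publish(const std::string& topic, const std::string& payload, bool retain) {
    if (retain) retained_[topic] = payload;  // non-retained messages are not stored
  }

  // What a client that subscribes now (e.g. HA after a restart) receives first;
  // empty string if nothing is retained for the topic.
  std::string lastRetained(const std::string& topic) const {
    auto it = retained_.find(topic);
    return it == retained_.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> retained_;
};
```

With `mosquitto_pub`, the `-r` flag sets the retain flag, so the manual 'online' message replaces the retained 'offline' one for any client that subscribes afterwards.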