1technophile / OpenMQTTGateway

MQTT gateway for ESP8266 or ESP32 with bidirectional 433mhz/315mhz/868mhz, Infrared communications, BLE, Bluetooth, beacons detection, mi flora, mi jia, LYWSD02, LYWSD03MMC, Mi Scale, TPMS, BBQ thermometer compatibility & LoRa.
https://docs.openmqttgateway.com
GNU General Public License v3.0

Several rtl_433 sensors are sporadically broadcasting a spike of nonsensical temperature or battery data after upgrade to v175 #2014

Closed: puterboy closed this issue 3 weeks ago

puterboy commented 1 month ago

I have a bunch of cheap LaCrosse, AmbientWeather, and Nexus temperature & humidity sensors that I read using a LILYGO LoRa 433 ESP32 running OpenMQTTGateway compiled under the lilygo-rtl_433 environment.

Everything was working properly until I upgraded to v175. Then I noticed that several of the sensors would sporadically (with no apparent pattern) transmit erroneous MQTT sensor data.

For example, my 4 older LaCrosse sensors randomly broadcast a temperature of 2 degC (33.8 degF) regardless of the actual room temperature (note that the erroneous value is always exactly 2 degC). The newer ones are a different model and seem to be OK.

Similarly, the battery level would go from a normal 100.0 (corresponding to 100%) to a nonsensical value that differs for each sensor (unlike the temperature, where 33.8 was always the error value). Examples include: 21385.0, 20989.0, 12079.0, 21682.0, 8218.0, 2575.0

In both cases, the erroneous spikes last for only a single reading before returning to normal and then potentially spiking again in an hour or two.

I can't see any pattern in the timing of the spikes or even in which sensors are spiking (some seem not to spike at all, though it could be that I haven't observed them long enough since the update).

Reverting to the prior version (development branch with last commit on 3/5/23) removes this issue, so it very much seems to be a problem with the latest code and not with my setup or sensors.

To Reproduce

Steps to reproduce the behavior:

  1. Install v175 on a LILYGO LoRa 433 MHz board with the lilygo-rtl_433 build environment
  2. Observe the temperature or battery values of a LaCrosse tx141thbv2 sensor for sudden nonsensical spikes

Environment: as above.

puterboy commented 3 weeks ago

The version that works (dev branch from March 2024) interestingly uses a newer version of the rtl_433_ESP library (rtl_433_ESP.git#v0.3.2) than v175 (rtl_433_ESP.git#v0.3.1).

The only seemingly significant differences (beyond spelling corrections in comments) are as follows:

--- OpenMQTTGateway-old/.pio/libdeps/lilygo-rtl_433-jjk/rtl_433_ESP/src/rtl_433/r_api.c 2024-03-13 23:51:45.000000000 -0400
+++ OpenMQTTGateway-release/.pio/libdeps/lilygo-rtl_433-jjk-ota/rtl_433_ESP/src/rtl_433/r_api.c 2024-08-18 18:35:00.104503500 -0400
@@ -798,12 +798,10 @@
       else if ((d->type == DATA_DOUBLE) &&
                (str_endswith(d->key, "_in") || str_endswith(d->key, "_inch"))) {
         d->value.v_dbl = inch2mm(d->value.v_dbl);
-        // need to free ptr returned from str_replace
-        char* new_label1 = str_replace(d->key, "_inch", "_in");
-        char* new_label2 = str_replace(new_label1, "_in", "_mm");
-        free(new_label1);
+        char* new_label =
+            str_replace(str_replace(d->key, "_inch", "_in"), "_in", "_mm");
         free(d->key);
-        d->key = new_label2;
+        d->key = new_label;
         char* new_format_label = str_replace(d->format, "in", "mm");
         free(d->format);
         d->format = new_format_label;

And

--- OpenMQTTGateway-old/.pio/libdeps/lilygo-rtl_433-jjk/rtl_433_ESP/src/rtl_433_ESP.cpp 2024-03-13 23:51:45.000000000 -0400
+++ OpenMQTTGateway-release/.pio/libdeps/lilygo-rtl_433-jjk-ota/rtl_433_ESP/src/rtl_433_ESP.cpp 2024-08-18 18:35:00.104503500 -0400
@@ -32,12 +32,8 @@
 #if defined(RF_MODULE_SCK) && defined(RF_MODULE_MISO) && \
     defined(RF_MODULE_MOSI) && defined(RF_MODULE_CS)
 #  include <SPI.h>
-#  if CONFIG_IDF_TARGET_ESP32C3 || CONFIG_IDF_TARGET_ESP32S3
-SPIClass newSPI(FSPI);
-#  else
 SPIClass newSPI(VSPI);
 #  endif
-#endif

 #ifdef RF_SX1276
 SX1276 radio = RADIO_LIB_MODULE;
@@ -59,14 +55,14 @@

 /*----------------------------- rtl_433_ESP Internals -----------------------------*/

-#define rtl_433_ReceiverTask_Stack    2048
+#define rtl_433_ReceiverTask_Stack    2000
 #define rtl_433_ReceiverTask_Priority 2
 #define rtl_433_ReceiverTask_Core     0

-/*----------------------------- Initialize variables -----------------------------*/
+/*----------------------------- Initalize variables -----------------------------*/

 /**
- * Is the receiver currently receiving a signal
+ * Is the receiver currently receving a signal
  */
 static bool receiveMode = false;

Could the change in ReceiverTask_Stack be causing the problem, perhaps due to a stack overflow? After all, I do have a couple of dozen sensors...
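The first hunk of the r_api.c diff also changes memory ownership, independently of the stack size. On the "+" side (v175, rtl_433_ESP v0.3.1) the two str_replace calls are nested, so the string returned by the inner call is never freed; the "-" side frees it. On its own this looks like a small leak per renamed key rather than an obvious cause of corrupted JSON. The sketch assumes only what the diff shows, namely that str_replace returns a malloc'd string the caller must free; `str_replace_sketch` and `rename_inch_key` are hypothetical stand-ins, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the str_replace() used in r_api.c: returns a
// newly malloc'd copy of `orig` with every occurrence of `rep` replaced by
// `with`. The caller owns the result and must free() it.
char* str_replace_sketch(const char* orig, const char* rep, const char* with) {
  const std::size_t origLen = std::strlen(orig);
  const std::size_t repLen = std::strlen(rep);
  const std::size_t withLen = std::strlen(with);
  if (repLen == 0) {  // nothing to search for: return a plain copy
    char* copy = static_cast<char*>(std::malloc(origLen + 1));
    if (copy) std::memcpy(copy, orig, origLen + 1);
    return copy;
  }
  // Count non-overlapping occurrences so the output can be sized exactly.
  std::size_t count = 0;
  for (const char* hit = std::strstr(orig, rep); hit; hit = std::strstr(hit + repLen, rep)) ++count;
  char* result = static_cast<char*>(std::malloc(origLen + count * withLen - count * repLen + 1));
  if (!result) return nullptr;
  char* out = result;
  const char* p = orig;
  for (const char* hit = std::strstr(p, rep); hit; hit = std::strstr(p, rep)) {
    const std::size_t chunk = static_cast<std::size_t>(hit - p);
    std::memcpy(out, p, chunk);       // text before the match
    out += chunk;
    std::memcpy(out, with, withLen);  // replacement text
    out += withLen;
    p = hit + repLen;                 // continue after the match
  }
  std::strcpy(out, p);                // remaining tail plus terminator
  return result;
}

// The key rename from the r_api.c hunk, written the way the "-" side does it:
// the intermediate string is freed after the second replacement.
char* rename_inch_key(const char* key) {
  assert(key != nullptr);
  char* step1 = str_replace_sketch(key, "_inch", "_in");
  if (!step1) return nullptr;
  char* step2 = str_replace_sketch(step1, "_in", "_mm");
  std::free(step1);  // the nested one-liner on the "+" side never frees this
  return step2;
}
```

For example, `rename_inch_key("rain_inch")` and `rename_inch_key("rain_in")` both return "rain_mm", and each result must be released with free().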

1technophile commented 3 weeks ago

It could be interesting to increase your receiver task stack and see if that eliminates the message corruption.
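One way to try this without re-editing the library on every build is to guard the define so that a build flag can override it. This is a hypothetical patch to rtl_433_ESP.cpp, not an option rtl_433_ESP is known to offer; 2048 is the value on the v0.3.2 side of the diff above, and the byte interpretation assumes the value is passed straight to an ESP-IDF task-creation call such as xTaskCreatePinnedToCore().

```cpp
// Hypothetical patch to rtl_433_ESP.cpp (not an option the library is known
// to provide): only define the receiver-task stack size if the build has not
// already set it, e.g. via build_flags = -Drtl_433_ReceiverTask_Stack=4096
#ifndef rtl_433_ReceiverTask_Stack
#  define rtl_433_ReceiverTask_Stack 2048  // value on the v0.3.2 side of the diff
#endif

// Extra stack compared with the v0.3.1 value of 2000. ESP-IDF's FreeRTOS
// sizes task stacks in bytes rather than words, so this is a byte count.
constexpr int kExtraStackBytes = rtl_433_ReceiverTask_Stack - 2000;
```

With these numbers the two library versions differ by only 48 bytes, so a meaningful experiment probably needs a clearly larger value, for example 4096.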

puterboy commented 3 weeks ago

I can try increasing ReceiverTask_Stack, but as mentioned above, it seems that the change in ZgatewayRTL_433.ino that fixes https://github.com/1technophile/OpenMQTTGateway/issues/2012 causes these sporadic spikes.

Specifically, if you are on the development branch with

    //RFrtl_433_ESPdata["origin"] = (char*)topic.c_str();
    //handleJsonEnqueue(RFrtl_433_ESPdata);
    pub(topic.c_str(), RFrtl_433_ESPdata);

Then I don't get sporadic corruption of the actual MQTT message data but instead get occasional corruption of the Discovery config data.

Conversely, if I use the version from the v175 branch

    RFrtl_433_ESPdata["origin"] = (char*)topic.c_str();
    handleJsonEnqueue(RFrtl_433_ESPdata);

then the Discovery config messages are not corrupted, but I get sporadic corruption of the MQTT data messages.

1technophile commented 3 weeks ago

    //RFrtl_433_ESPdata["origin"] = (char*)topic.c_str();
    //handleJsonEnqueue(RFrtl_433_ESPdata);
    pub(topic.c_str(), RFrtl_433_ESPdata);

Interesting. I introduced this change, and it fixed data corruption for numerous people using RTL_433: https://github.com/1technophile/OpenMQTTGateway/issues/1836

puterboy commented 3 weeks ago

It does indeed fix the data corruption, but somehow the discovery topics get corrupted instead. Not sure why yet; I suspect the problem may be more fundamental...

puterboy commented 3 weeks ago

I subscribed to the topics and the nonsensical temperature or battery spikes correspond to corrupted JSON data. For example:

home/OMG_lilygo_rtl_433_ESP/RTL_433toMQTT/LaCrosse-TX141THBv2/0/216 {"model":"Nexus-TH","battery_ok":216,"tery_ok":0,"temperature_C":1,"_C":2.8,"otocol":10,"xus, FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor":"FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","eTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor":" NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor":"3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","dity sensor\",\"rssi\":-64,\"duration\":997973}":-64,"sensor\",\"rssi\":-64,\"duration\":997973}":997973}

The data should look something like:

home/OMG_lilygo_rtl_433_ESP/RTL_433toMQTT/LaCrosse-TX141THBv2/0/216 {"model":"LaCrosse-TX141THBv2","id":216,"channel":0,"battery_ok":1,"temperature_C":3.2,"humidity":10,"test":"No","mic":"CRC","protocol":"LaCrosse TX141-Bv2, TX141TH-Bv2, TX141-Bv3, TX141W, TX145wsdth, (TFA, ORIA) sensor","rssi":-65,"duration":141000}

My guess, again, is that this is due to improper string allocation leading to improper string termination...
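To find corrupted messages like this in captured output without eyeballing every line, a quick check can compare the model segment of the topic with the payload's "model" field; in the corrupted example above, the topic says LaCrosse-TX141THBv2 while the payload says Nexus-TH. The helper names are hypothetical, the check assumes the `<model>/<channel>/<id>` topic layout seen in this thread, and it only catches corruption that changes the model.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split an MQTT topic on '/'.
std::vector<std::string> splitTopic(const std::string& topic) {
  std::vector<std::string> parts;
  std::stringstream ss(topic);
  std::string part;
  while (std::getline(ss, part, '/')) parts.push_back(part);
  return parts;
}

// Value of the "model" field, found by plain string search. Good enough for
// a quick scan of captured output; this is not a JSON parser.
std::string payloadModel(const std::string& payload) {
  const std::string key = "\"model\":\"";
  std::string::size_type start = payload.find(key);
  if (start == std::string::npos) return "";
  start += key.size();
  const std::string::size_type end = payload.find('"', start);
  if (end == std::string::npos) return "";
  return payload.substr(start, end - start);
}

// True if the topic's model segment (third from the end, assuming a
// .../<model>/<channel>/<id> layout) equals the payload's "model" field.
bool modelMatchesTopic(const std::string& topic, const std::string& payload) {
  const std::vector<std::string> parts = splitTopic(topic);
  if (parts.size() < 3) return false;
  return parts[parts.size() - 3] == payloadModel(payload);
}
```

It can be run over each line of `mosquitto_sub -v` output (which prints the topic, a space, then the payload) to count how often mismatches occur per sensor.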

Again, this corruption occurs when you use the code from v175:

    RFrtl_433_ESPdata["origin"] = (char*)topic.c_str();
    handleJsonEnqueue(RFrtl_433_ESPdata);

1technophile commented 3 weeks ago

Could be a concurrency issue in the queue: discovery topics are numerous and long, and having them mixed in the queue with the regular messages may be the issue.

puterboy commented 3 weeks ago

I actually think that is the root cause of both the corruption of discovery topics and the sporadic corruption of the MQTT topic messages -- they both seem to occur when the queue is "overloaded". Perhaps not enough memory is being allocated for the queue or for its individual elements. That would explain a lot...

That might also explain the crashes that you fixed by moving publishing of data out of the queue https://github.com/1technophile/OpenMQTTGateway/issues/1836.

I think the queue is a good thing, and we should fix that rather than trying to work around it.

puterboy commented 3 weeks ago

BTW, happy to jump on a call or chat to work on this together as I am coming up to the limits of my abilities here...

puterboy commented 3 weeks ago

I'm stumped, as I can't find a reason why the queue would corrupt or overflow:

  1. The JSON doc buffer is allocated JSON_MSG_BUFFER bytes, which for an ESP32 is 816. I would think that is more than enough for any of the MQTT topic or discovery config messages I have seen on my system, assuming approximately 1 character per byte.
  2. The jsonQueue simply discards (and thus ignores) data if the queue length is exceeded, so too many messages shouldn't lead to corruption; at worst, some discovery config and MQTT topic messages would be missed.
  3. The queue uses std::queue, which should take care of allocating and deallocating memory for the queue elements.
  4. According to the info page on the WebUI, there seems to always be enough free memory so it doesn't seem like an OOM situation.
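Points 2 and 3 can be illustrated with a small bounded queue. std::queue copies each pushed element into storage it manages, so dropping messages on overflow can lose data but cannot scramble it, provided the element type actually owns its contents. `BoundedJsonQueue`, `blocked()` and `maxSeen()` are hypothetical names; the counters mirror the queue statistics (maximum length, blocked messages) discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>

// Sketch of a bounded message queue (hypothetical, not the OpenMQTTGateway
// implementation): when full, new messages are dropped and counted.
class BoundedJsonQueue {
 public:
  explicit BoundedJsonQueue(std::size_t maxLength) : maxLength_(maxLength) {}

  // Returns false, and counts a blocked message, when the queue is full.
  bool push(const std::string& serializedJson) {
    if (queue_.size() >= maxLength_) {
      ++blocked_;
      return false;
    }
    queue_.push(serializedJson);  // std::queue stores its own copy of the string
    if (queue_.size() > maxSeen_) maxSeen_ = queue_.size();
    return true;
  }

  // Moves the oldest message into `out`; returns false if the queue is empty.
  bool pop(std::string& out) {
    if (queue_.empty()) return false;
    out = queue_.front();
    queue_.pop();
    return true;
  }

  std::size_t blocked() const { return blocked_; }
  std::size_t maxSeen() const { return maxSeen_; }

 private:
  std::size_t maxLength_;
  std::size_t blocked_ = 0;
  std::size_t maxSeen_ = 0;
  std::queue<std::string> queue_;
};
```

Because each element here is a self-contained std::string, an overflow only ever shows up in `blocked()`, never as mixed-up payloads.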

So maybe it is a concurrency issue since this seems to be multi-threaded (though I literally know nothing about how to program multiple threads). Could it be that there are multiple simultaneous calls to mqtt->publish that collide? At least for discovery topics this doesn't seem to be protected with semaphores (if I am understanding this correctly)

Any thoughts on how to troubleshoot this?

puterboy commented 3 weeks ago

Indeed it is a concurrency issue. I was able to get rid of discovery config data corruption by wrapping mqtt->publish with a Mutex semaphore within the low level pubMQTT routine.

This may also solve the MQTT topic corruption issues and allow you to revert the change referenced above.
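The mutex approach can be sketched on a desktop host with std::mutex standing in for a FreeRTOS mutex (on the ESP32 that would be a handle from xSemaphoreCreateMutex(), taken with xSemaphoreTake() and released with xSemaphoreGive()). `FakeMqttClient` and `LockedMqtt` are hypothetical and this is not the PR's actual code; the point is that every path into the client, publish and loop alike, goes through the same lock.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the MQTT client. Like the real client, it is
// assumed not to be safe when two tasks call into it at the same time.
struct FakeMqttClient {
  std::vector<std::string> sent;  // "topic payload", in send order
  int loops = 0;
  bool publish(const std::string& topic, const std::string& payload) {
    sent.push_back(topic + " " + payload);
    return true;
  }
  void loop() { ++loops; }
};

// Every call into the client goes through one mutex, so a discovery message
// published from one task cannot interleave with a sensor message or a
// loop() call from another task.
class LockedMqtt {
 public:
  explicit LockedMqtt(FakeMqttClient& client) : client_(client) {}

  bool publish(const std::string& topic, const std::string& payload) {
    std::lock_guard<std::mutex> guard(mutex_);  // unlocked when guard goes out of scope
    return client_.publish(topic, payload);
  }

  void loop() {
    std::lock_guard<std::mutex> guard(mutex_);
    client_.loop();
  }

 private:
  FakeMqttClient& client_;
  std::mutex mutex_;
};
```

A lock like this serializes access to the client; it cannot protect a message whose contents live in memory that another task is still modifying.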

I will test and publish a PR.

(BTW, per the other bug report https://github.com/1technophile/OpenMQTTGateway/issues/2023, I think it's still separately important to back off on the OLED display delay as the queue fills)

puterboy commented 3 weeks ago

Unfortunately, I still get MQTT topic corruption if I use the JsonEnqueue method -- not sure why. So I will submit the PR of just the Mutex wrapper, keeping your revision intact.

At least for me, that seems to get the development branch working fine, though I would still like to understand why the JsonEnqueue method leads to corruption.

puterboy commented 3 weeks ago

I was hoping that the 2 PRs I created to add protection to mqtt->publish/mqtt->loop (https://github.com/1technophile/OpenMQTTGateway/pull/2024) and to emptyQueue (https://github.com/1technophile/OpenMQTTGateway/pull/2025) would solve the corruption problem when jsonQueue is used for MQTT messages. However, I still get sporadic corruption in the published MQTT messages. It looks like a concurrency issue too, but I can't figure out where it may be.

Here are 2 successive examples of corruption for an Ambientweather and a Rubicson temp/humidity sensor sandwiched between 2 valid messages from a Nexus and LaCrosse sensor respectively, that I captured with mosquitto_sub:

home/OMG_lilygo_rtl_433_ESP/RTL_433toMQTT/Nexus-TH/1/180 {"model":"Nexus-TH","id":180,"channel":1,"battery_ok":1,"temperature_C":21.6,"humidity":70,"protocol":"Nexus, FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","rssi":-59,"duration":729997}
home/OMG_lilygo_rtl_433_ESP/RTL_433toMQTT/Ambientweather-F007TH/3/179 {"model":"Solight-TE44","l":179,"emperature_C":3,"re_C":1,"C":19.22222,"ight TE44/TE66, EMOS E0107T, NX-6876-917":74,"/TE66, EMOS E0107T, NX-6876-917":"6, EMOS E0107T, NX-6876-917","MOS E0107T, NX-6876-917":"T, NX-6876-917","sor\",\"rssi\":-92,\"duration\":1149996}":-92,"\"rssi\":-92,\"duration\":1149996}":1149996}
home/OMG_lilygo_rtl_433_ESP/RTL_433toMQTT/Rubicson-Temperature/1/222 {"model":"Solight-TE44","el":222,"temperature_C":1,"ure_C":1,"RC":18.3,"light TE44/TE66, EMOS E0107T, NX-6876-917":"t TE44/TE66, EMOS E0107T, NX-6876-917","44/TE66, EMOS E0107T, NX-6876-917":"EMOS E0107T, NX-6876-917",":-92,\"duration\":1149996}":-92,"\"duration\":1149996}":1149996}
home/OMG_lilygo_rtl_433_ESP/RTL_433toMQTT/LaCrosse-TX141THBv2/1/26 {"model":"LaCrosse-TX141THBv2","id":26,"channel":1,"battery_ok":1,"temperature_C":-21.4,"humidity":10,"test":"No","mic":"CRC","protocol":"LaCrosse TX141-Bv2, TX141TH-Bv2, TX141-Bv3, TX141W, TX145wsdth, (TFA, ORIA) sensor","rssi":-59,"duration":136996}

Interestingly, all the Rubicson messages are corrupted but only very occasional ones from other sensors like the Ambientweather one shown here or from the Nexus or LaCrosse sensors that are not corrupted here.

Note: the corruption occurs when I use handleJsonEnqueue to post MQTT messages (rather than the updated version that posts directly using 'pub'), because I want to fix the underlying corruption problem that seems to exist in the jsonQueue code.

I should note that the max queue length achieved is 4 and there are no blocked messages.

1technophile commented 3 weeks ago

Try increasing the stack of the RTL_433 decoder task (rtl_433_Decoder_Stack); if I recall correctly, this helped during my testing. We also added a mutex but came to the same conclusion as you.

puterboy commented 3 weeks ago

Maybe I am missing something, but I don't see how increasing rtl_433_Decoder_Stack will help. The problem seems to be limited to how the JSON string is published.

So it would seem that the decoding is just fine -- it's an MQTT publishing issue.

Even if increasing rtl_433_Decoder_Stack somehow helps, wouldn't it simply be papering over the underlying problem with jsonQueue?

1technophile commented 3 weeks ago

It's worth a try considering your message-heavy setup. If it helps, at least it gives a direction towards memory allocation versus concurrency.

puterboy commented 3 weeks ago

I added some more logging to show exactly what is being enqueued and dequeued. The following log shows that the right data is entering the queue but (sometimes) it gets corrupted when there are 2 enqueues in a row.


N: type: null
N: Enqueue JSON: {"model":"LaCrosse-TX141THBv2","id":26,"channel":1,"battery_ok":1,"temperature_C":-20.8,"humidity":10,"test":"No","mic":"CRC","protocol":"LaCrosse TX141-Bv2, TX141TH-Bv2, TX141-Bv3, TX141W, TX145wsdth, (TFA, ORIA) sensor","rssi":-57,"duration":138996,"origin":"/RTL_433toMQTT/LaCrosse-TX141THBv2/1/26"}
N: Dequeue JSON: {"model":"LaCrosse-TX141THBv2","id":26,"channel":1,"battery_ok":1,"temperature_C":-20.8,"humidity":10,"test":"No","mic":"CRC","protocol":"LaCrosse TX141-Bv2, TX141TH-Bv2, TX141-Bv3, TX141W, TX145wsdth, (TFA, ORIA) sensor","rssi":-57,"duration":138996,"origin":"/RTL_433toMQTT/LaCrosse-TX141THBv2/1/26"}
N: Send on /RTL_433toMQTT/LaCrosse-TX141THBv2/1/26 msg {"model":"LaCrosse-TX141THBv2","id":26,"channel":1,"battery_ok":1,"temperature_C":-20.8,"humidity":10,"test":"No","mic":"CRC","protocol":"LaCrosse TX141-Bv2, TX141TH-Bv2, TX141-Bv3, TX141W, TX145wsdth, (TFA, ORIA) sensor","rssi":-57,"duration":138996}
N: type: null

N: type: null
N: Enqueue JSON: {"model":"Ambientweather-F007TH","id":178,"channel":2,"battery_ok":1,"temperature_C":22.94445,"humidity":74,"mic":"CRC","protocol":"Ambient Weather F007TH, TFA 30.3208.02, SwitchDocLabs F016TH temperature sensor","rssi":-57,"duration":188996,"origin":"/RTL_433toMQTT/Ambientweather-F007TH/2/178"}
N: Dequeue JSON: {"model":"Ambientweather-F007TH","id":178,"channel":2,"battery_ok":1,"temperature_C":22.94445,"humidity":74,"mic":"CRC","protocol":"Ambient Weather F007TH, TFA 30.3208.02, SwitchDocLabs F016TH temperature sensor","rssi":-57,"duration":188996,"origin":"/RTL_433toMQTT/Ambientweather-F007TH/2/178"}
N: Send on /RTL_433toMQTT/Ambientweather-F007TH/2/178 msg {"model":"Ambientweather-F007TH","id":178,"channel":2,"battery_ok":1,"temperature_C":22.94445,"humidity":74,"mic":"CRC","protocol":"Ambient Weather F007TH, TFA 30.3208.02, SwitchDocLabs F016TH temperature sensor","rssi":-57,"duration":188996}

N: type: null
N: Enqueue JSON: {"model":"LaCrosse-TX141THBv2","id":83,"channel":0,"battery_ok":1,"temperature_C":4.2,"humidity":71,"test":"No","mic":"CRC","protocol":"LaCrosse TX141-Bv2, TX141TH-Bv2, TX141-Bv3, TX141W, TX145wsdth, (TFA, ORIA) sensor","rssi":-70,"duration":906996,"origin":"/RTL_433toMQTT/LaCrosse-TX141THBv2/0/83"}
N: type: null
N: Enqueue JSON: {"model":"Nexus-TH","id":13,"channel":2,"battery_ok":1,"temperature_C":24.1,"humidity":67,"protocol":"Nexus, FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","rssi":-70,"duration":906996,"origin":"/RTL_433toMQTT/Nexus-TH/2/13"}
N: Dequeue JSON: {"model":"Nexus-TH","battery_ok":83,"tery_ok":0,"temperature_C":1,"_C":4.2,"otocol":71,"xus, FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor":"FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","eTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor":" NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor":"3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","ity sensor\",\"rssi\":-70,\"duration\":906996}":-70,"ensor\",\"rssi\":-70,\"duration\":906996}":906996,"origin":"/RTL_433toMQTT/LaCrosse-TX141THBv2/0/83"}
N: Send on /RTL_433toMQTT/LaCrosse-TX141THBv2/0/83 msg {"model":"Nexus-TH","battery_ok":83,"tery_ok":0,"temperature_C":1,"_C":4.2,"otocol":71,"xus, FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor":"FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","eTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor":" NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor":"3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","ity sensor\",\"rssi\":-70,\"duration\":906996}":-70,"ensor\",\"rssi\":-70,\"duration\":906996}":906996}
N: type: null
N: Dequeue JSON: {"model":"LaCrosse-TX141THBv2","TX141THBv2":13,"41THBv2":2,"id":1,"battery_ok":24.1,"perature_C":67,"C":"y","h, (TFA, ORIA) sensor":-70,"FA, ORIA) sensor":906996,"origin":"/RTL_433toMQTT/Nexus-TH/2/13"}
N: Send on /RTL_433toMQTT/Nexus-TH/2/13 msg {"model":"LaCrosse-TX141THBv2","TX141THBv2":13,"41THBv2":2,"id":1,"battery_ok":24.1,"perature_C":67,"C":"y","h, (TFA, ORIA) sensor":-70,"FA, ORIA) sensor":906996}

N: type: null
N: Enqueue JSON: {"model":"Ambientweather-F007TH","id":13,"channel":1,"battery_ok":1,"temperature_C":25.5,"humidity":69,"mic":"CRC","protocol":"Ambient Weather F007TH, TFA 30.3208.02, SwitchDocLabs F016TH temperature sensor","rssi":-60,"duration":188996,"origin":"/RTL_433toMQTT/Ambientweather-F007TH/1/13"}
N: Dequeue JSON: {"model":"Ambientweather-F007TH","id":13,"channel":1,"battery_ok":1,"temperature_C":25.5,"humidity":69,"mic":"CRC","protocol":"Ambient Weather F007TH, TFA 30.3208.02, SwitchDocLabs F016TH temperature sensor","rssi":-60,"duration":188996,"origin":"/RTL_433toMQTT/Ambientweather-F007TH/1/13"}
N: Send on /RTL_433toMQTT/Ambientweather-F007TH/1/13 msg {"model":"Ambientweather-F007TH","id":13,"channel":1,"battery_ok":1,"temperature_C":25.5,"humidity":69,"mic":"CRC","protocol":"Ambient Weather F007TH, TFA 30.3208.02, SwitchDocLabs F016TH temperature sensor","rssi":-60,"duration":188996}

The first two stanzas show a single enqueue followed by a single dequeue; here the queue length is just 1. Then two enqueues come along, followed by 2 dequeues, and both dequeues seem to be scrambled mixtures of elements from both enqueues.

Then comes a single enqueue followed by a normal dequeue.

Does this help?

puterboy commented 3 weeks ago

Now here is an example where it goes wrong with just one entry in the queue (the middle stanza is the corrupted one):

N: Enqueue JSON: {"model":"Nexus-TH","id":180,"channel":1,"battery_ok":1,"temperature_C":22.9,"humidity":70,"protocol":"Nexus, FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","rssi":-61,"duration":739996,"origin":"/RTL_433toMQTT/Nexus-TH/1/180"}
N: Dequeue JSON: {"model":"Nexus-TH","id":180,"channel":1,"battery_ok":1,"temperature_C":22.9,"humidity":70,"protocol":"Nexus, FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","rssi":-61,"duration":739996,"origin":"/RTL_433toMQTT/Nexus-TH/1/180"}
N: Send on /RTL_433toMQTT/Nexus-TH/1/180 msg {"model":"Nexus-TH","id":180,"channel":1,"battery_ok":1,"temperature_C":22.9,"humidity":70,"protocol":"Nexus, FreeTec NC-7345, NX-3980, Solight TE82S, TFA 30.3209 temperature/humidity sensor","rssi":-61,"duration":739996}

N: type: null
N: Enqueue JSON: {"model":"Rubicson-Temperature","id":222,"channel":1,"battery_ok":1,"temperature_C":19.7,"mic":"CRC","protocol":"Rubicson, TFA 30.3197 or InFactory PT-310 Temperature Sensor","rssi":-89,"duration":898997,"origin":"/RTL_433toMQTT/Rubicson-Temperature/1/222"}
N: type: null
N: Dequeue JSON: {"model":"Solight-TE44","el":222,"temperature_C":1,"ure_C":1,"RC":19.7,"light TE44/TE66, EMOS E0107T, NX-6876-917":"t TE44/TE66, EMOS E0107T, NX-6876-917","44/TE66, EMOS E0107T, NX-6876-917":"EMOS E0107T, NX-6876-917",":-89,\"duration\":898997}":-89,"\"duration\":898997}":898997,"origin":"/RTL_433toMQTT/Rubicson-Temperature/1/222"}
N: Send on /RTL_433toMQTT/Rubicson-Temperature/1/222 msg {"model":"Solight-TE44","el":222,"temperature_C":1,"ure_C":1,"RC":19.7,"light TE44/TE66, EMOS E0107T, NX-6876-917":"t TE44/TE66, EMOS E0107T, NX-6876-917","44/TE66, EMOS E0107T, NX-6876-917":"EMOS E0107T, NX-6876-917",":-89,\"duration\":898997}":-89,"\"duration\":898997}":898997}

N: type: null
N: Enqueue JSON: {"model":"LaCrosse-TX141THBv2","id":122,"channel":1,"battery_ok":1,"temperature_C":-19.3,"humidity":78,"test":"No","mic":"CRC","protocol":"LaCrosse TX141-Bv2, TX141TH-Bv2, TX141-Bv3, TX141W, TX145wsdth, (TFA, ORIA) sensor","rssi":-64,"duration":143997,"origin":"/RTL_433toMQTT/LaCrosse-TX141THBv2/1/122"}
N: Dequeue JSON: {"model":"LaCrosse-TX141THBv2","id":122,"channel":1,"battery_ok":1,"temperature_C":-19.3,"humidity":78,"test":"No","mic":"CRC","protocol":"LaCrosse TX141-Bv2, TX141TH-Bv2, TX141-Bv3, TX141W, TX145wsdth, (TFA, ORIA) sensor","rssi":-64,"duration":143997,"origin":"/RTL_433toMQTT/LaCrosse-TX141THBv2/1/122"}
N: Send on /RTL_433toMQTT/LaCrosse-TX141THBv2/1/122 msg {"model":"LaCrosse-TX141THBv2","id":122,"channel":1,"battery_ok":1,"temperature_C":-19.3,"humidity":78,"test":"No","mic":"CRC","protocol":"LaCrosse TX141-Bv2, TX141TH-Bv2, TX141-Bv3, TX141W, TX145wsdth, (TFA, ORIA) sensor","rssi":-64,"duration":143997}

puterboy commented 3 weeks ago

Note that the dequeue is logged before I give back the xQueueMutex that I added to emptyQueue, so both pushing to and popping from jsonQueue should be protected.

puterboy commented 3 weeks ago

Actually, I think the problem is with the push, since the following only pushes a shallow copy of jsonDoc onto the queue:

    JsonBundle bundle;
    bundle.doc = jsonDoc;
    jsonQueue.push(bundle);

We may need to push and pop serialized JSON docs on jsonQueue instead, i.e., serialize -> push -> pop -> deserialize.

If this is right, then this bug really needs to be fixed, since any JSON doc pushed onto the queue could be corrupted. I can test this tomorrow.
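The suspected mechanism can be shown without ArduinoJson. In ArduinoJson 6, strings passed as `const char*` are stored by pointer rather than copied, which is presumably why the code above casts `topic.c_str()` to `char*` to force a copy; any such pointer inside a queued document behaves like `ShallowBundle` below, so the queued message silently changes when the memory it points to is reused. All names here are hypothetical; in the gateway, the owned-copy side would use serializeJson() before the push and deserializeJson() after the pop.

```cpp
#include <cassert>
#include <cstring>
#include <queue>
#include <string>

// Hypothetical producer buffer that the decoder reuses for every message.
char decodeBuffer[160];

void decodeInto(const char* json) {
  std::strncpy(decodeBuffer, json, sizeof(decodeBuffer) - 1);
  decodeBuffer[sizeof(decodeBuffer) - 1] = '\0';
}

// Shallow bundle: keeps only a pointer to text owned by someone else.
// Copying the bundle into std::queue copies the pointer, not the text.
struct ShallowBundle {
  const char* json;
};

// Owned bundle: the serialize -> push step stores a private copy of the
// text, which the pop side would later deserialize into a document.
struct OwnedBundle {
  std::string json;
};

void pushShallow(std::queue<ShallowBundle>& q) { q.push(ShallowBundle{decodeBuffer}); }
void pushOwned(std::queue<OwnedBundle>& q) { q.push(OwnedBundle{std::string(decodeBuffer)}); }
```

Decoding a second message into the same buffer changes what the shallow queue entry reads, while the owned entry keeps the original text. The trade-off is heap usage: each queued message costs its full serialized length, plus a second parse on the consumer side.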

1technophile commented 3 weeks ago

Thanks for the detailed analysis.

serialize -> push -> pop -> deserialize

Could be tested for sure.

puterboy commented 3 weeks ago

Tested, and it solves the problem (at least for me). As mentioned in the PR, this really is a critical bug that should be patched ASAP, since it can theoretically corrupt any data added to the queue, even if the queue length is one, whenever some other data object is allocated memory that overlaps with the memory of the queued object.

puterboy commented 3 weeks ago

Fixed as per above PR