Closed puterboy closed 2 months ago
Interestingly, on looking closer some but not all discovery config messages are corrupted. It's hard to find a pattern of which ones are vs. aren't. It's not device specific.
Note: I also noted it on 2 different devices (lilygo-rtl-433 and esp32dev-ble)
Both with development version?
Yes both development
Let me summarize:
Regarding corruption of WiFi Network under the WebUI, the following ALL SHOW CORRUPTION of the display:
(Note my ssid itself has no weird characters -- it is of form myssid-2.4)
Regarding corruption of the discovery config topic (only tested on lilygo-rtl-433 device)
So it seems like potentially 2 separate bugs...
WiFi Network display corruption goes back to at least V1.7.0.
The above is all reproducible going back and forth between versions.
Note the above PR fixes one of the bugs reported here (I initially thought they were related since they both involved corruption, but now I realize they are separate).
I still haven't figured out why there is sporadic corruption of the discovery JSON strings in the development branch (but v175 is fine).
Interestingly, whereas before it would generate discovery topics for temperature, humidity, battery and rssi, now it generates all the humidity ones, some of the rssi ones, a couple of the temperature ones, and none of the battery ones..
What could be causing this?
Memory/concurrency issues due to the number of devices being processed. Try to play with the stack size for RTL_433, it may reduce/remove those.
Any specific suggestions of what variables to try?
OK - I was able to fix the problem of corrupted discovery data by reverting one of the recent changes to ZgatewayRTL_433.ino
//RFrtl_433_ESPdata["origin"] = (char*)topic.c_str();
//handleJsonEnqueue(RFrtl_433_ESPdata);
pub(topic.c_str(), RFrtl_433_ESPdata);
Changing it back to:
RFrtl_433_ESPdata["origin"] = (char*)topic.c_str();
handleJsonEnqueue(RFrtl_433_ESPdata);
eliminated the corruption of the discovery strings...
HOWEVER, reverting this causes the spurious spikes noted in https://github.com/1technophile/OpenMQTTGateway/issues/2014 to return.
Said another way:
If I use pub and comment out JSONEnqueue, then spurious data resolves but discovery strings are corrupted.
If I comment out pub and use JSONEnqueue, then discovery strings are OK but spurious data returns.
Not sure how I can resolve one bug without causing the other to appear, and vice versa.
Note I can understand why using pub directly, without queueing it, could lead to some conflict that would cause corruption of the discovery data.
However, it's unclear to me why reverting to JSON enqueuing of the discovery data would lead to sporadic corruption of the actual MQTT topic messages -- unless perhaps the discovery messages are too long or consume too much of the queue before being cleared (though I would have thought that would only be an issue after restart, since discovery messages are supposedly not republished if unchanged).
Would appreciate some help and insight in debugging this further :) Thanks!
One other thing that may be concerning (but seems to work ok now) is the newly added lines:
char deviceKeyParameter[25];
memcpy(deviceKeyParameter, &pdevice->uniqueId[strlen(pdevice->uniqueId) - strlen(parameters[i][0])], strlen(parameters[i][0]));
deviceKeyParameter[strlen(parameters[i][0])] = '\0';
Log.trace(F("deviceKeyParameter: %s" CR), deviceKeyParameter);
If this is meant to copy the uniqueId, I am concerned that 25 chars may not be enough, as I already have some entities with uniqueId names longer than 25 chars, such as Ambientweather-F007TH-2-178-temperature_C (assuming the definition of uniqueId is the same as what is used in HA core.entity_registry). Of course, if uniqueId has a different meaning here and is just a short numerical or hex string, then disregard this concern.
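For reference, the quoted lines copy only the last strlen(parameters[i][0]) characters of uniqueId, so the 25-byte buffer has to hold the parameter key rather than the whole uniqueId. Nothing in the snippet enforces either bound, though. The following is a minimal sketch of a bounds-checked version of that copy; copySuffix is a hypothetical helper name, not a function in OpenMQTTGateway, and it uses only the C++ standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of OpenMQTTGateway).
// Copies the last `suffixLen` characters of `src` into `dst`, writing at most
// `dstSize` bytes including the terminating '\0'.
// Returns true if the full suffix was copied, false if the length had to be
// clamped (suffix longer than src, or longer than the destination allows).
bool copySuffix(char* dst, std::size_t dstSize, const char* src, std::size_t suffixLen) {
  assert(dst != nullptr && src != nullptr && dstSize > 0);
  const std::size_t srcLen = std::strlen(src);
  bool complete = true;
  if (suffixLen > srcLen) {  // would read before the start of src
    suffixLen = srcLen;
    complete = false;
  }
  if (suffixLen > dstSize - 1) {  // would overflow dst; keep the last dstSize-1 chars
    suffixLen = dstSize - 1;
    complete = false;
  }
  std::memcpy(dst, src + (srcLen - suffixLen), suffixLen);
  dst[suffixLen] = '\0';
  return complete;
}
```

Under these assumptions, the memcpy in the quoted code would become copySuffix(deviceKeyParameter, sizeof(deviceKeyParameter), pdevice->uniqueId, strlen(parameters[i][0])), and a false return could be logged instead of silently overrunning the buffer.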
The above PR fixes the second of the 2 issues initially listed in this bug report.
However, I still can't figure out why the handleJsonEnqueue method causes sporadic corruption in the MQTT topic messages.
I even tried adding a mutex to the emptyQueue() routine, thinking maybe something was being added to the queue while it was being popped, and that didn't help.
Rather than bypassing the queue by just publishing directly, it would be helpful to figure out why it is failing...
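One thing worth checking with the mutex experiment: a lock only helps if the enqueue side takes the same lock as the drain side, and if each queued entry owns its own copy of the topic and payload rather than pointing at a buffer that is reused for the next message. The sketch below illustrates that pattern with std::mutex and std::queue. It is not the OpenMQTTGateway implementation: MessageQueue, QueuedMessage, enqueue and drain are hypothetical names, and the firmware would use its own queue type and FreeRTOS primitives.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
// Each entry owns copies of its strings, so later reuse of the producer's
// buffers cannot change what is already queued.
struct QueuedMessage {
  std::string topic;
  std::string payload;
};

class MessageQueue {
 public:
  // Producer side: takes the same mutex as drain().
  void enqueue(std::string topic, std::string payload) {
    assert(!topic.empty());
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push(QueuedMessage{std::move(topic), std::move(payload)});
  }

  // Consumer side: moves all pending messages out while holding the lock,
  // so publishing can happen afterwards without blocking producers.
  std::vector<QueuedMessage> drain() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<QueuedMessage> out;
    while (!queue_.empty()) {
      out.push_back(std::move(queue_.front()));
      queue_.pop();
    }
    return out;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.size();
  }

 private:
  std::mutex mutex_;
  std::queue<QueuedMessage> queue_;
};
```

If the corruption persists with both sides locked and owned copies in the queue, that would point away from the queue itself and toward something upstream, such as the buffers the messages are built from.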
Describe the bug
I just upgraded to the latest dev branch and am finding two potentially related errors:
Note: The problem is not due to my discovery_prefix patches, since I reverted them and still see the problems.
Note: I also noted it on 2 different devices (lilygo-rtl-433 and esp32dev-ble)
In the WebUI, under "Configure WiFi", the WiFi network name appears as gibberish characters -- this persists even if I rename the network. Note that the stored name must be working, since it attaches to the correct network.
Separately, but perhaps related, looking at the discovery topic config lines (via mosquitto_sub), there is gibberish and over-writing in the value of the 'stat_t' key. For example:
It is as if the components constructing the JSON string are pointing to the wrong elements of memory, since the resulting JSON string looks like a combination of valid discovery topic config keys & values PLUS some elements of a published data topic PLUS some elements of the config string of another entity for the device -- all mashed together.
Note that it should look something like:
To Reproduce
Compile latest dev branch
Expected behavior
See above
Screenshots
See above
Environment (please complete the following information):