bertmelis / espMqttClient

MQTT 3.1.1 client library for the Espressif devices ESP8266 and ESP32 on the Arduino framework.
https://www.emelis.net/espMqttClient/
MIT License
92 stars 21 forks source link

Strange behaviour when reaching memory limit #108

Closed MichaelDvP closed 10 months ago

MichaelDvP commented 10 months ago

I'm not sure if this is really a bug or i understand something wrong. In proddys emsesp-project some users reports freezing which is (maybe) related to the mqtt at low memory when mqtt is disconnected, With old asyncMqttClient we delete old queued messages when getFreeHeap goes below 60k. In espMqttClient there is the EMC_MIN_FREE_MEMORY setting. Discussion about that is here

I've tested with a modifcated https://github.com/bertmelis/espMqttClient/blob/main/examples/notask-esp32/notask-esp32.ino, and added in loop:

if (currentMillis - lastMillis > 10)
    {
        lastMillis = currentMillis;
        static uint32_t messageNo = 0;
        char payload[120];
        sprintf(payload, "MessageNo: %u qwertzuiopüasdfghjklöyxcvbnmqwertzuioasdfghj", ++messageNo);
        uint16_t id = mqttClient->publish("Testmessage", 1, 0, payload);
        Serial.printf("messageNo: %u, publish: %s, heap: %u, maxAlloc: %u\n", messageNo, id > 0 ? "ok" : "fail", ESP.getFreeHeap(), ESP.getMaxAllocHeap());
    }

(BTW: currentMillies should not be static to update it in every loop)

config.h

#define EMC_MIN_FREE_MEMORY 61440
#define EMC_ALLOW_NOT_CONNECTED_PUBLISH 1

helpers.h

#define EMC_GET_FREE_MEMORY() ESP.getFreeHeap()

(BTW: the mem-check for PSRAM is critical, the esp32 default in sdkconfig.h is CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=4096, so mqtt publish is mainly allocated in internal ram. Better check both RAM for enough space)

With the above settings on a esp32/4M, nodemcu (no PSRAM) i get an output:

messageNo: 1384, publish: ok, heap: 54860, maxAlloc: 53236
messageNo: 1385, publish: ok, heap: 54796, maxAlloc: 53236
messageNo: 1386, publish: fail, heap: 54796, maxAlloc: 53236
messageNo: 1387, publish: fail, heap: 54796, maxAlloc: 53236

which looks as expected, except the allocation fail is at 53k,not at 60k. with debug messages i see that allocation fails at 60k, but publish gives back a valid message-id and free heap is still going down.

messageNo: 1248, publish: ok, heap: 61576, maxAlloc: 53236
[ 15277][I][Packet.cpp:319] _allocate(): Alloc (l:79)
messageNo: 1249, publish: ok, heap: 61416, maxAlloc: 53236
[ 15289][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
messageNo: 1250, publish: ok, heap: 61352, maxAlloc: 53236
[ 15302][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
messageNo: 1251, publish: ok, heap: 61288, maxAlloc: 53236
[ 15315][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory

Some messages later strange things happens:

[ 15562][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
messageNo: 1271, publish: ok, heap: 59944, maxAlloc: 53236
[ 15600][E][WiFiClient.cpp:268] connect(): socket error on fd 48, errno: 104, "Connection reset by peer"
messageNo: 1272, publish: fail, heap: 59904, maxAlloc: 53236
messageNo: 1273, publish: fail, heap: 59904, maxAlloc: 53236
[ 15615][I][MqttClient.cpp:717] _clearQueue(): clearing queue (clear session: 0)
Disconnected from MQTT: 7.
[ 15629][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
messageNo: 1274, publish: ok, heap: 61312, maxAlloc: 53236
[ 15642][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
messageNo: 1275, publish: ok, heap: 61248, maxAlloc: 53236
[ 15655][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
messageNo: 1276, publish: ok, heap: 61184, maxAlloc: 53236```

and at last, when heap is down at 53k the message-Id is returned as zero and no log messages from the client

messageNo: 1375, publish: ok, heap: 54888, maxAlloc: 53236
[ 16955][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
messageNo: 1376, publish: ok, heap: 54824, maxAlloc: 53236
messageNo: 1377, publish: ok, heap: 54760, maxAlloc: 53236
messageNo: 1378, publish: ok, heap: 54696, maxAlloc: 53236
messageNo: 1379, publish: fail, heap: 54696, maxAlloc: 53236
messageNo: 1380, publish: fail, heap: 54696, maxAlloc: 53236

Testing the same with a own task is similar, but some with some other events: log_with_own_task.txt

Workaround is not to use EMC_ALLOW_NOT_CONNECTED_PUBLISH, or stop feeding publishes on an own memory limit higher than the setting in the library. Your solution not to allocate mem for the new messages is not ideal, better is to delete the oldest message in the queue and queue the new message. Is there a way to delete the oldest message in queue by a function call? I've added a getQueueSize to check the queued messages, but in case of disconnected mqtt memory is allocated but the outbox does not grow. Is there a better way to read the number of messages in the queue?

uint16_t MqttClient::getQueue() const {
    EMC_SEMAPHORE_TAKE();
    espMqttClientInternals::Outbox<OutgoingPacket>::Iterator it    = _outbox.front();
    uint16_t  count = 0;
    while (it) {
        ++count;
        ++it;
    }
    EMC_SEMAPHORE_GIVE();
    return count;
}
bertmelis commented 10 months ago

This needs further investigation. The memory check is very crude. Of course, there is a slight difference between the memory check and the actual memory afterwards but I definitely not expect it to be 3k.

There might be a bug indeed. A queue item is created here.

    Node* node = new (std::nothrow) Node(std::forward<Args>(args) ...);
    if (node != nullptr) {

The Node constructor is here:

    explicit Node(Args&&... args)
    : data(std::forward<Args>(args) ...)
    , next(nullptr) {
      // empty
    }

Here, the node in the queue can exist but it's data can be non existent. The error is set but the node is not cleared. Later, when trying to send the data, it fails.

Here might be the problem:

    if (it && error == espMqttClientTypes::Error::SUCCESS) return true;
    return false;

ON failure it just returns false instead of cleaning up the node at iterator.

bertmelis commented 10 months ago

Regarding queue management. It's undesired to just delete the oldest item. We have to at least check for QoS. I'm happy to delete qos0 messages. Higher QoS means giving up compliancy.

It can be configurable of course: delete oldest regardless delete oldest, taking qos into account don't delete = don't allow new

In any case, it will probably add a lot of complexity.

MichaelDvP commented 10 months ago

Of course, there is a slight difference between the memory check and the actual memory afterwards but I definitely not expect it to be 3k.

The memory check works, the "low memory" message from allocate comes at right level. But the publish still returns ID (qos>0) or 1 (qos0) and heap is decreasing. 7k below limit the "low memory" messages is missing and publish return zero and heap stays constant.

I have not checked if the queue is send out after reconnect, my tests was always with broker down all the time. Because the outbox seems to stay emty, i'll check if the queued messages are sent on reconnect.

Regarding queue management...

Agree, qos is critical when deleting messages. My suggestion: don't accept the new messages, make sure message-id returns zero, so we know the message is not queued. But add a function to delete front/back of queue the the caller can retry after deleting one.

bertmelis commented 10 months ago

Got it! We'll first do the bug hunting, then discuss features. I'm definitely open for your suggestion.

bertmelis commented 10 months ago

The reference to error isn't properly updated when allocation fails.

The devil is in the details. Major bug found!

https://github.com/bertmelis/espMqttClient/pull/109/commits/24b6b49a3afa522775cc9869eba808ca4ac54d0b

bertmelis commented 10 months ago

Some log output:

[ 12326][I][Packet.cpp:316] _allocate(): Alloc (l:78)
messageNo: 599, publish: ok, heap: 158412, maxAlloc: 98292
[ 12342][W][Packet.cpp:306] _allocate(): Packet buffer not allocated: low memory
[ 12349][E][MqttClient.cpp:157] publish(): Could not create PUBLISH packet
messageNo: 600, publish: fail, heap: 158412, maxAlloc: 98292
[ 12367][W][Packet.cpp:306] _allocate(): Packet buffer not allocated: low memory
[ 12373][E][MqttClient.cpp:157] publish(): Could not create PUBLISH packet
messageNo: 601, publish: fail, heap: 158412, maxAlloc: 98292

It now correctly detects the failed allocation. (I didn't change anything on the psram quirks.)

MichaelDvP commented 10 months ago

Wow this was fast. Looks much better, but there is still something wrong. The outbox queue is now correctly filled and i added to show it: Serial.printf("messageNo: %u, publish: %s, queue: %d, heap: %u, maxAlloc: %u\n", messageNo, id > 0 ? "ok" : "fail", mqttClient->getQueue(), ESP.getFreeHeap(), ESP.getMaxAllocHeap());

This gives when reaching the limit correctly the fail (message-id == 0):

[ 14852][I][Packet.cpp:319] _allocate(): Alloc (l:79)
messageNo: 1248, publish: ok, queue: 1242, heap: 61580, maxAlloc: 53236
[ 14863][I][Packet.cpp:319] _allocate(): Alloc (l:79)
messageNo: 1249, publish: ok, queue: 1243, heap: 61420, maxAlloc: 53236
[ 14874][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 14874][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 1250, publish: fail, queue: 1244, heap: 61356, maxAlloc: 53236
[ 14890][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 14890][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet

But still the heap goes down and messages are put to queue. Heap usage of a valid message is 160 bytes, heap usage when allocate fail is 64 bytes. And after some time still strange things happens:

messageNo: 1351, publish: fail, queue: 1346, heap: 54828, maxAlloc: 53236
[ 16942][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 16942][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 1352, publish: fail, queue: 1347, heap: 54764, maxAlloc: 53236
Connecting to MQTT...
Connecting failed.
[ 24617][V][WiFiGeneric.cpp:362] _arduino_event_cb(): STA Disconnected: SSID: HZTD67Fr45d, BSSID: 44:4e:6d:af:6f:8f, Reason: 200
[ 24617][D][WiFiGeneric.cpp:1035] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[ 24625][W][WiFiGeneric.cpp:1057] _eventCallback(): Reason: 200 - BEACON_TIMEOUT
[ 24632][D][WiFiGeneric.cpp:1081] _eventCallback(): WiFi AutoReconnect Running
[ 24640][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24646][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
[ 24642][V][WiFiGeneric.cpp:97] set_esp_interface_ip(): Configuring Station static IP: 0.0.0.0, MASK: 0.0.0.0, GW: 0.0.0.0
messageNo: 2049, publish: fail, queue: 1350, heap: 55200, maxAlloc: 53236
WiFi disconnected
[ 24678][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24679][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2050, publish: fail, queue: 1351, heap: 55320, maxAlloc: 53236
[ 24699][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24699][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2051, publish: fail, queue: 1352, heap: 55256, maxAlloc: 53236
[ 24719][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24719][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2052, publish: fail, queue: 1353, heap: 55192, maxAlloc: 53236
[ 24739][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24739][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2053, publish: fail, queue: 1354, heap: 55128, maxAlloc: 53236
[ 24759][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24759][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2054, publish: fail, queue: 1355, heap: 55064, maxAlloc: 53236
[ 24779][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24779][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2055, publish: fail, queue: 1356, heap: 55000, maxAlloc: 53236
[ 24799][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24800][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2056, publish: fail, queue: 1357, heap: 54936, maxAlloc: 53236
[ 24819][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24820][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2057, publish: fail, queue: 1358, heap: 54872, maxAlloc: 53236
[ 24840][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24840][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2058, publish: fail, queue: 1359, heap: 54808, maxAlloc: 53236
[ 24860][W][Packet.cpp:305] _allocate(): Packet buffer not allocated: low memory
[ 24860][E][MqttClient.cpp:161] publish(): Could not create PUBLISH packet
messageNo: 2059, publish: fail, queue: 1360, heap: 54744, maxAlloc: 53236
Connecting to MQTT...
Connecting failed.
Connecting to MQTT...
Connecting failed.
Connecting to MQTT...
Connecting failed.
Connecting to MQTT...
Connecting failed.
bertmelis commented 10 months ago

It's not reproducing here. Which version are you using? I've changed two things:

Both changes are in the PR.

MichaelDvP commented 10 months ago

Sorry, i've merged your first commit, but missed the the others. It's working now. Thank you very much.

bertmelis commented 10 months ago

I've changed one more bit: the CONNECT message will not check for low memory. Otherwise, as there is no means to delete from the queue, you will never be able to reconnect when low on memory.

bertmelis commented 10 months ago

Closed with pr #109

Feel free to reopen if the problem persists.

The feature discussion can be done in a separate issue/discussion.