espressif / esp-matter

Espressif's SDK for Matter
Apache License 2.0
643 stars 150 forks source link

Memory leak issue on esp_route_table_add_route_entry (CON-1058) #850

Closed law-ko closed 1 month ago

law-ko commented 6 months ago

There is a memory leakage issue when Matter is paired and running for a long time.

Checked by adding heap tracing in while loop under app_main.cpp:

     while (1)
    {
        ESP_ERROR_CHECK( heap_trace_start(HEAP_TRACE_LEAKS) );
        printf("HEAP in DRAM: %d\n", heap_caps_get_free_size( MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT ));

        vTaskDelay(1000 / portTICK_PERIOD_MS);
        ESP_ERROR_CHECK( heap_trace_stop() );
        heap_trace_dump();
    }

Heap tracing log:

====== Heap Trace: 0 records (20 capacity) ======

====== Heap Trace Summary ======

Mode: Heap Trace Leaks

0 bytes 'leaked' in trace (0 allocations)

records: 0 (20 capacity, 20 high water mark)

total allocations: 17

total frees: 39

================================

HEAP in DRAM: 63687

I (172158) ROUTE_HOOK: Received RIO

I (172158) ROUTE_HOOK: prefix FDFA:F4B5:7423:: lifetime 1800

====== Heap Trace: 1 records (20 capacity) ======

    60 bytes (@ 0x3ffafdac, Internal) allocated CPU 1 ccount 0x5ff8d7dc caller 0x4015ec68:0x4015e922

0x4015ec68: esp_route_table_add_route_entry at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/esp-idf/chip/../../../../../../connectedhomeip/connectedhomeip/config/esp32/third_party/connectedhomeip/src/platform/ESP32/route_hook/ESP32RouteTable.c:86
0x4015e922: ra_recv_handler at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/esp-idf/chip/../../../../../../connectedhomeip/connectedhomeip/config/esp32/third_party/connectedhomeip/src/platform/ESP32/route_hook/ESP32RouteHook.c:99
 (inlined by) icmp6_raw_recv_handler at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/esp-idf/chip/../../../../../../connectedhomeip/connectedhomeip/config/esp32/third_party/connectedhomeip/src/platform/ESP32/route_hook/ESP32RouteHook.c:147

====== Heap Trace Summary ======

Mode: Heap Trace Leaks

60 bytes 'leaked' in trace (1 allocations)

records: 1 (20 capacity, 20 high water mark)

total allocations: 1

total frees: 9

================================

HEAP in DRAM: 63623

Environment

This continues to leak after a certain period of time, until all internal DRAM is used. The matter stack is running on top of PSRAM, but should not make a difference as the leak happens on internal DRAM. ...

wqx6 commented 6 months ago

Could you send more log of heap trace? Looks like the heap trace log you sent is not the cause of memory leak. The device will allocate a route entry once it receives a RA message with RIO(Route info option) and delete the entry after timeout. Normally the timeout duration is about 1800 seconds, so this will be treated as memory leak by heap trace.

If there does be some memory leak, I think it could be caused by other reasons.

law-ko commented 6 months ago

@wqx6 Please find the attached log, starting from booting up. The device is already paired and no interactions have been made with the device but the HEAP keeps dropping.

This is based on light example on release/v1.2 branch. log.dev.ttyUSB1.20240226202616.txt

wqx6 commented 6 months ago

I didn't find any memory leak from your log. At the start of the trace log, the free heap is 139800 Bytes (HEAP in DRAM: 139800). And at the end of the trace log, the free heap is 139672bytes (HEAP in DRAM: 139672). Considering that the device will create some route entries and minimal mDNS will also consume some memories when receiving mDNS advertisement packets, this decreasing is reasonable. Will the free heap continue to decrease? Could you please increase the heap trace delay to 1 minute or longer in

vTaskDelay(1000 / portTICK_PERIOD_MS);

and send a log which have an evident heap decreasing again.

law-ko commented 6 months ago

@wqx6 Please see the attached log, the final HEAP dropped to HEAP in DRAM: 139356. It will continue to drop..

log.dev.ttyUSB1.20240226202616.txt.zip

I will change the vTaskDelay to 60000.

law-ko commented 5 months ago

@wqx6 Please note that we are using espressif's mDNS solution instead of the connectedhomeip's minimal mDNS.

Screenshot 2024-03-11 at 12 57 27 PM

Please see the below log:


     8 bytes (@ 0x3ffafdc0, Internal) allocated CPU 1 ccount 0x72fa63f4 caller 0x40098f20:0x4000bbc5

0x40098f20: _malloc_r at /home/espuser/esp/esp-idf/components/newlib/heap.c:45
0x4000bbc5: _malloc_r in ROM

     5 bytes (@ 0x3ffafdd0, Internal) allocated CPU 1 ccount 0x72fa7404 caller 0x40098f20:0x4000bbc5

0x40098f20: _malloc_r at /home/espuser/esp/esp-idf/components/newlib/heap.c:45
0x4000bbc5: _malloc_r in ROM

     8 bytes (@ 0x3ffb5930, Internal) allocated CPU 1 ccount 0x72fa82c4 caller 0x400ec731:0x400ec878

0x400ec731: mdns_service_add_for_host at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/../managed_components/espressif__mdns/mdns.c:5794
0x400ec878: mdns_service_add at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/../managed_components/espressif__mdns/mdns.c:5839

    19 bytes (@ 0x3ffb5a38, Internal) allocated CPU 1 ccount 0x731bd8b8 caller 0x40098f20:0x4000bbc5

0x40098f20: _malloc_r at /home/espuser/esp/esp-idf/components/newlib/heap.c:45
0x4000bbc5: _malloc_r in ROM

     8 bytes (@ 0x3ffd6940, Internal) allocated CPU 0 ccount 0x7c574520 caller 0x400ebd13:0x400948cd

0x400ebd13: _mdns_execute_action at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/../managed_components/espressif__mdns/mdns.c:5040
 (inlined by) _mdns_service_task at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/../managed_components/espressif__mdns/mdns.c:5242
0x400948cd: vPortTaskWrapper at /home/espuser/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:164

    60 bytes (@ 0x3ffb0d0c, Internal) allocated CPU 1 ccount 0x313e9ecc caller 0x4015ef64:0x4015ec1e

0x4015ef64: esp_route_table_add_route_entry at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/esp-idf/chip/../../../../../../connectedhomeip/connectedhomeip/config/esp32/third_party/connectedhomeip/src/platform/ESP32/route_hook/ESP32RouteTable.c:86
0x4015ec1e: ra_recv_handler at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/esp-idf/chip/../../../../../../connectedhomeip/connectedhomeip/config/esp32/third_party/connectedhomeip/src/platform/ESP32/route_hook/ESP32RouteHook.c:99
 (inlined by) icmp6_raw_recv_handler at /home/espuser/esp-git/esp-matter/examples/im/outlet/build/esp-idf/chip/../../../../../../connectedhomeip/connectedhomeip/config/esp32/third_party/connectedhomeip/src/platform/ESP32/route_hook/ESP32RouteHook.c:147


We have not seen any memory leak from our code, be note of this 0x4000bbc5: _malloc_r in ROM in the log as well, I think somewhat related to this issue.

wqx6 commented 5 months ago

If you are testing the device with an ecosystem, the free heap will decrease as the hub needs to establish the subscriptions with the end-device and some subscriptions might be evicted during the test since there might be packet lost. And some resources might not be released immediately. However, the free heap will not decrease continuously since the subscriptions per fabric has a maximum number. Did you observe a heap decrease of over 5 KB? If not, I think it is accepted for the test with ecosystem.

dhrishi commented 4 months ago

@law-ko Any more observations you can share? Do you still see the memory leaking

dhrishi commented 1 month ago

Closing this now