espressif / esp-lwip

Fork of lwIP (https://savannah.nongnu.org/projects/lwip/) with ESP-IDF specific patches
Other
85 stars 129 forks source link

[lwip] Memory leak on TCP IP PCB alloc (IDFGH-8313) #47

Closed KonssnoK closed 2 years ago

KonssnoK commented 2 years ago

Hello, i'm using the latest lwip attached to esp-idf/release_4.4.

For other reasons we had to go through some heavy memory checking lately, and i saw that when using heap standalone debugger https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/system/heap_debug.html

a lot of memp allocations are not released at the end of the trace.

Example:

16 bytes (@ 0x3fcec008) allocated CPU 1 ccount 0xf94bff1c caller 0x420aae97:0x420aaf28
0x420aae97: mem_malloc at C:/src/esp-idf/components/lwip/lwip/src/core/mem.c:237

0x420aaf28: do_memp_malloc_pool at C:/src/esp-idf/components/lwip/lwip/src/core/memp.c:257

16 bytes (@ 0x3fcec030) allocated CPU 1 ccount 0xf98311c4 caller 0x420aae97:0x420aaf28
0x420aae97: mem_malloc at C:/src/esp-idf/components/lwip/lwip/src/core/mem.c:237

0x420aaf28: do_memp_malloc_pool at C:/src/esp-idf/components/lwip/lwip/src/core/memp.c:257

16 bytes (@ 0x3fcec044) allocated CPU 1 ccount 0xf9d38ec4 caller 0x420aae97:0x420aaf28
0x420aae97: mem_malloc at C:/src/esp-idf/components/lwip/lwip/src/core/mem.c:237

0x420aaf28: do_memp_malloc_pool at C:/src/esp-idf/components/lwip/lwip/src/core/memp.c:257

16 bytes (@ 0x3fcf36cc) allocated CPU 1 ccount 0xfc2c05b0 caller 0x420aae97:0x420aaf28
0x420aae97: mem_malloc at C:/src/esp-idf/components/lwip/lwip/src/core/mem.c:237

0x420aaf28: do_memp_malloc_pool at C:/src/esp-idf/components/lwip/lwip/src/core/memp.c:257

This is actually ok for allocation of 16bytes, which are then released later on in the execution. Instead i'm struggling to understand where the allocations of 168B are released. With a debugger i saw that 168B is coming from

pcb = (struct tcp_pcb *)memp_malloc(MEMP_TCP_PCB);

whenever a new connection is made. Looking at my logs, this is never released, which doesn't make much sense.

In order to track it, i changed memp.c as follows:

static uint32_t allocs_count = 0;
static uint32_t allocs[100] = { 0 };

static void *
#if !MEMP_OVERFLOW_CHECK
do_memp_malloc_pool(const struct memp_desc *desc)
#else
do_memp_malloc_pool_fn(const struct memp_desc *desc, const char *file, const int line)
#endif
{
  struct memp *memp;
  SYS_ARCH_DECL_PROTECT(old_level);

#if MEMP_MEM_MALLOC
  memp = (struct memp *)mem_malloc(MEMP_SIZE + MEMP_ALIGN_SIZE(desc->size));

  if(desc->size == 168){
    printf("MEMP ALLOC 0x%08X size %d\n", (uint32_t)memp, desc->size);
    allocs[allocs_count++] = (uint32_t)memp;
  }
  SYS_ARCH_PROTECT(old_level);
static void
do_memp_free_pool(const struct memp_desc *desc, void *mem)
{
  struct memp *memp;
  SYS_ARCH_DECL_PROTECT(old_level);

  LWIP_ASSERT("memp_free: mem properly aligned",
              ((mem_ptr_t)mem % MEM_ALIGNMENT) == 0);

  /* cast through void* to get rid of alignment warnings */
  memp = (struct memp *)(void *)((u8_t *)mem - MEMP_SIZE);

  SYS_ARCH_PROTECT(old_level);

#if MEMP_OVERFLOW_CHECK == 1
  memp_overflow_check_element(memp, desc);
#endif /* MEMP_OVERFLOW_CHECK */

#if MEMP_STATS
  desc->stats->used--;
#endif

#if MEMP_MEM_MALLOC
  LWIP_UNUSED_ARG(desc);
  SYS_ARCH_UNPROTECT(old_level);
  bool found = false;
  for(int i = 0; i < allocs_count; ++i) {
    if(allocs[i] == (uint32_t)memp) {
      memmove(&allocs[i], &allocs[i+1], (allocs_count - (i+1))*sizeof(uint32_t));
      allocs_count--;
      found = true;
    }
  }
  if(found){
    printf("MEMP FREE 0x%08X count %d\n", (uint32_t)memp, allocs_count);
  }
  mem_free(memp);

In the logs, trying to download a nonexistent file from a website, i get only allocs

image

And this is the stacktrace of the allocation image

Any help?

KonssnoK commented 2 years ago

Apparently what happens is that the code reaches the function

static err_t
tcp_close_shutdown_fin(struct tcp_pcb *pcb)

Then it's put in WAIT_FIN_1.

Then it receives the ACK and goes into tcp_tw_pcbs , which seems to be a list of inactive PCBs to be reused.

This means that doing new connections sporadically will fragment the memory, because it will not be deallocated 😱

david-cermak commented 2 years ago

Hi @KonssnoK

Yes, there's some one-time allocation struct for each socket that never gets freed. You can go over all sockets to open and close it after startup. This is what we do in IDF test utilities:

https://github.com/espressif/esp-idf/blob/5b39159e668d3af5832cedb500ccc4869bf15492/tools/unit-test-app/components/test_utils/test_utils.c#L32-L41

KonssnoK commented 2 years ago

uhm ok thanks! i'll take a look and maybe put the same somewhere in our initialization.