espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1

Random tcp_input crash at runtime. #7370

Closed zekageri closed 1 year ago

zekageri commented 1 year ago

Board

ESP32-Wrover-E

Device Description

Plain esp32

Hardware Configuration

SPI,I2C,Ethernet

Version

latest master (checkout manually)

IDE Name

PlatformIO

Operating System

Windows 11

Flash frequency

80mhz

PSRAM enabled

yes

Upload speed

115200

Description

Random crash.

Sketch

Too long to copy. Using async tcp and ETH_Lan8720

Debug Message

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x401a01bf  PS      : 0x00060830  A0      : 0x801a024b  A1      : 0x3ffb4380
A2      : 0x00000000  A3      : 0x3ffdfa84  A4      : 0x00000000  A5      : 0x3ffca8f0
A6      : 0x3ffca908  A7      : 0x00060023  A8      : 0x80141d34  A9      : 0x3ffb4370
A10     : 0x00000000  A11     : 0x00000006  A12     : 0x00000014  A13     : 0x0000ffff
A14     : 0x00000000  A15     : 0x00000001  SAR     : 0x00000010  EXCCAUSE: 0x0000001c
EXCVADDR: 0x0000001c  LBEG    : 0x4008af3c  LEND    : 0x4008af52  LCOUNT  : 0xffffffff  

Backtrace:0x401a01bc:0x3ffb4380 0x401a0248:0x3ffb43b0 0x40135941:0x3ffb43d0 0x4013a9be:0x3ffb4440 0x4013ea32:0x3ffb4470 0x4012f4d9:0x3ffb4490

  #0  0x401a01bc:0x3ffb4380 in AsyncServer::_accept(tcp_pcb*, signed char) at lib/AsyncTCP/src/AsyncTCP.cpp:1314
  #1  0x401a0248:0x3ffb43b0 in AsyncServer::_s_accept(void*, tcp_pcb*, signed char) at lib/AsyncTCP/src/AsyncTCP.cpp:1353       
  #2  0x40135941:0x3ffb43d0 in tcp_process at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/core/tcp_in.c:945
      (inlined by) tcp_input at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/core/tcp_in.c:438   
  #3  0x4013a9be:0x3ffb4440 in ip4_input at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/core/ipv4/ip4.c:800
  #4  0x4013ea32:0x3ffb4470 in ethernet_input at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/netif/ethernet.c:186
  #5  0x4012f4d9:0x3ffb4490 in tcpip_thread_handle_msg at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/api/tcpip.c:180
      (inlined by) tcpip_thread at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/api/tcpip.c:154  

ELF file SHA256: 0000000000000000

Rebooting...

Other Steps to Reproduce

No response

I have checked existing issues, online documentation and the Troubleshooting Guide

mrengineer7777 commented 1 year ago

You didn't provide a simple sketch to reproduce the issue, so I doubt the maintainers will be able to help.

Looks like the crash is happening in AsyncTCP, which is a different project: https://github.com/me-no-dev/AsyncTCP. I was not able to match up the line numbers with my copy of AsyncTCP. What version are you using?

Looking at AsyncTCP, I do see a potential crash if malloc() fails:

//Used to switch out from LwIP thread
static int8_t _tcp_accept(void * arg, AsyncClient * client) {
    lwip_event_packet_t * e = (lwip_event_packet_t *)malloc(sizeof(lwip_event_packet_t));
    e->event = LWIP_TCP_ACCEPT;
    e->arg = arg;
    e->accept.client = client;
    if (!_prepend_async_event(&e)) {
        free((void*)(e));
    }
    return ERR_OK;
}

After the malloc there should be a line checking for failure, like this: if(!e) return ERR_MEM;

You can monitor memory usage in the main loop like this:

void loop() {
    delay(1000);

    // Monitor heap usage
    static uint32_t prevheap = 0;
    uint32_t curheap = ESP.getFreeHeap();
    bool IsChanged = false;
    if (prevheap != curheap) {
        static uint32_t heap_highwater_mark = 0;
        static uint32_t heap_lowwater_mark = UINT32_MAX;

        if (curheap < heap_lowwater_mark) { IsChanged = true; heap_lowwater_mark = curheap; }
        if (curheap > heap_highwater_mark) { IsChanged = true; heap_highwater_mark = curheap; }

        if (IsChanged) {
            uint32_t maxblock = ESP.getMaxAllocHeap();
            log_i("Heap   Free %u   Min %u   Max %u   Contig %u [%i]",
                curheap, heap_lowwater_mark, heap_highwater_mark, maxblock, xPortGetCoreID());
            Serial.println();
        }
        prevheap = curheap;
    }
}
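
The low/high watermark bookkeeping in the loop above can also be pulled into a small helper so it can be checked off-device. This is only a minimal sketch: `HeapWatermarks` is a hypothetical name, not part of the Arduino core or AsyncTCP.

```cpp
#include <cstdint>

// Hypothetical helper (not from the Arduino core): remembers the lowest and
// highest free-heap readings it has been given.
struct HeapWatermarks {
    uint32_t low  = UINT32_MAX;  // smallest reading seen so far
    uint32_t high = 0;           // largest reading seen so far

    // Feed one reading; returns true when either watermark moved.
    bool update(uint32_t freeBytes) {
        bool changed = false;
        if (freeBytes < low)  { low = freeBytes;  changed = true; }
        if (freeBytes > high) { high = freeBytes; changed = true; }
        return changed;
    }
};
```

On the board, passing `ESP.getFreeHeap()` to `update()` once per loop pass and logging only when it returns true should behave much like the loop above.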

zekageri commented 1 year ago

Yeah, I'm using a tweaked version of the AsyncTCP library but the base is the same. Here is the lib: https://github.com/yubox-node-org/AsyncTCPSock

I modified the following function as you suggested:

static int8_t _tcp_accept(void * arg, AsyncClient * client) {
    lwip_event_packet_t * e = (lwip_event_packet_t *)malloc(sizeof(lwip_event_packet_t));
    if(!e) return ERR_MEM;
    e->event = LWIP_TCP_ACCEPT;
    e->arg = arg;
    e->accept.client = client;
    if (!_prepend_async_event(&e)) {
        free((void*)(e));
    }
    return ERR_OK;
}

Waiting for test results. Unfortunately there are a lot of random crashes, all related to TCP to some degree, probably because of AsyncTCP. I will try it and let you know. And thank you for the heap checker function. I appreciate your help!

There are a lot of places where the malloc-ed memory is not checked. I will add a check after every malloc I find.
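
A rough sketch of the check-every-allocation pattern, written so it compiles off-device. `Event`, `postEvent`, `heapAlloc`, `kErrOk` and `kErrMem` are hypothetical stand-ins for `lwip_event_packet_t`, `_tcp_accept`, `malloc`, `ERR_OK` and `ERR_MEM`. The allocator and the queue function are passed in only so the failure paths can be exercised. Unlike the snippet above, this version also returns an error when the queue refuses the event. That is an assumption about the desired behavior, not what AsyncTCP does.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-ins for lwIP's ERR_OK (0) and ERR_MEM (-1); renamed to avoid clashing
// with the real lwIP definitions.
constexpr int8_t kErrOk  = 0;
constexpr int8_t kErrMem = -1;

// Hypothetical event type standing in for lwip_event_packet_t.
struct Event {
    int   kind;
    void* arg;
};

using Allocator = void* (*)(std::size_t);
using Enqueue   = bool (*)(Event*);

// Default allocator: plain malloc, so std::free below is the matching release.
inline void* heapAlloc(std::size_t n) { return std::malloc(n); }

// Allocates an event, checks the result before touching it, then hands it to
// the queue. Assumes `alloc` returns malloc-compatible memory. Ownership moves
// to the queue on success; the event is freed here only if the queue refuses it.
int8_t postEvent(int kind, void* arg, Allocator alloc, Enqueue enqueue) {
    Event* e = static_cast<Event*>(alloc(sizeof(Event)));
    if (e == nullptr) {
        return kErrMem;  // report the failure instead of dereferencing null
    }
    e->kind = kind;
    e->arg  = arg;
    if (!enqueue(e)) {
        std::free(e);    // queue did not take ownership
        return kErrMem;
    }
    return kErrOk;
}
```

Passing a test allocator that always returns `nullptr` is a cheap way to confirm that each call site survives an out-of-memory condition.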

VojtechBartoska commented 1 year ago

Is this still valid?

zekageri commented 1 year ago

It seems the latest update solved this, but the function modification did not. Anyway, I'll close this. Thanks for the help.