espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
12.93k stars 7.1k forks source link

SPI master driver memory free heap assert failed (IDFGH-8619) #10066

Open wishinlife opened 1 year ago

wishinlife commented 1 year ago

Answers checklist.

IDF version.

v4.42

Operating System used.

Linux

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

No response

Development Kit.

ESP32-S3-N8R2

Power Supply used.

USB

What is the expected behavior?

I user WK2124 chip external uart, and start tow task, the first task control onboard uart, the second task control external uart on WK2124. The WK2124 is link on SPI3. The first round of sending and receiving data was normal, and the chip was abort during the second round. WK2124 needs to be read multiple times in each poll. And there's another interesting thing, if I add some printf and some log to the function, it will work just fine.

What is the actual behavior?

report memory free assert failed.

Steps to reproduce.

  1. Step
  2. Step
  3. Step ...

Debug Logs.

assert failed: tlsf_free heap_tlsf.c:872 (!block_is_free(block) && "block already marked as free")

Backtrace:0x40375d6a:0x3fc9d6500x4037c139:0x3fc9d670 0x40383155:0x3fc9d690 0x4038246f:0x3fc9d7b0 0x40382d86:0x3fc9d7d0 0x40375fca:0x3fc9d7f0 0x40383185:0x3fc9d810 0x40377882:0x3fc9d830 0x40377d08:0x3fc9d850 0x40377d55:0x3fc9d880 0x420095cf:0x3fc9d8a0 0x4200977e:0x3fc9d8d0 0x42009ccf:0x3fc9d920 0x42009588:0x3fc9d950 0x42007d8e:0x3fc9d970 0x42008db4:0x3fc9d990 0x42007dc8:0x3fc9d9d0 0x420083cd:0x3fc9d9f0 0x420078c8:0x3fc9da10 0x4037f265:0x3fc9db40 
0x40375d6a: panic_abort at /home/zhuxy/esp/esp-idf/components/esp_system/panic.c:402
0x4037c139: esp_system_abort at /home/zhuxy/esp/esp-idf/components/esp_system/esp_system.c:128
0x40383155: __assert_func at /home/zhuxy/esp/esp-idf/components/newlib/assert.c:85
0x4038246f: tlsf_free at /home/zhuxy/esp/esp-idf/components/heap/heap_tlsf.c:872 (discriminator 1)
0x40382d86: multi_heap_free_impl at /home/zhuxy/esp/esp-idf/components/heap/multi_heap.c:222
0x40375fca: heap_caps_free at /home/zhuxy/esp/esp-idf/components/heap/heap_caps.c:340
0x40383185: free at /home/zhuxy/esp/esp-idf/components/newlib/heap.c:39
0x40377882: uninstall_priv_desc at /home/zhuxy/esp/esp-idf/components/driver/spi_master.c:736
0x40377d08: spi_device_polling_end at /home/zhuxy/esp/esp-idf/components/driver/spi_master.c:983 (discriminator 15)
0x40377d55: spi_device_polling_transmit at /home/zhuxy/esp/esp-idf/components/driver/spi_master.c:1003
0x420095cf: wk_device_transmit at /home/zhuxy/esp-projects/IoT-Station/components/drivers/src/spi_wk2xxx.c:62
0x4200977e: wk_read_SFifo at /home/zhuxy/esp-projects/IoT-Station/components/drivers/src/spi_wk2xxx.c:195
0x42009ccf: wk_read_bytes at /home/zhuxy/esp-projects/IoT-Station/components/drivers/src/spi_wk2xxx.c:593
0x42009588: uart_port_read_bytes at /home/zhuxy/esp-projects/IoT-Station/components/drivers/src/uart_port.c:296
0x42007d8e: _modbus_rtu_recv at /home/zhuxy/esp-projects/IoT-Station/components/libmodbus/src/modbus-rtu.c:204
0x42008db4: _modbus_receive_msg at /home/zhuxy/esp-projects/IoT-Station/components/libmodbus/src/modbus.c:412
0x42007dc8: _modbus_rtu_receive at /home/zhuxy/esp-projects/IoT-Station/components/libmodbus/src/modbus-rtu.c:191
0x420083cd: modbus_receive at /home/zhuxy/esp-projects/IoT-Station/components/libmodbus/src/modbus.c:504
0x420078c8: modbus_rtu_task at /home/zhuxy/esp-projects/IoT-Station/main/src/modbus_slave.c:122
0x4037f265: vPortTaskWrapper at /home/zhuxy/esp/esp-idf/components/freertos/port/xtensa/port.c:131

ELF file SHA256: 4be6f973b94b35ce

CPU halted.

More Information.

if swap internal uart and external wk2124 uart in task, the same first read is normal, the second times are abort, the log is as follows: `Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.

Core 0 register dump: PC : 0x40382734 PS : 0x00060833 A0 : 0x80382d89 A1 : 0x3fc9f8a0
0x40382734: insert_free_block at /home/zhuxy/esp/esp-idf/components/heap/heap_tlsf.c:243 (inlined by) block_insert at /home/zhuxy/esp/esp-idf/components/heap/heap_tlsf.c:269 (inlined by) tlsf_free at /home/zhuxy/esp/esp-idf/components/heap/heap_tlsf.c:876

A2 : 0x3fc9974c A3 : 0xa5a5a5a5 A4 : 0x3fc97a94 A5 : 0x00060823
A6 : 0x00060820 A7 : 0x00000001 A8 : 0x3fc9fad4 A9 : 0x00000017
A10 : 0x0000001f A11 : 0x3fc9fadc A12 : 0x3fc9fad4 A13 : 0x3fc9f9e8
A14 : 0x02ce0988 A15 : 0x00ffffff SAR : 0x00000018 EXCCAUSE: 0x0000001d
EXCVADDR: 0xa5a5a5b1 LBEG : 0x40056f5c LEND : 0x40056f72 LCOUNT : 0xffffffff

Backtrace:0x40382731:0x3fc9f8a00x40382d86:0x3fc9f8c0 0x40375fca:0x3fc9f8e0 0x40383185:0x3fc9f900 0x40377882:0x3fc9f920 0x40377d08:0x3fc9f940 0x40377d55:0x3fc9f970 0x420095cf:0x3fc9f990 0x4200977e:0x3fc9f9c0 0x42009ccf:0x3fc9fa10 0x42009588:0x3fc9fa40 0x42007d8e:0x3fc9fa60 0x42008db4:0x3fc9fa80 0x42008f49:0x3fc9fac0 0x42008ff1:0x3fc9fc00 0x420071f5:0x3fc9fc30 0x4037f265:0x3fc9fca0 0x40382731: insert_free_block at /home/zhuxy/esp/esp-idf/components/heap/heap_tlsf.c:242 (inlined by) block_insert at /home/zhuxy/esp/esp-idf/components/heap/heap_tlsf.c:269 (inlined by) tlsf_free at /home/zhuxy/esp/esp-idf/components/heap/heap_tlsf.c:876 0x40382d86: multi_heap_free_impl at /home/zhuxy/esp/esp-idf/components/heap/multi_heap.c:222 0x40375fca: heap_caps_free at /home/zhuxy/esp/esp-idf/components/heap/heap_caps.c:340 0x40383185: free at /home/zhuxy/esp/esp-idf/components/newlib/heap.c:39 0x40377882: uninstall_priv_desc at /home/zhuxy/esp/esp-idf/components/driver/spi_master.c:736 0x40377d08: spi_device_polling_end at /home/zhuxy/esp/esp-idf/components/driver/spi_master.c:983 (discriminator 15) 0x40377d55: spi_device_polling_transmit at /home/zhuxy/esp/esp-idf/components/driver/spi_master.c:1003 0x420095cf: wk_device_transmit at /home/zhuxy/esp-projects/IoT-Station/components/drivers/src/spi_wk2xxx.c:62 0x4200977e: wk_read_SFifo at /home/zhuxy/esp-projects/IoT-Station/components/drivers/src/spi_wk2xxx.c:195 0x42009ccf: wk_read_bytes at /home/zhuxy/esp-projects/IoT-Station/components/drivers/src/spi_wk2xxx.c:593 0x42009588: uart_port_read_bytes at /home/zhuxy/esp-projects/IoT-Station/components/drivers/src/uart_port.c:296 0x42007d8e: _modbus_rtu_recv at /home/zhuxy/esp-projects/IoT-Station/components/libmodbus/src/modbus-rtu.c:204 0x42008db4: _modbus_receive_msg at /home/zhuxy/esp-projects/IoT-Station/components/libmodbus/src/modbus.c:412 0x42008f49: read_registers at /home/zhuxy/esp-projects/IoT-Station/components/libmodbus/src/modbus.c:1202 0x42008ff1: modbus_read_registers at /home/zhuxy/esp-projects/IoT-Station/components/libmodbus/src/modbus.c:1243 0x420071f5: modbusTask at /home/zhuxy/esp-projects/IoT-Station/main/main.c:84 0x4037f265: vPortTaskWrapper at /home/zhuxy/esp/esp-idf/components/freertos/port/xtensa/port.c:131`

chipweinberger commented 1 year ago

I'd recommend you turn on comprehensive heap corruption detection in the config settings and try again.

This is likely corruption.

wishinlife commented 1 year ago

I'm manual malloc tx and rx buffer: buff = heap_caps_malloc(256, MALLOC_CAP_DMA | MALLOC_CAP_32BIT); It works fine. But I think it's not my code, it's the temporary DMA buffer malloc and free problem.

finger563 commented 7 months ago

I've been encountering this error as well (even on latest ESP-IDF master) and I believe the error occurs when uninstall_priv_desc is called as part of the clean_up after an ESP_ERR_TIMEOUT. I've added some logs to my code and I'm seeing that it's trying to free buffer_to_send even though it did not allocate it (since the buffer I'm providing is already DMA-capable and aligned).

finger563 commented 7 months ago

It's being freed incorrectly during timeout conditions by the spi_device_get_trans_result function.

Log: CleanShot 2023-12-05 at 09 49 34

Here is the patch I used to add some tracing / logging: spi_master_heap_corruption_trace.patch

Note: you can see I added a log any time the spi_master allocates memory for the tx or rx buffer, but the log never shows up; however when the get_trans_result times out (for some reason?) it attempts to free memory. You can also see in the patch that I commented out the call to free (instead replacing it with the log you see) and the code works just fine.

Edit: I'm calling the waiting code like this:

static void lcd_wait_lines() {
    spi_transaction_t *rtrans;
    esp_err_t ret;
    // fmt::print("Waiting for {} queued transactions\n", num_queued_trans);
    // Wait for all transactions to be done and get back the results.
    while (num_queued_trans) {
        ret = spi_device_get_trans_result(spi, &rtrans, 10 / portTICK_PERIOD_MS);
        if (ret != ESP_OK) {
            fmt::print("Could not get trans result: {} '{}'\n", ret, esp_err_to_name(ret));
        }
        num_queued_trans--;
        //We could inspect rtrans now if we received any info back. The LCD is treated as write-only, though.
    }
}

and the writing code like this:

extern "C" void lcd_send_lines(int xs, int ys, int xe, int ye, const uint8_t *data, uint32_t user_data) {
    // if we haven't waited by now, wait here...
    lcd_wait_lines();
    esp_err_t ret;
    size_t length = (xe-xs+1)*(ye-ys+1)*2;
    if (length == 0) {
        fmt::print("Bad length: ({},{}) to ({},{})\n", xs, ys, xe, ye);
    }
    // initialize the spi transactions
    for (int i=0; i<6; i++) {
        memset(&trans[i], 0, sizeof(spi_transaction_t));
        if ((i&1)==0) {
            //Even transfers are commands
            trans[i].length = 8;
            trans[i].user = (void*)0;
        } else {
            //Odd transfers are data
            trans[i].length = 8*4;
            trans[i].user = (void*)DC_LEVEL_BIT;
        }
        trans[i].flags = SPI_TRANS_USE_TXDATA;
    }
    trans[0].tx_data[0] = (uint8_t)DisplayDriver::Command::caset;
    trans[1].tx_data[0] = (xs)>> 8;
    trans[1].tx_data[1] = (xs)&0xff;
    trans[1].tx_data[2] = (xe)>>8;
    trans[1].tx_data[3] = (xe)&0xff;
    trans[2].tx_data[0] = (uint8_t)DisplayDriver::Command::raset;
    trans[3].tx_data[0] = (ys)>>8;
    trans[3].tx_data[1] = (ys)&0xff;
    trans[3].tx_data[2] = (ye)>>8;
    trans[3].tx_data[3] = (ye)&0xff;
    trans[4].tx_data[0] = (uint8_t)DisplayDriver::Command::ramwr;
    trans[5].tx_buffer = data;
    trans[5].length = length*8;
    // undo SPI_TRANS_USE_TXDATA flag
    trans[5].flags = SPI_TRANS_DMA_BUFFER_ALIGN_MANUAL;
    // we need to keep the dc bit set, but also add our flags
    trans[5].user = (void*)(DC_LEVEL_BIT | user_data);
    //Queue all transactions.
    for (int i=0; i<6; i++) {
        ret = spi_device_queue_trans(spi, &trans[i], 10 / portTICK_PERIOD_MS);
        if (ret != ESP_OK) {
            fmt::print("Couldn't queue trans: {} '{}'\n", ret, esp_err_to_name(ret));
        } else {
            num_queued_trans++;
        }
    }
    //When we are here, the SPI driver is busy (in the background) getting the
    //transactions sent. That happens mostly using DMA, so the CPU doesn't have
    //much to do here. We're not going to wait for the transaction to finish
    //because we may as well spend the time calculating the next line. When that
    //is done, we can call lcd_wait_lines, which will wait for the transfers
    //to be done and check their status.
}

Edit 2: just for completeness I've copied the logs below using the patch but still freeing the memory in uninstall_priv_desc which corrupts the heap as expected: CleanShot 2023-12-05 at 11 22 12