espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.71k stars 7.3k forks source link

Storing ESP_LOG into SPIFFS with esp_log_set_vprintf would cause other tasks to overflow (IDFGH-12149) #13205

Open law-ko opened 8 months ago

law-ko commented 8 months ago

Answers checklist.

IDF version.

v5.1.1-577-g6b1f40b9bf-dirty

Espressif SoC revision.

ESP32

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32-PICO-MINI-02

Power Supply used.

USB

What is the expected behavior?

I expect all tasks to not overflow and work as it would like before adding the logging to spiffs.

What is the actual behavior?

Other tasks overflowed.

Steps to reproduce.

The code for initializing and start logging into SPIFFS:

// This function will be called by the ESP log library every time ESP_LOG needs to be performed.
//      @CRITICAL Do NOT use the ESP_LOG* macro's in this function ELSE recursive loop and stack overflow! So use here printf() instead for logging.
int spiffs_log_vprintf(const char *fmt, va_list args) 
{
    static const uint32_t WRITE_CACHE_CYCLE = 100;
    static uint32_t counter = 0;

    // Write to SPIFFS
    if (log_file == NULL) {
        printf("ABORT. file handle log_file is NULL\n");
        return -1;
    }

    int iresult = vfprintf(log_file, fmt, args);
    if (iresult < 0) {
        printf("ABORT. failed vfprintf() err %i\n", iresult);
        return iresult;
    }

    // Smart commit after x writes
    counter++;
    if (counter % WRITE_CACHE_CYCLE == 0) {
        /////printf("fsync'ing log file on SPIFFS (WRITE_CACHE_PERIOD=%u)\n", WRITE_CACHE_CYCLE);
        fsync(fileno(log_file));
    }

    // ALSO Write to stdout!
    return vprintf(fmt, args);
}

esp_err_t log_spiffs_init()
{
    ESP_LOGI(TAG, "Initializing SPIFFS for logging");

    esp_vfs_spiffs_conf_t conf = {
        .base_path = "/log",
        .partition_label = "log",
        .max_files = 5,
        .format_if_mount_failed = true
    };

    // Use settings defined above to initialize and mount SPIFFS filesystem.
    // Note: esp_vfs_spiffs_register is an all-in-one convenience function.
    esp_err_t ret = esp_vfs_spiffs_register(&conf);

    if (ret != ESP_OK) {
        if (ret == ESP_FAIL) {
            ESP_LOGE(TAG, "Failed to mount or format filesystem");
        } else if (ret == ESP_ERR_NOT_FOUND) {
            ESP_LOGE(TAG, "Failed to find SPIFFS partition");
        } else {
            ESP_LOGE(TAG, "Failed to initialize SPIFFS (%s)", esp_err_to_name(ret));
        }
        return ret;
    }

    size_t total = 0, used = 0;
    ret = esp_spiffs_info(conf.partition_label, &total, &used);
    if (ret != ESP_OK) {
        ESP_LOGE(TAG, "Failed to get SPIFFS partition information (%s). Formatting...", esp_err_to_name(ret));
        esp_spiffs_format(conf.partition_label);
        return ret;
    } else {
        ESP_LOGI(TAG, "Partition size: total: %d, used: %d", total, used);
    }

    // Check consistency of reported partiton size info.
    if (used > total) {
        ESP_LOGW(TAG, "Number of used bytes cannot be larger than total. Performing SPIFFS_check().");
        ret = esp_spiffs_check(conf.partition_label);
        // Could be also used to mend broken files, to clean unreferenced pages, etc.
        // More info at https://github.com/pellepl/spiffs/wiki/FAQ#powerlosses-contd-when-should-i-run-spiffs_check
        if (ret != ESP_OK) {
            ESP_LOGE(TAG, "SPIFFS_check() failed (%s)", esp_err_to_name(ret));
            return ret;
        } else {
            ESP_LOGI(TAG, "SPIFFS_check() successful");
        }
    }

    // Open log file
    log_file = fopen(LOG_PATH, "at"); // a=append t=textmode
    if (log_file == NULL) {
        ESP_LOGE(TAG, "Failed to open log file");
        return ESP_FAIL;
    }

    // Redirect ESP_LOG
    ESP_LOGI(TAG, "Redirecting ESP_LOG to SPIFFS log file (also keep sending logs to UART0)");
    esp_log_set_vprintf(&spiffs_log_vprintf);

    // Put BEGIN marker in the logs
    ESP_LOGI(TAG, "##########BEGIN##########");

    // Mark initialized
    log_initialized = true;

    return ESP_OK;
}

Added log_spiffs_init() in app_main.c after nvs_flash_init()

Debug Logs.

***ERROR*** A stack overflow in task XXX has been detected.

Backtrace: 0x40081cb6:0x3ffd6250 0x400944c1:0x3ffd6270 0x40094c1a:0x3ffd6290 0x400953ae:0x3ffd6310 0x400945f8:0x3ffd6340 0x400945aa:0x00000000 |<-CORRUPTED
0x40081cb6: panic_abort at /home/espuser/esp/esp-idf/components/esp_system/panic.c:452

0x400944c1: esp_system_abort at /home/espuser/esp/esp-idf/components/esp_system/port/esp_system_chip.c:84

0x40094c1a: vApplicationStackOverflowHook at /home/espuser/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:581

0x400953ae: vTaskSwitchContext at /home/espuser/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:3728

0x400945f8: _frxt_dispatch at /home/espuser/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/portasm.S:450

0x400945aa: _frxt_int_exit at /home/espuser/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/portasm.S:245


### More Information.

We are trying to upload the ESP_LOG logging information to AWS IoT. Our approach is to first store logs into SPIFFS and send to AWS IoT at some point. 

Following this [guide for sdcard](https://gist.github.com/igrr/82055d824cbbc521932f32217e0710b9) and this [spiffs logging example](https://github.com/pantaluna/esp32-mjd-starter-kit/blob/master/Projects/esp32_spiffs_basics/main/spiffs_basics_main.c#L40).
igrr commented 8 months ago

There are two issues here:

  1. Writing a log line into SPIFFS involves a lot of deeply nested code: filesystem, flash driver, etc. Therefore all the tasks will require more stack space compared to when logging goes over to UART. As an alternative, you can send all the log entries to a ringbuffer, and use a lower priority task to flush the ringbuffer to a file once in a while.
  2. Your spiffs_log_vprintf doesn't guard against re-entry, which may happen for example if SPIFFS or esp_flash layers call ESP_LOGE while handling an attempt to write to the log file. (This is not happening in your case yet, but is an issue still.)
law-ko commented 8 months ago

2. Your spiffs_log_vprintf doesn't guard against re-entry, which may happen for example if SPIFFS or esp_flash layers call ESP_LOGE while handling an attempt to write to the log file. (This is not happening in your case yet, but is an issue still.)

@igrr Thank you for your insight, and how to guard against re-entry?

law-ko commented 8 months ago

@igrr We also realized that the output from ESP_LOG is not UTF-8 encoded, as a limitation of our cloud services, it needs to be UTF-8 encoded. Is there any way to do so?

We tried to remove the non UTF-8 characters, but seems like a lot of information is missing.

For example:

read version: %d\u001b[0m\n %-

But it should be:

read version: 0

The %d is not replaced with the actual value that was added in the ESP_LOG args. However, the logs printed out in console is correct.

igrr commented 8 months ago

and how to guard against re-entry?

Using a thread-local variable is one of the options:


int custom_vprintf(const char *format, va_list args)
{
    static __thread bool logging = false;
    if (logging) {
        // We have re-entered custom_vprintf. Skip writing to a file.
        return vprintf(format, args);
    } else {
        logging = true;
        // save the log line to a file
        ...
        logging = false;
        return vprintf(format, args);   // also log to UART, or alternatively just return the number of characters written.
    }
}

... but it depends on what you want to achieve. If you want multiple tasks to have a mutually exclusive access to the log, then you can use a FreeRTOS mutex instead of the thread-local flag, and then use xSemaphoreTake(..., 0) to try taking the semaphore without blocking.

Regarding the issue, I'm sorry, I don't understand the issue. I understand each sentence you wrote separately, but they are not coming together to form a single picture. Could you please explain the issue in a more detailed way? Maybe if you can attache a piece of code which demonstrates the issue it will be better.

law-ko commented 8 months ago

@igrr , here's an example of what I mean by it: What is seen on the txt file:

================================

HEAP in DRAM: 139688

====== Heap Trace: 3 records (100 capacity) ======

    16 bytes (@ 0x3ffc10fc, Internal) allocated CPU 0 ccount 0x4b316714 caller 0x40152807:0x401528b6

0x40152807: mem_malloc at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/mem.c:209
0x401528b6: do_memp_malloc_pool at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/memp.c:254

    16 bytes (@ 0x3ffc1110, Internal) allocated CPU 0 ccount 0x4b3189cc caller 0x40152807:0x401528b6

0x40152807: mem_malloc at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/mem.c:209
0x401528b6: do_memp_malloc_pool at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/memp.c:254

    16 bytes (@ 0x3ffc1d8c, Internal) allocated CPU 0 ccount 0x4b31c008 caller 0x40152807:0x401528b6

0x40152807: mem_malloc at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/mem.c:209
0x401528b6: do_memp_malloc_pool at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/memp.c:254

====== Heap Trace Summary ======

What is seen on the terminal output:

================================
HEAP in DRAM: 139688
====== Heap Trace: 3 records (100 capacity) ======
    16 bytes (@ 0x3ffc10fc, Internal) allocated CPU 0 ccount 0x724300f4 caller 0x40152807:0x401528b6
0x40152807: mem_malloc at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/mem.c:209
0x401528b6: do_memp_malloc_pool at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/memp.c:254

    16 bytes (@ 0x3ffc1110, Internal) allocated CPU 0 ccount 0x724328b4 caller 0x40152807:0x401528b6
0x40152807: mem_malloc at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/mem.c:209
0x401528b6: do_memp_malloc_pool at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/memp.c:254

    16 bytes (@ 0x3ffc1d8c, Internal) allocated CPU 0 ccount 0x7243cee4 caller 0x40152807:0x401528b6
0x40152807: mem_malloc at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/mem.c:209
0x401528b6: do_memp_malloc_pool at /home/espuser/esp/esp-idf/components/lwip/lwip/src/core/memp.c:254

====== Heap Trace Summary ======
image
igrr commented 8 months ago

Those additional characters are ANSI color codes. They make the text you see in the terminal yellow. You can disable them in menuconfig (see CONFIG_LOG_COLORS).

Alternatively, you can post-process the formatted string to remove ANSI escape sequences. There are a bunch of implementations of that online, I hope you can find how to do this.

adokitkat commented 8 months ago

Hello. Did you resolve your issue?

law-ko commented 8 months ago

@adokitkat No I have not, apparently there are more characters that are non UTF-8 that needs to be manually removed.

adokitkat commented 8 months ago

UTF-8 should be backwards compatible with all of ASCII though. Are you sure this is the real issue here? Maybe there are still some (non-escaped?) ANSI escape codes? Read the output with hex editor to see what characters you have there.

law-ko commented 8 months ago

@igrr Another issue we had is that the %d %s will still remain as is and not replaced by the provided value by ESP_LOG:

xRingbufferSend() OK: I (%lu) %s: Ack packet deserialized with result: %s.

where it should be

I (10961) coreMQTT: Ack packet deserialized with result: MQTTSuccess.

Here is the vprintf_like_t function:

int ringbuffer_log_vprintf(const char *fmt, va_list args)
{
    // Write to ring buffer
    UBaseType_t res = xRingbufferSend(buf_handle, fmt, strlen(fmt), pdMS_TO_TICKS(1000));
    if (res != pdTRUE) {
        printf("ABORT. xRingbufferSend() failed\n");
        return -1;
    } else {
        printf("xRingbufferSend() OK: %s\n", fmt);
    }

    // ALSO Write to stdout!
    return vprintf(fmt, args);
}
Superberti commented 5 months ago

@law-ko: You're sending the pure format string to the ringbuffer. So no wonder that it doesn't work. First you have to evaluate the print function:

int ringbuffer_log_vprintf(const char *format, va_list paramList)
{
  va_list copy;
  va_copy(copy, paramList);
  int size = vsnprintf(NULL, 0, format, copy);
  va_end(copy);
  if (!size)
  {
    // your error handling here
    return 0;
  }
  char *ar = malloc(size + 2);
  va_copy(copy, paramList);
  size = vsnprintf(ar, size + 1, format, paramList);
  va_end(copy);
  // copy ar to the ringbuffer
  [...]
  free(ar);
  // write to stdout
  [...]
  return size;
}