fread causes deadlock when vfprintf_own opens a file (IDFGH-4062)

XaviDCR92 commented 3 years ago

[x] I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
[x] I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
[x] I have searched the issue tracker for a similar issue and not found a similar issue.

Environment

Module or chip used: ESP32-WROVER-32E
IDF version (run git describe --tags to find it): v4.1-332-g7b7c64107
Build System: idf.py
Compiler version (run xtensa-esp32-elf-gcc --version to find it): xtensa-esp32-elf-gcc (crosstool-NG esp-2020r3) 8.4.0
Operating System: GNU/Linux
Using an IDE?: No
Power Supply: USB

Problem Description

If CONFIG_LOG_DEFAULT_LEVEL_VERBOSE is configured, running fread on any thread causes a deadlock when esp_log_set_vprintf redirects stdout to a specific file. (for example, logs.txt) in our SD card. This means giving a pointer to a vprintf-like function to esp_log_set_vprintf.

This is our vprintf-like function:

static int vprintf_own(const char *const fmt, const va_list va)
{
    esp_log_level_set("*", ESP_LOG_NONE);

    if (fs_init())
        return -1;

    FILE *const f = fopen("logs.txt", "ab");

    if (!f)
        return -1;

    const int a = vfprintf(f, fmt, va);
    fclose(f);
    esp_log_level_set("*", CONFIG_LOG_DEFAULT_LEVEL);

    return a;
}

This is the line triggering the problem, called by fread: https://github.com/espressif/esp-idf/blob/7b7c641076ef4c5dff7b994170d445bbb46f81ab/components/sdmmc/sdmmc_cmd.c#L32-L33

When fread is executed from any function, a volume-wide semaphore is acquired by this function: https://github.com/espressif/esp-idf/blob/7b7c641076ef4c5dff7b994170d445bbb46f81ab/components/fatfs/src/ff.c#L3753

However, the call to ESP_LOGV shown above calls vprintf_own, which in turn calls fopen, which also tries to take the acquire the volume-wide semaphore: https://github.com/espressif/esp-idf/blob/7b7c641076ef4c5dff7b994170d445bbb46f81ab/components/fatfs/src/ff.c#L3567

Which causes a deadlock since fopen attempts to access the filesystem while fread still has not released the lock.

Expected Behavior

fread should return the expected value. The information on the call to ESP_LOGV should be written into the SD card, but it is not mandatory.

Actual Behavior

fread causes a deadlock and hangs the application.

Steps to reproduce

Initialize SD card and filesystem.
Setup a custom vprintf-like function via esp_log_set_vprintf that writes into a file.
Open a file from any thread.
Try to read from the file via fread.

Code to reproduce this issue

https://gist.github.com/XaviDCR92/798e6b5d4cb6bf31f1fa3eac7a7a520d

Possible solutions

Decrease log level to info (unacceptable solution).
Remove log messages or avoid writing from critical functions that access the filesystem e.g.: fread and friends.
Avoid calling vfprintfon the custom vprintf-like function and instead duplicate the data and send it to a separate thread via queue. Definitely not our preferred solution.

igrr commented 3 years ago

Hi @XaviDCR92, thanks for reporting this. Indeed reentrancy handing becomes complex when log messages are generated by the same subsystem to which log output is redirected. I think that removing log messages is not a viable solution, as that would practically mean removing the log messages from most components which could theoretically be involved in log redirection. For example, if a hypothetical application tries to redirect log messages to MQTT, then MQTT and all of its dependencies (esp-tls, esp_netif, lwip, Wi-Fi stack) must a) guard public APIs against reentrancy, and b) ensure that logging doesn't happen when output is not possible. The latter is particularly problematic since the component in question is not "aware" whether it is used for log redirection or not.

Piping the logs to a separate task via a queue or a RingBuffer indeed seems to be the most viable solution in this case.

XaviDCR92 commented 3 years ago

Hi @igrr and thanks for your reply. I understand how complex this issue can become, so we adopted the solution you suggested. However, a FreeRTOS ring buffer or queue might become full at some point (since queue size is given at compile-time) if too many requests are made at the same time, so we moved towards a dynamic FIFO-based approach. As a disadvantage, it implies more calls to malloc and friends (one per dynamic string and one per request), but ensures no information is lost as long there is memory available on the heap and only the strictly needed amount of memory is allocated (and freed once used). Nevertheless, I do not know how well does the newlib implementation of malloc on the ESP32 handle memory fragmentation. If such thing occurred, then a dynamic queue that is allocated and freed very often might lead to memory exhaustion. Could you please provide details about this?

igrr commented 3 years ago

ESP-IDF uses a custom heap implementation which is a relatively simple linked list based allocator. In IDF v4.3 we are planning to replace it with TLSF, which will primarily improve performance of allocations when PSRAM is used. With the current design, fragmentation may occur if you request blocks of many different sizes. There will be less fragmentation if you round the blocks to multiples of certain size. If a typical log message fits into, say, 64 characters, then you can round the allocation size to multiples of 64. Most messages will fit into one block and some will require a few blocks.

That said, if logging is not the primary purpose of the application, placing some limit on the log queue size might actually be a good approach. This way, logs will never have a chance to consume so much memory that other more important parts of the application will not be able to allocate the memory they need.

ammaree commented 3 years ago

What about using the RTC slow RAM as a circular buffer. That way 4KB (8KB by changing the linker script) of RAM becomes avialable to store the last messages, normally the ones indicating the events leading to the error.

We are using this approach, together with a replacement ESP_LOG module to ... a) store ALL messages in text format in the RTC RAM buffer; and b) sending non-DEBUG level messages directly using UDP to a syslog host (if possible)

@igrr As a further step (not yet implemented our side) the RTC buffer can be configured (I believe) to form part of the coredump to flash and subsequent upload to a central host?

XaviDCR92 commented 3 years ago

Thanks for your replies.

In IDF v4.3 we are planning to replace it with TLSF

That would be very interesting. Is there a planned release date for esp-idf v4.3?

OTOH (a bit off-topic), AFAIK esp-idf limits static memory because of non-contiguous areas of memory. Has there been any progress in solving this limitation recently?

There will be less fragmentation if you round the blocks to multiples of certain size.

Our application depends on dynstr (a dynamic strings implementation with less features and type-safer than sds, which does not (and IMHO should not) hack with allocator-specific details like allocating multiple-size blocks to reduce fragmentation or reallocations, instead solely relying on a reasonable heap implementation.

That said, if logging is not the primary purpose of the application, placing some limit on the log queue size might actually be a good approach. This way, logs will never have a chance to consume so much memory that other more important parts of the application will not be able to allocate the memory they need.

While logging is not the primary purpose, we do want to forward log information to the file in the SD card and lose as few information as possible. While it is true logs might hypothetically take up too much memory, they are very short-lived and, if heap fragmentation is kept to a minimum, it is unlikely they leave no space for other tasks. Actually, it even might leave more heap space for other tasks when no requests are active, compared to using compile-time known sizes.

What about using the RTC slow RAM as a circular buffer.

Sounds like a similar idea to what @igrr proposed but without using up DRAM. I'll take a look to it.

rrrrrrr3 commented 1 year ago

good things

espressif / esp-idf