Open XaviDCR92 opened 3 years ago
Hi @XaviDCR92, thanks for reporting this. Indeed, reentrancy handling becomes complex when log messages are generated by the same subsystem to which log output is redirected. I think that removing log messages is not a viable solution, as that would practically mean removing the log messages from most components which could theoretically be involved in log redirection. For example, if a hypothetical application tries to redirect log messages to MQTT, then MQTT and all of its dependencies (esp-tls, esp_netif, lwip, Wi-Fi stack) must a) guard public APIs against reentrancy, and b) ensure that logging doesn't happen when output is not possible. The latter is particularly problematic since the component in question is not "aware" whether it is used for log redirection or not.
Piping the logs to a separate task via a queue or a RingBuffer indeed seems to be the most viable solution in this case.
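A minimal host-side sketch of that hand-off, assuming a fixed number of fixed-size slots. The names `log_enqueue_vprintf`, `log_enqueue` and `log_drain` and both size constants are hypothetical. On ESP-IDF the slot array would more likely be a FreeRTOS queue or ring buffer, the drain function would run in its own task, and access would need to be synchronized; none of that is shown here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define LOG_SLOTS 8      /* hypothetical queue depth */
#define LOG_SLOT_LEN 128 /* hypothetical maximum message length */

static char slots[LOG_SLOTS][LOG_SLOT_LEN];
static size_t head;    /* index of the oldest queued message */
static size_t count;   /* number of queued messages */
static size_t dropped; /* messages rejected because the queue was full */

/* vprintf-like producer: formats into a free slot and returns.
 * It never touches the filesystem, so it cannot re-enter it. */
int log_enqueue_vprintf(const char *fmt, va_list ap)
{
    assert(fmt != NULL);
    if (count == LOG_SLOTS) {
        dropped++; /* queue full: drop the new message */
        return 0;
    }
    size_t tail = (head + count) % LOG_SLOTS;
    int n = vsnprintf(slots[tail], LOG_SLOT_LEN, fmt, ap);
    if (n < 0)
        return n;
    count++;
    return n;
}

/* Variadic convenience wrapper around the producer. */
int log_enqueue(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = log_enqueue_vprintf(fmt, ap);
    va_end(ap);
    return n;
}

/* Consumer: writes every queued message to out, oldest first,
 * and returns how many messages were written. */
size_t log_drain(FILE *out)
{
    size_t written = 0;
    while (count > 0) {
        fputs(slots[head], out);
        head = (head + 1) % LOG_SLOTS;
        count--;
        written++;
    }
    return written;
}
```

The producer has the `int (*)(const char *, va_list)` shape that `esp_log_set_vprintf` expects, so only the consumer ever needs to call `fopen` or `fwrite`.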
Hi @igrr and thanks for your reply. I understand how complex this issue can become, so we adopted the solution you suggested. However, a FreeRTOS ring buffer or queue might become full at some point (since the queue size is given at compile time) if too many requests are made at the same time, so we moved towards a dynamic FIFO-based approach. As a disadvantage, it implies more calls to `malloc` and friends (one per dynamic string and one per request), but it ensures no information is lost as long as there is memory available on the heap, and only the strictly needed amount of memory is allocated (and freed once used).
Nevertheless, I do not know how well the newlib implementation of `malloc` on the ESP32 handles memory fragmentation. If fragmentation occurred, a dynamic queue that is allocated and freed very often might lead to memory exhaustion. Could you please provide details about this?
ESP-IDF uses a custom heap implementation which is a relatively simple linked-list-based allocator. In IDF v4.3 we are planning to replace it with TLSF, which will primarily improve performance of allocations when PSRAM is used. With the current design, fragmentation may occur if you request blocks of many different sizes. There will be less fragmentation if you round the blocks to multiples of a certain size. If a typical log message fits into, say, 64 characters, then you can round the allocation size to multiples of 64. Most messages will fit into one block and some will require a few blocks.
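The rounding idea can be sketched in a few lines of portable C. The helper names `log_round_up` and `log_dup_rounded` are hypothetical, and the block size of 64 is just the example figure from the comment above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LOG_BLOCK 64 /* hypothetical block size, taken from the example above */

/* Rounds n up to the next multiple of LOG_BLOCK (at least one block). */
size_t log_round_up(size_t n)
{
    if (n == 0)
        return LOG_BLOCK;
    assert(n <= SIZE_MAX - (LOG_BLOCK - 1)); /* guard against overflow */
    return (n + LOG_BLOCK - 1) / LOG_BLOCK * LOG_BLOCK;
}

/* Copies msg into a heap buffer whose size is a multiple of LOG_BLOCK.
 * Stores the allocated size in *capacity when capacity is not NULL.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *log_dup_rounded(const char *msg, size_t *capacity)
{
    assert(msg != NULL);
    size_t need = strlen(msg) + 1; /* include the terminator */
    size_t cap = log_round_up(need);
    char *copy = malloc(cap);
    if (copy == NULL)
        return NULL;
    memcpy(copy, msg, need);
    if (capacity != NULL)
        *capacity = cap;
    return copy;
}
```

With this scheme a 63-character message occupies one 64-byte block, while a 64-character message needs two because of the terminator.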
That said, if logging is not the primary purpose of the application, placing some limit on the log queue size might actually be a good approach. This way, logs will never have a chance to consume so much memory that other more important parts of the application will not be able to allocate the memory they need.
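One way to combine a dynamically allocated FIFO with such a limit is sketched below. It is an illustration under assumptions, not code from this thread: the byte budget, the names `log_push` and `log_pop`, and the policy of evicting the oldest message first are all hypothetical choices, and locking between producer and consumer is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LOG_BUDGET 256 /* hypothetical cap on bytes of queued message text */

struct log_node {
    struct log_node *next;
    size_t len;  /* bytes in text, including the terminator */
    char text[]; /* C99 flexible array member: one malloc per message */
};

static struct log_node *first, *last;
static size_t queued_bytes; /* sum of len over all queued nodes */
static size_t dropped_msgs; /* messages evicted or rejected */

/* Detaches and returns the oldest message, or NULL if the queue is empty.
 * The caller frees the returned node. */
struct log_node *log_pop(void)
{
    struct log_node *n = first;
    if (n == NULL)
        return NULL;
    first = n->next;
    if (first == NULL)
        last = NULL;
    queued_bytes -= n->len;
    n->next = NULL;
    return n;
}

/* Appends a copy of msg, evicting the oldest messages if needed so that
 * queued_bytes never exceeds LOG_BUDGET. Returns 0 on success, -1 if the
 * message is larger than the whole budget or the allocation fails. */
int log_push(const char *msg)
{
    assert(msg != NULL);
    size_t len = strlen(msg) + 1;
    if (len > LOG_BUDGET) {
        dropped_msgs++;
        return -1;
    }
    struct log_node *n = malloc(sizeof *n + len);
    if (n == NULL) {
        dropped_msgs++;
        return -1;
    }
    while (queued_bytes + len > LOG_BUDGET) {
        free(log_pop()); /* evict the oldest message */
        dropped_msgs++;
    }
    n->next = NULL;
    n->len = len;
    memcpy(n->text, msg, len);
    if (last != NULL)
        last->next = n;
    else
        first = n;
    last = n;
    queued_bytes += len;
    return 0;
}
```

Evicting the oldest entries keeps the most recent messages, which are usually the ones closest to a failure; rejecting new messages instead would be an equally valid policy if the start of a burst matters more.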
What about using the RTC slow RAM as a circular buffer? That way 4KB (8KB by changing the linker script) of RAM becomes available to store the last messages, normally the ones indicating the events leading to the error.
We are using this approach, together with a replacement ESP_LOG module, to: a) store ALL messages in text format in the RTC RAM buffer; and b) send non-DEBUG level messages directly to a syslog host over UDP (if possible).
@igrr As a further step (not yet implemented on our side), could the RTC buffer be configured (I believe it can) to form part of the coredump to flash and the subsequent upload to a central host?
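A byte-oriented circular buffer of the kind described in this comment might look roughly like the sketch below. It is written to build on a host: the `LOG_RTC_ATTR` macro is a hypothetical placeholder which, on an ESP32, could be defined as a placement attribute such as `RTC_NOINIT_ATTR` from `esp_attr.h` (an assumption to verify against your IDF version). The names `ring_write` and `ring_read` are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for a placement attribute. Assumption: on ESP32 this could be
 * defined as RTC_NOINIT_ATTR so the data lives in RTC memory. With such an
 * attribute the variables are not zeroed at boot, so a real implementation
 * would also need a magic value or checksum to validate them after reset. */
#ifndef LOG_RTC_ATTR
#define LOG_RTC_ATTR
#endif

#define LOG_RING_SIZE 4096 /* the 4KB figure mentioned above */

static LOG_RTC_ATTR char ring[LOG_RING_SIZE];
static LOG_RTC_ATTR size_t ring_head; /* next write position */
static LOG_RTC_ATTR size_t ring_used; /* valid bytes, at most LOG_RING_SIZE */

/* Appends len bytes, overwriting the oldest data once the buffer is full. */
void ring_write(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (len > LOG_RING_SIZE) { /* only the newest bytes can survive */
        data += len - LOG_RING_SIZE;
        len = LOG_RING_SIZE;
    }
    for (size_t i = 0; i < len; i++) {
        ring[ring_head] = data[i];
        ring_head = (ring_head + 1) % LOG_RING_SIZE;
    }
    ring_used = (ring_used + len > LOG_RING_SIZE) ? LOG_RING_SIZE
                                                  : ring_used + len;
}

/* Copies up to cap bytes into out, oldest first, without consuming them.
 * Returns the number of bytes copied. */
size_t ring_read(char *out, size_t cap)
{
    assert(out != NULL || cap == 0);
    size_t n = ring_used < cap ? ring_used : cap;
    size_t start = (ring_head + LOG_RING_SIZE - ring_used) % LOG_RING_SIZE;
    for (size_t i = 0; i < n; i++)
        out[i] = ring[(start + i) % LOG_RING_SIZE];
    return n;
}
```

Because the buffer stores raw text rather than message records, the oldest surviving message may be cut off at its start after a wrap-around.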
Thanks for your replies.
> In IDF v4.3 we are planning to replace it with TLSF
That would be very interesting. Is there a planned release date for esp-idf v4.3?
> There will be less fragmentation if you round the blocks to multiples of a certain size.
Our application depends on dynstr (a dynamic string implementation with fewer features than sds, but more type-safe), which does not (and IMHO should not) hack around allocator-specific details such as allocating blocks in multiples of a fixed size to reduce fragmentation or reallocations. Instead, it relies solely on a reasonable heap implementation.
> That said, if logging is not the primary purpose of the application, placing some limit on the log queue size might actually be a good approach. This way, logs will never have a chance to consume so much memory that other more important parts of the application will not be able to allocate the memory they need.
While logging is not the primary purpose, we do want to forward log information to a file on the SD card and lose as little information as possible. While it is true that logs might hypothetically take up too much memory, they are very short-lived and, if heap fragmentation is kept to a minimum, it is unlikely that they leave no space for other tasks. Actually, this approach might even leave more heap space for other tasks when no requests are active, compared to using sizes fixed at compile time.
> What about using the RTC slow RAM as a circular buffer?
Sounds like a similar idea to what @igrr proposed, but without using up DRAM. I'll take a look at it.
good things
Environment
- IDF version (run `git describe --tags` to find it): v4.1-332-g7b7c64107
- Compiler version (run `xtensa-esp32-elf-gcc --version` to find it): xtensa-esp32-elf-gcc (crosstool-NG esp-2020r3) 8.4.0
Problem Description
If `CONFIG_LOG_DEFAULT_LEVEL_VERBOSE` is configured, running `fread` on any thread causes a deadlock when `esp_log_set_vprintf` redirects `stdout` to a specific file (for example, `logs.txt`) on our SD card. This means giving a pointer to a `vprintf`-like function to `esp_log_set_vprintf`.
This is our `vprintf`-like function:
This is the line triggering the problem, called by `fread`: https://github.com/espressif/esp-idf/blob/7b7c641076ef4c5dff7b994170d445bbb46f81ab/components/sdmmc/sdmmc_cmd.c#L32-L33
When `fread` is executed from any function, a volume-wide semaphore is acquired by this function: https://github.com/espressif/esp-idf/blob/7b7c641076ef4c5dff7b994170d445bbb46f81ab/components/fatfs/src/ff.c#L3753
However, the call to `ESP_LOGV` shown above calls `vprintf_own`, which in turn calls `fopen`, which also tries to acquire the volume-wide semaphore: https://github.com/espressif/esp-idf/blob/7b7c641076ef4c5dff7b994170d445bbb46f81ab/components/fatfs/src/ff.c#L3567
This causes a deadlock, since `fopen` attempts to access the filesystem while `fread` still has not released the lock.
Expected Behavior
`fread` should return the expected value. The information from the call to `ESP_LOGV` should be written to the SD card, but this is not mandatory.
Actual Behavior
`fread` causes a deadlock and hangs the application.
Steps to reproduce
1. Register a `vprintf`-like function via `esp_log_set_vprintf` that writes into a file.
2. Call `fread`.
Code to reproduce this issue
https://gist.github.com/XaviDCR92/798e6b5d4cb6bf31f1fa3eac7a7a520d
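For readers without the hardware, the lock nesting described in the problem description can be modelled on a host. The sketch below is not ESP-IDF or FatFs code: every name in it is hypothetical, the boolean stands in for the non-recursive volume-wide semaphore, and a failed try-lock marks the point where the real call would block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the non-recursive, volume-wide filesystem lock. */
static bool volume_locked;

static bool volume_try_lock(void)
{
    if (volume_locked)
        return false;
    volume_locked = true;
    return true;
}

static void volume_unlock(void)
{
    assert(volume_locked);
    volume_locked = false;
}

/* Hook invoked where the driver would emit a verbose log line. */
static void (*log_hook)(void);

/* Counts nested attempts that could not get the lock. */
static int nested_failures;

/* Model of fopen: it needs the volume lock for the duration of the call. */
bool model_fopen(void)
{
    if (!volume_try_lock())
        return false; /* the real call would block here */
    volume_unlock();
    return true;
}

/* Model of a vprintf-like logger that opens a file on the same volume. */
void model_log_to_file(void)
{
    if (!model_fopen())
        nested_failures++;
}

/* Model of fread: takes the lock, then a lower layer logs while it is held. */
bool model_fread(void)
{
    if (!volume_try_lock())
        return false;
    if (log_hook != NULL)
        log_hook(); /* stands in for the verbose log call in the driver */
    volume_unlock();
    return true;
}
```

With `log_hook` left as `NULL` the read completes normally; once it points at `model_log_to_file`, the nested open finds the lock already held by the same caller, which is the situation that hangs on the device.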
Possible solutions
- Set the log level to `info` (unacceptable solution).
- Remove the log messages from `fread` and friends.
- Avoid calling `vfprintf` in the custom `vprintf`-like function and instead duplicate the data and send it to a separate thread via a queue. Definitely not our preferred solution.