espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.78k stars 7.31k forks source link

Mbedtls Dynamic Port - Memory leak (IDFGH-13555) #14444

Open HikingDev opened 2 months ago

HikingDev commented 2 months ago

Answers checklist.

IDF version.

v5.2-dev-3903-g66992aca7a

Espressif SoC revision.

ESP32 (revision: v3.0)

Operating System used.

Windows

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

PowerShell

Development Kit.

ESP32-Wrover-Kit

Power Supply used.

External 5V

What is the expected behavior?

Running an a https Webserver with dynamic rx/tx buffers and using a Client to send Data > 16KB to the ESP32 HTTPS Server without crashing with "CORRUPT HEAP" message

What is the actual behavior?

The firmware crashes withe a "CORRUPT HEAP" message

Steps to reproduce.

  1. flash firmware with esp32 webserver hosting a website
  2. Open Website hosted via ESP32 in Chrome browser

Debug Logs.

I (55063) Socket parser: [ #55 ] Setting up SSL/TLS data
I (55063) Socket parser: [ #55 ] Performing the SSL/TLS handshake
I (55143) Socket parser: [ #55 ] Handshake successful
W (55143) Socket parser: [initializeServerSSLSession][HEAP] Nr sockets: 6, Heap usage: 112804
I (55143) Socket parser: [initializeServerSSLSession] Releasing lock for clientFd '55'
I (55193) Socket parser: Recv Socket: socketFd: 51  size: 3800
I (55193) Socket parser: [recvSocket][LOCK] Attempting to acquire lock for socket '51'
I (55193) Socket parser: [recvSocket][LOCK] Lock acquired for socket '51'
E (55203) SSL TLS: failed read
CORRUPT HEAP: Bad tail at 0x3ffde6fd. Expected 0xbaad5678 got 0x00000000
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Setting breakpoint at 0x4008d895 and returning...
0x4008d895: multi_heap_free at C:/SysGCC/esp32/13.2.0_12.1_idf5.2/esp-idf/v5.2/components/heap/multi_heap_poisoning.c:279

V (1541) esp_apptrace: esp_apptrace_flush_nolock(): enter

More Information.

Currently the esp mmbedtls dynamic implementation uses a function tx_buffer_len(...) in both

esp_mbedtls_add_tx_buffer and esp_mbedtls_add_rx_buffer to get the buffer length used in mbedtls_calloc.

static int tx_buffer_len(mbedtls_ssl_context *ssl, int len)
{
    (void)ssl;

    if (!len) {
        return MBEDTLS_SSL_OUT_BUFFER_LEN;
    } else {
        return len + MBEDTLS_SSL_HEADER_LEN
                   + MBEDTLS_MAX_IV_LENGTH
                   + MBEDTLS_SSL_MAC_ADD
                   + MBEDTLS_SSL_PADDING_ADD
                   + MBEDTLS_SSL_MAX_CID_EXPANSION;
    }
}

If Asymmetric in/out fragment length is chosen to reduce the size of buffers the RX buffer gets the MBEDTLS_SSL_OUT_BUFFER_LEN regardless of the config value in sdkonfig. If MBEDTLS_SSL_OUT_BUFFER_LEN is set to e.g. 4096 or if there is already data in the buffer, the RX buffer is always smaller than the by mbedtls required 16KB for the RX buffer. The documentation specifically states that the incoming buffer should be kept at 16KB, due to the fact that there is no supported way of informing the client about size restriction for incoming messagens. (See mbedtls Source mbedtls_config.h )

There were discussions and a pull request about the issues in mbedtls: dynamic ssl buffers Mandate of 16kb

Currently the espressif dynamic implementation disregards the mandatory 16kb rx buffer size. I suggest adding a rx_buffer_len function:

static int rx_buffer_len(mbedtls_ssl_context *ssl)
{
    (void)ssl;
    return MBEDTLS_SSL_IN_BUFFER_LEN;
}

and using it in esp_mbedtls_add_rx_buffer

AdityaHPatwardhan commented 1 month ago

Thank you for the issue @HikingDev, I agree that this behaviour needs to be fixed. Since you have already provide the solution do you want to raise a PR for this?

AdityaHPatwardhan commented 1 month ago

@HikingDev Is it possible to share the entire error log in this case?

Actually, I think the issue (and its solution) may not be as straightforward as it seems to be. As you see in the dynamic implementation the in_msglen is used to allocate the buffer dynamically. Typically the msglen is parsed before setting the rx_buffer len, The only case where this would cause a heap corruption is when the in_msglen = 0.

Can you please provide detailed log with log level verbose? I would particularly like to see the debug log before the heap corruption happens.

HikingDev commented 1 month ago

@AdityaHPatwardhan the issue is, that as documented in mbedtls, the client cannot be informed about the Maximum incoming fragment size.

/** \def MBEDTLS_SSL_IN_CONTENT_LEN
 *
 * Maximum length (in bytes) of incoming plaintext fragments.
 *
 * This determines the size of the incoming TLS I/O buffer in such a way
 * that it is capable of holding the specified amount of plaintext data,
 * regardless of the protection mechanism used.
 *
 * \note When using a value less than the default of 16KB on the client, it is
 *       recommended to use the Maximum Fragment Length (MFL) extension to
 *       inform the server about this limitation. On the server, there
 *       is no supported, standardized way of informing the client about
 *       restriction on the maximum size of incoming messages, and unless
 *       the limitation has been communicated by other means, it is recommended
 *       to only change the outgoing buffer size #MBEDTLS_SSL_OUT_CONTENT_LEN
 *       while keeping the default value of 16KB for the incoming buffer.
 *
 * Uncomment to set the maximum plaintext size of the incoming I/O buffer.
 */
//#define MBEDTLS_SSL_IN_CONTENT_LEN              16384

So indeed the in_msglen is used for dynamic buffer allocation but in case of a read (in a server application) the client cannot be informed that MBEDTLS_SSL_IN_CONTENT_LEN is lower than the required 16386 Bytes.

I cannot set mbedtls to verbose. Since its a heap corruption, the crash doesnt happen immediatley but in a later mbedtls_ssl_read and the verbose loglevel causes too much output, floods the terminal and causes a crash.

EDIT: I can try to make a simple server application, allowing to reproduce the faulty behavior.

For now we use a patched version, implementing the suggested pull request. This indeed fixes our crashes. I do wonder how many ppl out there use the dynamic port in a server application.

AdityaHPatwardhan commented 1 month ago

Ahh, I see. I only wanted more information in the in_msglen value in the server application. So I think this issue is more tied with the server. I will try to reproduce this at my end to get more information. So the main issue here is that directly creating a buffer of MBEDTLS_SSL_IN_CONTENT_LEN requires a lot more heap than just creating a buffer for thein_msglen this defeats the purpose of this feature that is to save the RAM. So Ideally we should only fix the issue for the heap corruption without affecting the heap usage numbers.

HikingDev commented 1 month ago

Indeed! The client doesn`t care about our buffer size. There was a discussion in the mbedtls repo: issue 1203

the buffer length of 16384 is the mandatory default for both sides per RFC6066. In an optional extension, per-connection, the client (only) may propose to negotiate it down to 512/1024/2048/4096 (only), and the server (only) may accept the proposal or reject it. The server may not make a counter-proposal. The existence of MBEDTLS_SSL_MAX_CONTENT_LEN itself already violates this, then.

For client connections, an api setting the buffer size can also imply the same 512/1024/2048/4096 restriction to the extension. Assuming the server allows it, that would be safe with both sides aware of the logical limit. RFC6066's extension provides no way for the server to do that for the client, but that's no worse than setting MBEDTLS_SSL_MAX_CONTENT_LEN globally today.

Also interesting read the pull request to add dynamic buffers pull request dynamic buffers mbedtls

Only for the in_buf we cannot save RAM. However, reducing it for the out_buf we can reduce memory usage.

AdityaHPatwardhan commented 1 month ago

Hi @HikingDev Sorry for the delayed reply. I created a HTTPS file server where I can upload files. I tried to send the 198 KB file and the server received the file in the chunks of 16KB. Even without applying any patch the transmission happened successfully as the msglen given by the client was always 16KB. The only case that I can think of when your server can corrupt the data is when it receives msglen as 0 and it sets up an rx_buffer of length 4096 ( MBEDTLS_SSL_OUT_BUFFER_LEN) and ends up corrupting memory when the actual msg is of length > 4096. I could not reproduce this issue at my end. but from my understanding I have created following patch. Can you please check if the patch fixes the issue for you?

mbedtls_dynamic_buffers_issue.patch

HikingDev commented 1 month ago

Hi @AdityaHPatwardhan , no worries! Unfortunately the patch does not fix our issue. The crash still happens in mbedtls_ssl_read. In our scenario we have a webserver. The crash only occurs with Chrome/Edge. Both open multiple sockets (at the time of the crash there are 6 open sockets. Heap still has 116kb of available memory.

AdityaHPatwardhan commented 1 month ago

Can you please provide a demo example where I can reproduce the issue at my end ?

AdityaHPatwardhan commented 1 month ago

@HikingDev The thing is that, we can't accept https://github.com/espressif/esp-idf/pull/14614 as it is as it drastically reduces the heap saved by the dynamic_buffers feature for mbedtls. If I am able to reproduce the issue at my end then I can probably try to fix the issue in a different way where the performance of the feature is not reduced. let me know!

HikingDev commented 1 month ago

@AdityaHPatwardhan i understand the issue. I will try to make sample app that allows to reproduce the bug. The testcase with the upload is probably not sufficient, cause it only opens a single socket (at least that's my assumption). Not sure if I manage in the next few weeks, but will try!

AdityaHPatwardhan commented 5 days ago

Hi @HikingDev any update on the reproducible code?