Open doothez opened 4 years ago
Hi @doothez
Thank you for bringing up this topic. The proposed change makes sense to me, and you are right that our current implementation doesn't support this. I've forwarded the thread to the responsible party, and they may get back to you with details later.
Regards
To improve my above post a little bit on behalf of my colleagues...
Regarding the original problem, "mbedtls_x509 module which needs to be called in OTA job is using so many local buffers which is more than 4k": a possible workaround is to map mbedTLS malloc/free to the platform malloc/free and, in the FreeRTOS kernel, use heap_5.c to map external memory as a second bank of the FreeRTOS heap.
xTaskCreate allocates the task TCB and the task stack from the FreeRTOS heap. So to call a function that uses a large chunk of task stack, you could simply increase the task stack size. (I'm assuming external memory is large enough that some fragmentation is OK.)
With this approach, the user cannot really predict where the task stack will be. Whether it lands on-chip or off-chip is decided by the memory allocation layer, heap_5.c. (With your solution, you control where the task stack is placed, which may be better for safety/isolation reasons.)
So, in summary, even without any change to OTA, it seems similar functionality can be achieved. That said, I agree that users may want fine control over where the task stack/TCB memory is allocated from... I'll leave it to them to get back to you on this point.
Reference:
Let mbedTLS use static memory allocation: https://tls.mbed.org/kb/how-to/using-static-memory-instead-of-the-heap
The full documentation of config.h: https://tls.mbed.org/kb/compiling-and-building/how-do-i-configure-mbedtls and Doxygen: https://tls.mbed.org/api/config_8h.html
The usage of the above in this repo: https://github.com/aws/amazon-freertos/blob/master/libraries/3rdparty/mbedtls_config/aws_mbedtls_config.h#L194-L204
Use FreeRTOS banked memory allocation scheme (heap_5): https://www.freertos.org/a00111.html
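As a rough illustration of the heap_5 suggestion: heap_5 is initialised by passing vPortDefineHeapRegions() an array of HeapRegion_t entries, listed from lowest to highest start address and terminated by a { NULL, 0 } entry. The sketch below uses a stand-in struct (region_t) with the same two fields so that it compiles without FreeRTOS. The addresses, sizes, and helper functions are hypothetical and not taken from any real memory map.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for FreeRTOS's HeapRegion_t (same two fields) so that the sketch
 * builds on a host. On target, use HeapRegion_t from FreeRTOS.h instead. */
typedef struct {
    uint8_t *pucStartAddress;
    size_t   xSizeInBytes;
} region_t;

/* Hypothetical memory map: one internal bank and one external bank. */
static const region_t heap_regions[] = {
    { (uint8_t *) 0x20000000u, 0x00010000u },  /* internal RAM, 64 KiB (assumed) */
    { (uint8_t *) 0x60000000u, 0x00400000u },  /* external RAM, 4 MiB (assumed)  */
    { NULL, 0 }                                /* terminator expected by heap_5  */
};

/* Returns 1 if the regions are in ascending address order without overlap,
 * which is the layout heap_5 expects; returns 0 otherwise. */
static int regions_are_ordered(const region_t *r)
{
    uintptr_t prev_end = 0;

    for (; r->pucStartAddress != NULL; r++) {
        uintptr_t start = (uintptr_t) r->pucStartAddress;

        if (start < prev_end) {
            return 0;
        }
        prev_end = start + r->xSizeInBytes;
    }
    return 1;
}

/* Sums the bytes contributed by all regions up to the terminator. */
static size_t regions_total_bytes(const region_t *r)
{
    size_t total = 0;

    for (; r->pucStartAddress != NULL; r++) {
        total += r->xSizeInBytes;
    }
    return total;
}
```

On target, the equivalent HeapRegion_t table would be handed to vPortDefineHeapRegions() before any task, queue, or other kernel object is created. After that, pvPortMalloc() (and therefore xTaskCreate) may return memory from either bank.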
@yuhui-zheng , Thank you for your quick & kind support.
I think we are already using the "CONFIG_MBEDTLS_EXTERNAL_MEM_ALLOC=y" option in sdkconfig, so dynamic allocations already go to external memory through the routines below.
/** Override calloc(), free() except for case where memory allocation scheme is not set to custom */
#ifndef CONFIG_MBEDTLS_CUSTOM_MEM_ALLOC
#include "esp_mem.h"
#define MBEDTLS_PLATFORM_STD_CALLOC esp_mbedtls_mem_calloc
#define MBEDTLS_PLATFORM_STD_FREE esp_mbedtls_mem_free
#endif
#ifndef CONFIG_MBEDTLS_CUSTOM_MEM_ALLOC
IRAM_ATTR void *esp_mbedtls_mem_calloc(size_t n, size_t size)
{
#ifdef CONFIG_MBEDTLS_INTERNAL_MEM_ALLOC
return heap_caps_calloc(n, size, MALLOC_CAP_INTERNAL|MALLOC_CAP_8BIT);
#elif CONFIG_MBEDTLS_EXTERNAL_MEM_ALLOC
return heap_caps_calloc(n, size, MALLOC_CAP_SPIRAM|MALLOC_CAP_8BIT);
#else
return calloc(n, size);
#endif
}
IRAM_ATTR void esp_mbedtls_mem_free(void *ptr)
{
return heap_caps_free(ptr);
}
#endif /* !CONFIG_MBEDTLS_CUSTOM_MEM_ALLOC */
But the problem is with routines like the ones below: as you can see, their buffers cannot be redirected by any configuration option. They still declare large local arrays, and these are the routines I mentioned, which are called from our code.
int mbedtls_x509_string_to_names( mbedtls_asn1_named_data **head, const char *name )
{
int ret = 0;
const char *s = name, *c = s;
const char *end = s + strlen( s );
const char *oid = NULL;
const x509_attr_descriptor_t* attr_descr = NULL;
int in_tag = 1;
char data[MBEDTLS_X509_MAX_DN_NAME_SIZE];
char *d = data;
....
#if defined(MBEDTLS_PEM_WRITE_C)
int mbedtls_x509write_csr_pem( mbedtls_x509write_csr *ctx, unsigned char *buf, size_t size,
int (*f_rng)(void *, unsigned char *, size_t),
void *p_rng )
{
int ret;
unsigned char output_buf[4096];
size_t olen = 0;
if( ( ret = mbedtls_x509write_csr_der( ctx, output_buf, sizeof(output_buf),
f_rng, p_rng ) ) < 0 )
{
return( ret );
}
if( ( ret = mbedtls_pem_write_buffer( PEM_BEGIN_CSR, PEM_END_CSR,
output_buf + sizeof(output_buf) - ret,
ret, buf, size, &olen ) ) != 0 )
{
return( ret );
}
return( 0 );
}
#endif /* MBEDTLS_PEM_WRITE_C */
int mbedtls_x509write_csr_der( mbedtls_x509write_csr *ctx, unsigned char *buf, size_t size,
int (*f_rng)(void *, unsigned char *, size_t),
void *p_rng )
{
int ret;
const char *sig_oid;
size_t sig_oid_len = 0;
unsigned char *c, *c2;
unsigned char hash[64];
unsigned char sig[MBEDTLS_MPI_MAX_SIZE];
unsigned char tmp_buf[2048];
size_t pub_len = 0, sig_and_oid_len = 0, sig_len;
size_t len = 0;
mbedtls_pk_type_t pk_alg;
/*
...
#if !defined(MBEDTLS_MPI_MAX_SIZE)
/*
* Maximum size of MPIs allowed in bits and bytes for user-MPIs.
* ( Default: 512 bytes => 4096 bits, Maximum tested: 2048 bytes => 16384 bits )
*
* Note: Calculations can temporarily result in larger MPIs. So the number
* of limbs required (MBEDTLS_MPI_MAX_LIMBS) is higher.
*/
#define MBEDTLS_MPI_MAX_SIZE 1024 /**< Maximum number of bytes for usable MPIs. */
#endif /* !MBEDTLS_MPI_MAX_SIZE */
int mbedtls_x509write_crt_set_subject_key_identifier( mbedtls_x509write_cert *ctx )
{
int ret;
unsigned char buf[MBEDTLS_MPI_MAX_SIZE * 2 + 20]; /* tag, length + 2xMPI */
unsigned char *c = buf + sizeof(buf);
size_t len = 0;
....
int mbedtls_x509write_crt_set_authority_key_identifier( mbedtls_x509write_cert *ctx )
{
int ret;
unsigned char buf[MBEDTLS_MPI_MAX_SIZE * 2 + 20]; /* tag, length + 2xMPI */
unsigned char *c = buf + sizeof( buf );
size_t len = 0;
....
int mbedtls_x509write_crt_der( mbedtls_x509write_cert *ctx, unsigned char *buf, size_t size,
int (*f_rng)(void *, unsigned char *, size_t),
void *p_rng )
{
int ret;
const char *sig_oid;
size_t sig_oid_len = 0;
unsigned char *c, *c2;
unsigned char hash[64];
unsigned char sig[MBEDTLS_MPI_MAX_SIZE];
unsigned char tmp_buf[2048];
size_t sub_len = 0, pub_len = 0, sig_and_oid_len = 0, sig_len;
size_t len = 0;
mbedtls_pk_type_t pk_alg;
....
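To put rough numbers on the listings above, the sketch below adds up only the large local arrays that are live at the same time when mbedtls_x509write_csr_pem() calls mbedtls_x509write_csr_der(), using MBEDTLS_MPI_MAX_SIZE = 1024 as in the excerpt quoted above. All names prefixed with SKETCH_ or sketch_ are hypothetical. Scalars, saved registers, and the frames of further callees are ignored, so real stack usage is higher.

```c
#include <stddef.h>

/* Value taken from the MBEDTLS_MPI_MAX_SIZE excerpt quoted in this thread. */
#define SKETCH_MPI_MAX_SIZE 1024

/* Sizes of the local arrays as declared in the quoted functions. */
enum {
    SKETCH_PEM_OUTPUT_BUF = 4096,                         /* output_buf in csr_pem   */
    SKETCH_DER_HASH       = 64,                           /* hash in csr_der         */
    SKETCH_DER_SIG        = SKETCH_MPI_MAX_SIZE,          /* sig in csr_der          */
    SKETCH_DER_TMP_BUF    = 2048,                         /* tmp_buf in csr_der      */
    SKETCH_KEY_ID_BUF     = SKETCH_MPI_MAX_SIZE * 2 + 20  /* buf in *_key_identifier */
};

/* Arrays live on the stack while csr_pem is inside its call to csr_der. */
static size_t sketch_csr_pem_array_bytes(void)
{
    return (size_t) SKETCH_PEM_OUTPUT_BUF
         + (size_t) SKETCH_DER_HASH
         + (size_t) SKETCH_DER_SIG
         + (size_t) SKETCH_DER_TMP_BUF;
}

/* Array declared by each of the two *_set_*_key_identifier functions. */
static size_t sketch_key_identifier_array_bytes(void)
{
    return (size_t) SKETCH_KEY_ID_BUF;
}
```

With these values the arrays alone come to 7232 bytes on the CSR PEM path and 2068 bytes in each key identifier function, which is in line with the "more than 4k" observation.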
Let me share our mbedtls-related sdkconfig first. Please advise me if anything can be changed to free up more internal memory.
# CONFIG_MBEDTLS_INTERNAL_MEM_ALLOC is not set
CONFIG_MBEDTLS_EXTERNAL_MEM_ALLOC=y
# CONFIG_MBEDTLS_DEFAULT_MEM_ALLOC is not set
# CONFIG_MBEDTLS_CUSTOM_MEM_ALLOC is not set
CONFIG_MBEDTLS_ASYMMETRIC_CONTENT_LEN=y
CONFIG_MBEDTLS_SSL_IN_CONTENT_LEN=8192
CONFIG_MBEDTLS_SSL_OUT_CONTENT_LEN=4096
# CONFIG_MBEDTLS_DEBUG is not set
CONFIG_MBEDTLS_ECP_RESTARTABLE=y
CONFIG_MBEDTLS_CMAC_C=y
CONFIG_MBEDTLS_HARDWARE_AES=y
# CONFIG_MBEDTLS_HARDWARE_MPI is not set
# CONFIG_MBEDTLS_HARDWARE_SHA is not set
CONFIG_MBEDTLS_HAVE_TIME=y
# CONFIG_MBEDTLS_HAVE_TIME_DATE is not set
CONFIG_MBEDTLS_TLS_SERVER_AND_CLIENT=y
# CONFIG_MBEDTLS_TLS_SERVER_ONLY is not set
# CONFIG_MBEDTLS_TLS_CLIENT_ONLY is not set
# CONFIG_MBEDTLS_TLS_DISABLED is not set
CONFIG_MBEDTLS_TLS_SERVER=y
CONFIG_MBEDTLS_TLS_CLIENT=y
CONFIG_MBEDTLS_TLS_ENABLED=y
# CONFIG_MBEDTLS_PSK_MODES is not set
CONFIG_MBEDTLS_KEY_EXCHANGE_RSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_DHE_RSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ELLIPTIC_CURVE=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDHE_RSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDHE_ECDSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDH_ECDSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDH_RSA=y
CONFIG_MBEDTLS_SSL_RENEGOTIATION=y
# CONFIG_MBEDTLS_SSL_PROTO_SSL3 is not set
CONFIG_MBEDTLS_SSL_PROTO_TLS1=y
CONFIG_MBEDTLS_SSL_PROTO_TLS1_1=y
CONFIG_MBEDTLS_SSL_PROTO_TLS1_2=y
# CONFIG_MBEDTLS_SSL_PROTO_DTLS is not set
CONFIG_MBEDTLS_SSL_ALPN=y
CONFIG_MBEDTLS_SSL_SESSION_TICKETS=y
CONFIG_MBEDTLS_AES_C=y
# CONFIG_MBEDTLS_CAMELLIA_C is not set
# CONFIG_MBEDTLS_DES_C is not set
CONFIG_MBEDTLS_RC4_DISABLED=y
# CONFIG_MBEDTLS_RC4_ENABLED_NO_DEFAULT is not set
# CONFIG_MBEDTLS_RC4_ENABLED is not set
# CONFIG_MBEDTLS_BLOWFISH_C is not set
# CONFIG_MBEDTLS_XTEA_C is not set
CONFIG_MBEDTLS_CCM_C=y
CONFIG_MBEDTLS_GCM_C=y
# CONFIG_MBEDTLS_RIPEMD160_C is not set
CONFIG_MBEDTLS_PEM_PARSE_C=y
CONFIG_MBEDTLS_PEM_WRITE_C=y
CONFIG_MBEDTLS_X509_CRL_PARSE_C=y
CONFIG_MBEDTLS_X509_CSR_PARSE_C=y
CONFIG_MBEDTLS_ECP_C=y
CONFIG_MBEDTLS_ECDH_C=y
CONFIG_MBEDTLS_ECDSA_C=y
CONFIG_MBEDTLS_ECP_DP_SECP192R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP224R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP256R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP384R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP521R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP192K1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP224K1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP256K1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_BP256R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_BP384R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_BP512R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_CURVE25519_ENABLED=y
CONFIG_MBEDTLS_ECP_NIST_OPTIM=y
@yuhui-zheng Is there any feedback on my report above?
@doothez My colleague @pvyawaha mentioned he is involved in an email thread with folks from your side on the same topic.
I might have wrongly assumed, will forward this issue thread again. Apologies.
If you could update the x509write_crt.c and x509write_csr.c files to use dynamic buffers instead of huge local buffers, that would also be very helpful to me.
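A minimal sketch of the kind of change being requested here, not actual mbedTLS code: the 4096-byte scratch buffer is taken from the heap for the duration of the call instead of living on the task stack. The functions sketch_encode_der() and sketch_write(), the error values, and SKETCH_SCRATCH_SIZE are hypothetical. In mbedTLS the allocation would normally go through mbedtls_calloc()/mbedtls_free(), which the platform layer can map to external memory.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_SCRATCH_SIZE 4096

/* Stand-in for a DER writer: places len bytes at the END of buf, as the
 * quoted mbedTLS writers do, and returns the number of bytes written,
 * or -1 if the data does not fit. */
static int sketch_encode_der(unsigned char *buf, size_t size,
                             const unsigned char *src, size_t len)
{
    if (len > size) {
        return -1;
    }
    memcpy(buf + size - len, src, len);
    return (int) len;
}

/* Encodes src into a heap-allocated scratch buffer, then copies the result
 * to out. Returns the encoded length, or a negative value on failure. */
static int sketch_write(const unsigned char *src, size_t len,
                        unsigned char *out, size_t out_size)
{
    int ret;
    unsigned char *scratch = (unsigned char *) calloc(1, SKETCH_SCRATCH_SIZE);

    if (scratch == NULL) {
        return -2;                      /* allocation failed */
    }

    ret = sketch_encode_der(scratch, SKETCH_SCRATCH_SIZE, src, len);
    if (ret >= 0) {
        if ((size_t) ret > out_size) {
            ret = -3;                   /* caller's buffer is too small */
        } else {
            memcpy(out, scratch + SKETCH_SCRATCH_SIZE - ret, (size_t) ret);
        }
    }

    free(scratch);                      /* released on every path */
    return ret;
}
```

The trade-off is an extra allocation failure path and a small run-time cost per call, in exchange for a task stack that no longer has to be sized for the worst-case buffer.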
Is there any update? Apart from using external memory for task stacks, there are huge static arrays in amazon-freertos, shown below, that could be placed in external memory just by adding the "EXT_RAM_ATTR" keyword, but they are not as of now. Can you please consider applying these changes as well?
diff --git a/libraries/freertos_plus/aws/ota/src/aws_iot_ota_agent.c b/libraries/freertos_plus/aws/ota/src/aws_iot_ota_agent.c
index d51f099d1..29438c80a 100644
--- a/libraries/freertos_plus/aws/ota/src/aws_iot_ota_agent.c
+++ b/libraries/freertos_plus/aws/ota/src/aws_iot_ota_agent.c
@@ -35,6 +35,7 @@
#include "task.h"
#include "queue.h"
#include "semphr.h"
+#include "esp_attr.h"
/* OTA agent includes. */
#include "aws_iot_ota_agent.h"
@@ -100,11 +101,23 @@ typedef union MultiParmPtr
/* Array containing pointer to the OTA event structures used to send events to the OTA task. */
-static OTA_EventMsg_t xQueueData[ OTA_NUM_MSG_Q_ENTRIES ];
+EXT_RAM_ATTR static OTA_EventMsg_t xQueueData[ OTA_NUM_MSG_Q_ENTRIES ];
/* Buffers used to push event data. */
-static OTA_EventData_t xEventBuffer[ otaconfigMAX_NUM_OTA_DATA_BUFFERS ];
+EXT_RAM_ATTR static OTA_EventData_t xEventBuffer[ otaconfigMAX_NUM_OTA_DATA_BUFFERS ];
/* OTA control interface. */
--- a/libraries/freertos_plus/standard/freertos_plus_tcp/source/portable/BufferManagement/BufferAllocation_2.c
+++ b/libraries/freertos_plus/standard/freertos_plus_tcp/source/portable/BufferManagement/BufferAllocation_2.c
@@ -82,7 +82,8 @@ to the system. All the network buffers referenced from xFreeBuffersList exist
in this array. The array is not accessed directly except during initialisation,
when the xFreeBuffersList is filled (as all the buffers are free when the system
is booted). */
-static NetworkBufferDescriptor_t xNetworkBufferDescriptors[ ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS ];
+#include "esp_attr.h"
+EXT_RAM_ATTR static NetworkBufferDescriptor_t xNetworkBufferDescriptors[ ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS ];
/* This constant is defined as false to let FreeRTOS_TCP_IP.c know that the
network buffers have a variable size: resizing may be necessary */
diff --git a/vendors/espressif/esp-idf/components/esp32/dbg_stubs.c b/vendors/espressif/esp-idf/components/esp32/dbg_stubs.c
index 51e9749b0..32475a1b8 100644
--- a/vendors/espressif/esp-idf/components/esp32/dbg_stubs.c
+++ b/vendors/espressif/esp-idf/components/esp32/dbg_stubs.c
@@ -49,8 +49,8 @@ static struct {
uint32_t data_free;
} s_dbg_stubs_ctl_data;
-static uint32_t s_stub_entry[ESP_DBG_STUB_ENTRY_MAX];
-static uint8_t s_stub_min_stack[ESP_DBG_STUBS_STACK_MIN_SIZE];
+EXT_RAM_ATTR static uint32_t s_stub_entry[ESP_DBG_STUB_ENTRY_MAX];
+EXT_RAM_ATTR static uint8_t s_stub_min_stack[ESP_DBG_STUBS_STACK_MIN_SIZE];
static DBG_STUB_TRAMP_ATTR uint8_t s_stub_code_buf[ESP_DBG_STUBS_CODE_BUF_SIZE];
// TODO: all called funcs should be in IRAM to work with disabled flash cache
diff --git a/vendors/espressif/esp-idf/components/nimble/esp-hci/src/esp_nimble_hci.c b/vendors/espressif/esp-idf/components/nimble/esp-hci/src/esp_nimble_hci.c
index 3c9c81790..c4ac97476 100644
--- a/vendors/espressif/esp-idf/components/nimble/esp-hci/src/esp_nimble_hci.c
+++ b/vendors/espressif/esp-idf/components/nimble/esp-hci/src/esp_nimble_hci.c
@@ -45,7 +45,7 @@ static struct os_mempool_ext ble_hci_acl_pool;
+ BLE_MBUF_MEMBLOCK_OVERHEAD \
+ BLE_HCI_DATA_HDR_SZ, OS_ALIGNMENT)
-static os_membuf_t ble_hci_acl_buf[
+EXT_RAM_ATTR static os_membuf_t ble_hci_acl_buf[
OS_MEMPOOL_SIZE(MYNEWT_VAL(BLE_ACL_BUF_COUNT),
ACL_BLOCK_SIZE)];
@@ -55,7 +55,7 @@ static os_membuf_t ble_hci_cmd_buf[
];
static struct os_mempool ble_hci_evt_hi_pool;
-static os_membuf_t ble_hci_evt_hi_buf[
+EXT_RAM_ATTR static os_membuf_t ble_hci_evt_hi_buf[
OS_MEMPOOL_SIZE(MYNEWT_VAL(BLE_HCI_EVT_HI_BUF_COUNT),
MYNEWT_VAL(BLE_HCI_EVT_BUF_SIZE))
];
diff --git a/vendors/espressif/esp-idf/components/nimble/nimble/porting/nimble/src/os_msys_init.c b/vendors/espressif/esp-idf/components/nimble/nimble/porting/nimble/src/os_msys_init.c
index 02c202623..c6e9c0366 100644
--- a/vendors/espressif/esp-idf/components/nimble/nimble/porting/nimble/src/os_msys_init.c
+++ b/vendors/espressif/esp-idf/components/nimble/nimble/porting/nimble/src/os_msys_init.c
@@ -20,6 +20,7 @@
#include <assert.h>
#include "os/os.h"
#include "mem/mem.h"
+#include "esp_attr.h"
#if MYNEWT_VAL(MSYS_1_BLOCK_COUNT) > 0
#define SYSINIT_MSYS_1_MEMBLOCK_SIZE \
@@ -27,7 +28,7 @@
#define SYSINIT_MSYS_1_MEMPOOL_SIZE \
OS_MEMPOOL_SIZE(MYNEWT_VAL(MSYS_1_BLOCK_COUNT), \
SYSINIT_MSYS_1_MEMBLOCK_SIZE)
-static os_membuf_t os_msys_init_1_data[SYSINIT_MSYS_1_MEMPOOL_SIZE];
+EXT_RAM_ATTR static os_membuf_t os_msys_init_1_data[SYSINIT_MSYS_1_MEMPOOL_SIZE];
static struct os_mbuf_pool os_msys_init_1_mbuf_pool;
static struct os_mempool os_msys_init_1_mempool;
#endif
@@ -38,7 +39,7 @@ static struct os_mempool os_msys_init_1_mempool;
#define SYSINIT_MSYS_2_MEMPOOL_SIZE \
OS_MEMPOOL_SIZE(MYNEWT_VAL(MSYS_2_BLOCK_COUNT), \
SYSINIT_MSYS_2_MEMBLOCK_SIZE)
-static os_membuf_t os_msys_init_2_data[SYSINIT_MSYS_2_MEMPOOL_SIZE];
+EXT_RAM_ATTR static os_membuf_t os_msys_init_2_data[SYSINIT_MSYS_2_MEMPOOL_SIZE];
static struct os_mbuf_pool os_msys_init_2_mbuf_pool;
static struct os_mempool os_msys_init_2_mempool;
#endif
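One way a change like the diff above could be kept portable, sketched under assumptions: since these libraries also build for non-Espressif ports, the attribute could be hidden behind a macro that expands to EXT_RAM_ATTR only under ESP-IDF (detected here through ESP_PLATFORM, which the ESP-IDF build system defines) and to nothing elsewhere. The macro SKETCH_EXT_RAM, the type sketch_event_t, and the array below are hypothetical. To my understanding, EXT_RAM_ATTR itself only takes effect when CONFIG_SPIRAM_ALLOW_BSS_SEG_EXTERNAL_MEMORY is enabled, and it applies only to zero-initialised data.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand to the Espressif attribute only when building under ESP-IDF;
 * on every other port the macro disappears and the array stays in .bss. */
#if defined(ESP_PLATFORM)
#include "esp_attr.h"
#define SKETCH_EXT_RAM EXT_RAM_ATTR
#else
#define SKETCH_EXT_RAM
#endif

#define SKETCH_NUM_ENTRIES 8

/* Hypothetical stand-in for an event message type such as OTA_EventMsg_t. */
typedef struct {
    uint32_t id;
    void    *payload;
} sketch_event_t;

/* Zero-initialised static array, eligible for external RAM placement. */
SKETCH_EXT_RAM static sketch_event_t sketch_queue_data[SKETCH_NUM_ENTRIES];

/* Total size of the array in bytes. */
static size_t sketch_queue_bytes(void)
{
    return sizeof(sketch_queue_data);
}

/* Returns a pointer to slot i, or NULL if i is out of range. */
static sketch_event_t *sketch_queue_slot(size_t i)
{
    return (i < SKETCH_NUM_ENTRIES) ? &sketch_queue_data[i] : NULL;
}
```

Buffers placed this way should not be touched while the flash cache is disabled or used for DMA where the hardware does not support it, so each array in the diff would still need to be checked individually.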
@doothez
Following are a few platform-specific suggestions that might be helpful:
CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL. And then enable the config option CONFIG_SPIRAM_ALLOW_STACK_EXTERNAL_MEMORY to allow the stack to be placed in external SPIRAM. For more details on these configuration options, please refer to the guide.
NimBLE host stack (similar to the mbedTLS stack). Please refer to https://github.com/espressif/esp-afr-sdk/blob/a3ce2f4007cff6c0ee38cf16f1c9e0103df671fe/components/bt/Kconfig#L1341. This allows all dynamic allocations coming from the NimBLE host stack to be redirected to external memory. (The big static buffers from the NimBLE host stack that you pointed to above were already converted to dynamic allocation some time back.)
For the dbg_stubs.c file, you can entirely disable this feature by turning off CONFIG_ESP32_DEBUG_STUBS_ENABLE. (In fact, it gets disabled if you set the compiler optimization level to release mode, using CONFIG_OPTIMIZATION_LEVEL_RELEASE.)
Hope this helps.
Mahavir
@mahavirj , Thanks for all the detailed steps and options.
@doothez Hope the steps and options provided by @mahavirj helped you to configure the task stack. Please let us know if you need any further assistance from us on this issue.
Is your feature request related to a problem? Please describe.
We are using the ESP32 chipset with 4MB of external memory, because our software needs many features and the ESP32's internal memory is not enough for them. As time goes on, additional features come in and we keep running short of internal memory. We have already changed all the tasks we create ourselves to use external memory for their stacks, and we have moved almost all of the large static variables to external memory with the EXT_RAM_ATTR keyword. But recently the CDF feature needed to be added, and we found that we need to increase the OTA task stack like below.
That is because the mbedtls_x509 module, which needs to be called in the OTA job, uses many local buffers that total more than 4k. (If the mbedtls_x509 module could be configured or converted to use malloc() (external memory) instead of local buffers, that would be fantastic!)
And we still need to add more features to our ESP32 software in the future, so we need a way to free up or optimize internal memory.
Describe the solution you would like.
This is one example of how we currently configure our tasks to use external memory as stack.
If Amazon can provide a similar way to configure the tasks in the amazon-freertos branch, we could use it to free up internal memory. (As you know, we only reference a certain amazon-freertos branch or tag and cannot change routines in that branch, because of our company's source code management rules.)
The OTA task and the Logging task could be candidates for it. If Amazon wants, we can provide a longer list of candidate tasks.
Thank you!