aws / amazon-freertos

DEPRECATED - See README.md
https://aws.amazon.com/freertos/
MIT License

Support smaller memory footprint in aws_test_runner #1250

Closed cyliangtw closed 4 years ago

cyliangtw commented 5 years ago

For certification, it is mandatory to pass devicetester_afreertos. Looking at the 15 currently certified platforms, each has at least 128 KB of SRAM. Based on my measurements, the Full_Shadow test consumes over 100 KB of SRAM and triggers the "vApplicationMallocFailedHook" callback. In other words, chips with less than 100 KB of SRAM have no chance of passing Device Tester certification. So, is there any plan to shrink the memory footprint of aws_test_runner?
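For context, the hook being triggered is the standard FreeRTOS malloc-failed callback. A minimal sketch of it (enabled by setting configUSE_MALLOC_FAILED_HOOK to 1 in FreeRTOSConfig.h; the body here is just one common pattern, not the project's actual implementation):

```c
/* Minimal malloc-failed hook sketch: FreeRTOS calls this when
   pvPortMalloc() cannot satisfy an allocation request (requires
   configUSE_MALLOC_FAILED_HOOK set to 1 in FreeRTOSConfig.h). */
void vApplicationMallocFailedHook( void )
{
    /* Spin so the failure is visible under a debugger; reaching here
       means configTOTAL_HEAP_SIZE is too small for the workload. */
    for( ;; )
    {
    }
}
```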

aggarg commented 5 years ago

Thank you for bringing this to our notice. There are a few things to check here:

In general, most of the RAM (~60 KB) is consumed by the TLS handshake, so RAM usage is much lower on platforms that offload TLS to separate hardware (for example, the STM32 Discovery Kit offloads it to the Inventek Wi-Fi module).

We are also analyzing the heap usage of our libraries to ensure that we can tune it optimally.

Thanks.

cyliangtw commented 5 years ago

@aggarg , We plan to port some chips with 96 KB of SRAM to the Amazon FreeRTOS platform, but in my experience on the M487, Device Tester needs over 100 KB of SRAM. So I measured the memory usage of some test items. Regarding the RAM consumption of TLS: we use the ESP8266 for cost-effectiveness; it can offload the TCP/IP stack, but it has no way to offload TLS.

Most systems execute TLS on the MCU instead of in the Wi-Fi component for security reasons. Regarding the peak memory usage of Device Tester, Full_MQTT_Agent_Stress_Tests is a good reference.

aggarg commented 5 years ago

Thank you for providing this additional information - so you are not able to run the shadow demo on a platform with 96 KB of SRAM. Just to confirm: when you run the shadow demo, do you hit vApplicationMallocFailedHook even with configTOTAL_HEAP_SIZE set to the maximum possible without overflowing SRAM?

As I said, we are working on analyzing the heap usage so that we can tune it optimally.

Thanks.
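The check being asked about can be sketched as a FreeRTOSConfig.h fragment (the 80 KB figure is purely an assumption for a 96 KB part, leaving room for static data and stacks; the real ceiling comes from the linker map):

```c
/* FreeRTOSConfig.h sketch (figures are assumptions, not measured):
   push the heap as high as the linker allows on a 96 KB part, leaving
   room for static data, task stacks, and the main stack. */
#define configTOTAL_HEAP_SIZE           ( ( size_t ) ( 80 * 1024 ) )
#define configUSE_MALLOC_FAILED_HOOK    1
```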

cyliangtw commented 5 years ago

@aggarg , Yes, a platform with 96 KB of SRAM and no TLS offload hits vApplicationMallocFailedHook in both the shadow demo (vStartShadowDemoTasks) and the Device Tester Full_Shadow test. As for Full_MQTT_Agent_Stress_Tests, it requires about 46 KB more SRAM than Full_Shadow.

aggarg commented 5 years ago

Thanks for the information. I'll post once I have an update on this.

Thanks.

dcgaws commented 4 years ago

An option we're exploring is to create a test configuration that uses only Elliptic-Curve Cryptography (ECC). We can validate that approach in a few phases. Phase 1:

  1. Configure the device (for example, start with the Simulator port) with an ECC private key and client certificate. No RSA.
  2. Configure the device to trust only Amazon Root CA 3, listed at https://docs.aws.amazon.com/iot/latest/developerguide/server-authentication.html. If we trust any other root certificates, most of our ports will load them all into RAM, which is not what we want.
  3. Modify the mbedTLS config.h to set MBEDTLS_SSL_MAX_CONTENT_LEN to 4096.
  4. Modify the mbedTLS config.h to remove support for RSA and the RSA-based ciphersuites.
  5. Connect to AWS IoT (i.e. an ATS endpoint) and exchange some MQTT packets.
  6. Check the device heap usage and compare it to that of our default config.
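Steps 3 and 4 above might look like the following in the mbedTLS config.h. This is a sketch using standard mbedTLS option names; the exact set of RSA-related options to disable depends on what the baseline config enables:

```c
/* Step 3: cap TLS record buffers at 4 KB. */
#define MBEDTLS_SSL_MAX_CONTENT_LEN              4096

/* Step 4: ECC only - remove RSA and the RSA-based ciphersuites. */
#undef MBEDTLS_RSA_C
#undef MBEDTLS_PKCS1_V15
#undef MBEDTLS_PKCS1_V21
#undef MBEDTLS_KEY_EXCHANGE_RSA_ENABLED
#undef MBEDTLS_KEY_EXCHANGE_ECDHE_RSA_ENABLED

/* Keep the ECC key exchange used with an ECDSA client certificate. */
#define MBEDTLS_KEY_EXCHANGE_ECDHE_ECDSA_ENABLED
```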

Phase 2:

  1. Figure out our largest outbound TLS packet size in the tests.
  2. Set MBEDTLS_SSL_OUT_CONTENT_LEN according to the above (may have to be a power of two, not sure).
  3. Figure out our largest inbound TLS packet size in the tests.
  4. Similar, as above, for MBEDTLS_SSL_IN_CONTENT_LEN.
  5. Re-run and check heap usage.
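Assuming the asymmetric length options are available (they were added in mbedTLS 2.13), the Phase 2 result might look like this; the values below are placeholders until the measurements in steps 1 and 3 are done:

```c
/* mbedTLS config.h sketch: size the TLS record buffers independently.
   Placeholder values - replace with the measured maximum outbound and
   inbound record sizes from the tests. */
#define MBEDTLS_SSL_OUT_CONTENT_LEN    1024   /* largest outbound record */
#define MBEDTLS_SSL_IN_CONTENT_LEN     4096   /* largest inbound record */
```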

Phase 3: based on the outcome of the above, integrate the changes into Device Tester.

lundinc2 commented 4 years ago

Hello,

As a follow up to @dcgaws's instructions, I will post the impact these steps will have on the mbedTLS heap usage.

As a benchmark, the same MQTT connection was used without making changes to the TLS configuration.

Completing a TLS handshake with AWS IoT Core took 61,980 bytes of heap in total (consistent with the usage reported earlier in this thread).

After making the above changes, the largest inbound TLS packet I saw was about 2,124 bytes, and the largest outbound TLS packet was about 906 bytes.

Because of this I configured MBEDTLS_SSL_IN_CONTENT_LEN to 2200, and MBEDTLS_SSL_OUT_CONTENT_LEN to 1000.

After all of these changes, the heap impact of a TLS handshake to AWS IoT Core was reduced to 36,564 bytes - a reduction of 25,416 bytes, or 41%.

This is quite significant and we will continue to investigate this solution.

lundinc2 commented 4 years ago

Please see #1468 for an example of reducing the mbedTLS footprint on the ESP32 platform.

mahavirj commented 4 years ago

After making the above changes, the largest inbound TLS packet I saw was about 2,124 bytes, and the largest outbound TLS packet was about 906 bytes.

Because of this I configured MBEDTLS_SSL_IN_CONTENT_LEN to 2200, and MBEDTLS_SSL_OUT_CONTENT_LEN to 1000.

@lundinc2 Will there be any issue if someone exercises OTA (over MQTT or HTTP) with larger OTA_FILE_BLOCK_SIZE?

dcgaws commented 4 years ago

@lundinc2 Will there be any issue if someone exercises OTA (over MQTT or HTTP) with larger OTA_FILE_BLOCK_SIZE?

That seems likely. I recall that for a 1 kB OTA block size (CBOR over MQTT) we were seeing about a 1200 byte gross payload.
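One way to catch such a mismatch at build time is a compile-time guard. This is a hypothetical sketch: the overhead constant is an assumption derived from the ~1200-byte gross payload observed for a 1 kB block, not a documented figure:

```c
/* Hypothetical compile-time guard: ensure the TLS inbound buffer can
   hold one OTA data block plus protocol overhead (CBOR over MQTT). */
#define OTA_FILE_BLOCK_SIZE        1024   /* 1 kB OTA block */
#define OTA_BLOCK_GROSS_OVERHEAD    200   /* assumed, from the ~1200-byte
                                             gross payload noted above */

#if defined( MBEDTLS_SSL_IN_CONTENT_LEN ) && \
    ( MBEDTLS_SSL_IN_CONTENT_LEN < ( OTA_FILE_BLOCK_SIZE + OTA_BLOCK_GROSS_OVERHEAD ) )
    #error "MBEDTLS_SSL_IN_CONTENT_LEN too small for the configured OTA block size"
#endif
```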

htibosch commented 4 years ago

Normally, I am only involved in the lower-level libraries, and not the secure layers. But the above issue is about both layers.

I made an analysis of the memory usage of FreeRTOS+TCP in the demo applications. I looked at both the ESP32 and the PC project.

My recommendations are as follows:

```diff
 /* The number of network buffers; the maximum number observed was 8.
    Define it as 16, just to be sure. */
-#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS    ( 60 )
+#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS    ( 16 )

 /* The number of segment descriptors ( for TCP sliding windows ).
    The maximum will be 2. Define it as 16, just to be sure. */
-#define ipconfigTCP_WIN_SEG_COUNT                 ( 240 )
+#define ipconfigTCP_WIN_SEG_COUNT                 ( 16 )

 /* As very little data is communicated, the TCP buffers can be kept small;
    use 2 x the Maximum Segment Size, which evaluates to 2840 bytes. */
-#define ipconfigTCP_RX_BUFFER_LENGTH              ( 3000 )
+#define ipconfigTCP_RX_BUFFER_LENGTH              ( ipconfigTCP_MSS * 2 )

-#define ipconfigTCP_TX_BUFFER_LENGTH              ( 3000 )
+#define ipconfigTCP_TX_BUFFER_LENGTH              ( ipconfigTCP_MSS * 2 )

 /* Be careful with this change: a smaller stack for the IP-task.
    For me it worked well. */
-#define ipconfigIP_TASK_STACK_SIZE_WORDS          ( configMINIMAL_STACK_SIZE * 5 )
+#define ipconfigIP_TASK_STACK_SIZE_WORDS          ( configMINIMAL_STACK_SIZE * 3 )
```

I estimate that about 30 KB of RAM can be saved with these settings compared to the standard demo, both in static declarations and in heap usage.

```c
#define ipconfigNETWORK_MTU    1460
```

One could also decrease the MTU, but that should be done with care: the Wi-Fi peripheral should be configured to refuse all packets larger than the MTU. If it is not, the peripheral might access invalid memory when it receives full-size packets.

Hein

cyliangtw commented 4 years ago

@htibosch , Except for #define ipconfigTCP_WIN_SEG_COUNT ( 16 ) and #define ipconfigIP_TASK_STACK_SIZE_WORDS ( configMINIMAL_STACK_SIZE * 3 ), the other FreeRTOS+TCP settings on my platform are already equal to or lower than yours. Thanks for your advice: after applying these two settings, I saved 2 KB of SRAM in the M487_ETH demo.

cyliangtw commented 4 years ago

@dcgaws , @lundinc2 , Following your instructions (phases 1~3), I adjusted MBEDTLS_SSL_MAX_CONTENT_LEN. Its default value is 8192; if I set it to 4096 or 5120, the test-runner MQTT_Unit and MQTT_System tests fail with:

[TestRunner] ERROR: Handshake failed with error code -29184
ERROR: [ESP_IO_Recv] Get ipd 2430 bytes reach the maximum size !!

It seems MBEDTLS_SSL_IN_CONTENT_LEN must be over 5120 for this test item. Thus, I set MBEDTLS_SSL_MAX_CONTENT_LEN to 6144 and MBEDTLS_SSL_OUT_CONTENT_LEN to 2048; this works well and saves 16 KB of SRAM in the same test-runner item.
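For reference, the working combination described above, as it might appear in the mbedTLS config.h. When MBEDTLS_SSL_IN_CONTENT_LEN is left undefined, mbedTLS derives it from MBEDTLS_SSL_MAX_CONTENT_LEN, so only two options need to be set:

```c
/* Working values from the test above: inbound records over 5 KB were
   observed, so the inbound buffer (derived from MAX_CONTENT_LEN) stays
   at 6 KB, while the outbound buffer is shrunk to 2 KB. */
#define MBEDTLS_SSL_MAX_CONTENT_LEN    6144
#define MBEDTLS_SSL_OUT_CONTENT_LEN    2048
```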