espressif / esp-nn

Optimised Neural Network functions for Espressif chipsets
Apache License 2.0
125 stars 22 forks source link

Out of bounds memory write in esp_nn_aligned_s8_pad_with_value (causing CORRUPT HEAP: Bad tail) #8

Open AIWintermuteAI opened 8 months ago

AIWintermuteAI commented 8 months ago

Hello, @vikramdattu ! We have successfully tested object detection and sensor time series models with ESP32-S3. However, when testing keyword spotting model, we ran into CORRUPT HEAP: Bad tail issue. I started investigating, think that it might be the alignment problem again, described in https://github.com/espressif/esp-nn/issues/7, however it seems this is something different this time. I peppered my inference and ESP NN code with heap_caps_check_integrity_all(true); and managed to find the source of the problem (I think): The tail corruption is happening here https://github.com/espressif/esp-nn/blob/12129cf04b09af0023127ca7551dc1a363344211/src/common/common_functions.h#L188 which is led from https://github.com/espressif/esp-nn/blob/12129cf04b09af0023127ca7551dc1a363344211/src/convolution/esp_nn_conv_esp32s3.c#L257 The padding function is only used for ESP32-S3. I think the reason it only shows for keyword spotting models for us is that the dimensions are different and therefore there is a need for padding.

From what I understand the code here resizes scratch buffer? I don't think this is valid, even if it is within the arena (and for sure not valid when scratch buffer is allocated outside of the arena, like in our case). If the scratch buffer is inside of the arena, resizing it like that would likely cause corruption to neighboring buffers/tensors. If it is outside of arena, then there is no way to tell what will it write on.

Let me know if my reasoning here is correct and how can we solve this issue! Enclosed find the exact project to replicate the issue. example-standalone-inferencing-espressif-esp32.zip

AIWintermuteAI commented 8 months ago

As an afterthought, it's likely this can be fixed by calculating correct size of scratch buffer beforehand.

vikramdattu commented 8 months ago

Hello @AIWintermuteAI thanks for reporting yet another issue.

Please find the patch attached and let me know if this fixes the issue. memory_write_overflow.patch

AIWintermuteAI commented 8 months ago

Hi, @vikramdattu ! Thank you for the fast reply and help in resolution of the problem. I tested the patch and can confirm it resolves the problem. As soon you make the changes in the main branch, we'll pull them in (ESP32-S3 support was added to Edge Impulse yesterday).

vikramdattu commented 8 months ago

Hi @AIWintermuteAI the fix has been pushed to the esp-nn repo.