Closed AIWintermuteAI closed 10 months ago
@AIWintermuteAI thanks for the ESP32-S3 port. Let me try the example and get back to you.
@AIWintermuteAI I was able to reproduce the issue. Thankfully, I could figure out the reason for the corruption.
You will need to use aligned allocations to fix it.
Please apply the following change and let me know if this fixes it for you:
__attribute__((weak)) void *ei_malloc(size_t size) {
// return malloc(size);
return aligned_alloc(16, size);
}
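__attribute__((weak)) void *ei_calloc(size_t nitems, size_t size) {
// return calloc(nitems, size);
// aligned_alloc does not zero the memory, so clear it to keep calloc semantics (memset needs <string.h>)
void *ptr = aligned_alloc(16, nitems * size);
if (ptr) memset(ptr, 0, nitems * size);
return ptr;
}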
result:
Left 11712 bytes after tensor_boundary
Timing: DSP 7 ms, inference 170 ms, anomaly 0 ms
Object detection bounding boxes:
face (0.996094) [ x: 32, y: 56, width: 8, height: 16 ]
BTW, I have enabled external RAM (SPI-RAM) from menuconfig so as not to fall short of memory, and raised the Interrupt WDT timeout to 1000 ms (from 300 ms).
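The aligned-allocation change suggested above can be sketched as a standalone, calloc-like helper. This is a minimal sketch and not the SDK's code: the names `EXAMPLE_ALIGNMENT`, `example_round_up` and `example_aligned_calloc` are hypothetical. It assumes a C11 toolchain that provides `aligned_alloc`, rounds the size up to a multiple of the alignment (which C11 expects), and zero-fills the result because `aligned_alloc` leaves memory uninitialized.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name; 16 bytes is the alignment suggested in this thread. */
#define EXAMPLE_ALIGNMENT 16u

/* Round size up to the next multiple of EXAMPLE_ALIGNMENT. */
static size_t example_round_up(size_t size) {
    /* The mask trick below only works for power-of-two alignments. */
    assert((EXAMPLE_ALIGNMENT & (EXAMPLE_ALIGNMENT - 1u)) == 0);
    return (size + (EXAMPLE_ALIGNMENT - 1u)) & ~(size_t)(EXAMPLE_ALIGNMENT - 1u);
}

/* calloc-like helper: result is 16-byte aligned and zero-filled.
 * Returns NULL on arithmetic overflow or allocation failure. */
static void *example_aligned_calloc(size_t nitems, size_t size) {
    if (size != 0 && nitems > SIZE_MAX / size) {
        return NULL; /* nitems * size would overflow */
    }
    size_t raw = nitems * size;
    if (raw > SIZE_MAX - (EXAMPLE_ALIGNMENT - 1u)) {
        return NULL; /* rounding up would overflow */
    }
    size_t total = example_round_up(raw);
    if (total == 0) {
        total = EXAMPLE_ALIGNMENT; /* avoid a zero-sized request */
    }
    void *ptr = aligned_alloc(EXAMPLE_ALIGNMENT, total);
    if (ptr != NULL) {
        memset(ptr, 0, total); /* aligned_alloc does not zero memory */
    }
    return ptr;
}
```

Memory returned by this helper is released with plain `free`, so it could stand in for `calloc` in a porting layer without changing the matching free function.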
@vikramdattu thanks for the prompt reply! I tested it, and using aligned_alloc does work, but only if the scratch buffers are allocated outside of the arena. By default the scratch buffers are allocated at the head of the arena, and in that case I'm getting wrong inference results, which was my original problem.
To give you a bit more context:
I'm attaching the project, in which the scratch buffers are allocated within the arena and the necessary changes to the porting layer (ei_calloc and ei_malloc) are already made. If you run it, you'll see that the output is
Edge Impulse standalone inferencing (Espressif ESP32)
Left 11712 bytes after tensor_boundary
Timing: DSP 5 ms, inference 151 ms, anomaly 0 ms
which is incorrect for the same sample. example-standalone-inferencing-espressif-esp32.zip
My suspicion is the same here: that the optimized assembly kernel is somehow writing into an area outside of the allocated scratch buffer. If the scratch buffer is located inside the arena, that would corrupt the neighboring structures.
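One generic way to test a suspicion like this is to surround a buffer with guard bytes and check them after the kernel has run. The sketch below is only an illustration of that technique under stated assumptions: `guarded_alloc`, `guarded_check`, `guarded_free`, `GUARD_BYTES` and `GUARD_PATTERN` are hypothetical names and are not part of ESP-NN or the Edge Impulse SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 16u     /* hypothetical guard size on each side */
#define GUARD_PATTERN 0xA5u /* hypothetical fill value for the guards */

/* Allocate `size` usable bytes with a guard region before and after.
 * Returns a pointer to the usable region, or NULL on failure. */
static uint8_t *guarded_alloc(size_t size) {
    uint8_t *raw = (uint8_t *)malloc(size + 2u * GUARD_BYTES);
    if (raw == NULL) {
        return NULL;
    }
    memset(raw, GUARD_PATTERN, GUARD_BYTES);                      /* front guard */
    memset(raw + GUARD_BYTES + size, GUARD_PATTERN, GUARD_BYTES); /* rear guard */
    return raw + GUARD_BYTES;
}

/* Returns 0 if both guards are intact, -1 if the front guard was
 * overwritten, 1 if the rear guard was overwritten. */
static int guarded_check(const uint8_t *buf, size_t size) {
    const uint8_t *front = buf - GUARD_BYTES;
    const uint8_t *rear = buf + size;
    for (size_t i = 0; i < GUARD_BYTES; ++i) {
        if (front[i] != GUARD_PATTERN) {
            return -1;
        }
    }
    for (size_t i = 0; i < GUARD_BYTES; ++i) {
        if (rear[i] != GUARD_PATTERN) {
            return 1;
        }
    }
    return 0;
}

/* Release a buffer obtained from guarded_alloc. */
static void guarded_free(uint8_t *buf) {
    if (buf != NULL) {
        free(buf - GUARD_BYTES);
    }
}
```

Calling `guarded_check` right after the suspect kernel returns shows whether it wrote before or after the buffer. Note that this sketch does not add any alignment of its own, so a kernel that needs 16-byte aligned buffers would need the guards combined with an aligned allocation.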
Thanks for adding additional context to the issue. I shall reproduce the issue and fix it for good!
@AIWintermuteAI this was a similar alignment issue. The scratch buffer allocation was not checking for an aligned boundary.
Please modify the AllocatePersistentBufferImpl function to account for the alignment, as below, before returning the pointer:
}
current_location -= bytes;
// align down to a 16-byte boundary (the arena is consumed from the tail downward)
current_location -= ((uintptr_t) current_location) & 15;
ptr = current_location;
memset(ptr, 0, bytes);
ei_printf("ARENA tensor_boundary %p current_location %p bytes %d \n", tensor_boundary, current_location, (int)bytes);
ei_printf("ARENA %p scratch_buf_size %d \n", ptr, (int)bytes);
return ptr;
}
You will also need to account for the extra space that the alignment consumes, so the
if (current_location - bytes < tensor_boundary) {
condition should change accordingly (up to 15 additional bytes may be needed per allocation).
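To make the interaction between the alignment and the boundary check concrete, here is a self-contained sketch of a tail-down allocator that aligns first and only then compares against the boundary. It is an illustration under assumptions, not the SDK's AllocatePersistentBufferImpl: the arena size, the 64-byte tensor region and all `example_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 256u /* hypothetical arena size */
#define ARENA_ALIGN 16u /* alignment discussed in this thread */

static uint8_t example_arena[ARENA_SIZE];
/* Pretend the first 64 bytes of the arena already hold tensors. */
static uint8_t *example_tensor_boundary = example_arena + 64;
/* Persistent buffers are carved from the tail of the arena downward. */
static uint8_t *example_current_location = example_arena + ARENA_SIZE;

/* Returns a zeroed, 16-byte aligned buffer of `bytes` bytes taken from the
 * tail of the arena, or NULL if it would cross the tensor boundary. */
static void *example_alloc_persistent(size_t bytes) {
    uintptr_t boundary = (uintptr_t)example_tensor_boundary;
    uintptr_t top = (uintptr_t)example_current_location;

    if (bytes > top - boundary) {
        return NULL; /* not enough room even before alignment */
    }
    /* Move down by `bytes`, then align down to a 16-byte boundary. */
    uintptr_t start = (top - bytes) & ~(uintptr_t)(ARENA_ALIGN - 1u);
    if (start < boundary) {
        return NULL; /* the alignment padding pushed us past the boundary */
    }
    example_current_location -= (top - start);
    memset(example_current_location, 0, bytes);
    return example_current_location;
}
```

Because the check runs on the aligned start address, the up-to-15 bytes of padding are counted against the arena instead of silently overlapping the tensor area.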
Hello! I tested the solution and it does work, thank you very much for the help.
I will need to think about how exactly to incorporate it into our SDK, since AllocatePersistentBufferImpl is shared between all platforms and so far only the ESP32-S3 has required this specific alignment. Perhaps I'll make it a weak function and include the modified version in our porting layer. I'll mark the issue as solved.
Hi @vikramdattu! Thank you for your amazing work. We integrated it into our Edge Impulse SDK so that more users can benefit from optimized NN inference with ESP NN.
Currently we only support the ESP32 optimized kernels, and I'm working on adding support for the ESP32-S3 as well, since this chip is gaining popularity with users. When testing a compiled (interpreterless) object detection model, I'm running into a CORRUPT HEAP error when freeing the scratch buffers. This makes me think there may be heap corruption caused by a write inside the optimized Conv kernel, i.e. the kernel writing into an area outside of the allocated scratch buffer. I'm attaching the complete project here, here is the sample output I'm getting:
Let me know if you have the time to try reproducing the error or have any insights on how I can debug it myself.