Open aksdfauytv opened 1 year ago
I am using ESP32 Rev. 1 and I have now found a piece of code that can reliably reproduce an inexplainable wrong byte in external SPIRAM. To confirm my assumption I executed the exact same bytecode on an ESP32 Rev. 3.0 and the error disappeared.
So I can confirm that ESP-IDF commit dc016f59877d13e6e7d4fc193aa5aa764547f16d definitely contains a severe bug regarding the workarounds for errata 3.9 and / or 3.10.
My compiler version is: xtensa-esp32-elf-gcc-12.2.0.exe (crosstool-NG esp-12.2.0_20230208) 12.2.0
@igrr Could you tell us please: Does espressif intend to fix this bug or does espressif no longer support the ESP32 Rev. 1 ?
@zinke-ct-video if you have a reproducer code you could share, we will investigate the issue and try to find what causes the incorrect code to be generated.
(The code provided by the original issue reporter didn't help much, unfortunately, because it can't be compiled without the rest of the application.)
Hi @igrr, you should have said you need a better example. Here is a full code to reproduce the issue:
build with xtensa-esp32-elf-gcc test.c -c -mlongcalls -mfix-esp32-psram-cache-issue -mfix-esp32-psram-cache-strategy=memw -ggdb -O2 -std=gnu99 -ffunction-sections -fdata-sections -fstrict-volatile-bitfields -fstack-protector
look for break 1, 15
in the disassembly, the faulty code is just before it
typedef struct
{
int length;
char *json;
} JSONEvent;
typedef struct
{
JSONEvent event;
unsigned short eventsCount;
unsigned long long packageStartTime;
} LogWriterPackage;
typedef struct
{
char pad[92];
LogWriterPackage *currentPackage;
} self_t;
typedef struct
{
int length;
} input_t;
void PlatformLogError(const char *fmt, ...);
void *PlatformAllocateChunk(int size);
int LogWriterAddPackage(self_t* self, LogWriterPackage* package);
int esp_get_free_heap_size();
unsigned long long PlatformGetRelativeTimeMS();
void free(void *p);
#define LOGTAG "bla"
#define MIN_FREE_HEAP 1048576
#define LOGWRITER_MAX_ELEMENTS 100
#define PACKAGE_MAX_SIZE 10000
#define PACKAGE_MAX_AGE 1000
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define NULL 0
void bla(self_t *self, input_t *input, const JSONEvent* event)
{
if(self->currentPackage == NULL || (self->currentPackage->event.length + (event->length + 2)) >= PACKAGE_MAX_SIZE)
{
if(self->currentPackage != NULL) {
LogWriterAddPackage(self, self->currentPackage);
self->currentPackage = NULL;
}
if(esp_get_free_heap_size() < MIN_FREE_HEAP) {
PlatformLogError(LOGTAG, "Not enough memory to allocate new package");
return;
}
//@TODO: Check free heap space and break loop
self->currentPackage = (LogWriterPackage*) PlatformAllocateChunk(sizeof(LogWriterPackage));
if(self->currentPackage == NULL) {
PlatformLogError(LOGTAG, "Allocating package failed");
return;
}
self->currentPackage->event.json = PlatformAllocateChunk(MAX(PACKAGE_MAX_SIZE, (event->length + 2)));
char *tmp = self->currentPackage->event.json;
//__asm volatile("" ::: "a5"); // prevent tmp from being placed in a5, so we can read the value of a5 and make sure it was not changed by some other means. If you remove this line, the bug still happens, you just can't see where a10 was written to (see disassembly).
if((int)tmp < 0) // check if tmp is 0xcececece
__builtin_trap();
if(tmp == NULL) {
PlatformLogError(LOGTAG, "Allocating package data failed");
free(self->currentPackage);
self->currentPackage = NULL;
return;
}
self->currentPackage->eventsCount = 0;
self->currentPackage->event.length = 0;
self->currentPackage->packageStartTime = PlatformGetRelativeTimeMS();
}
}
I tried to put my error reproducing code into a separate folder to share it with you. Unfortunately it is enough to change the paths to make the error disappear. Comparing the compiled binary code I can see that the folder paths contained in the binary have changed and I assume that this change causes the error not to appear.
@igrr Do you have any idea how I could make my project "portable" so I can share it with you for investigation?
You can try to enable the reproducible builds option in menuconfig (see https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/reproducible-builds.html#enabling-reproducible-builds-in-esp-idf), this should exclude all paths which may be added by IDF code. However if you use __FILE__
directly in your code, it will still introduce some dependency on the binary.
Very good hint. This has helped. So here is my example project. This is the minimum code for which I could make sure that the error appears: example.zip
For faster orientation in my code:
Additional hints:
Log Output:
I (30) boot: ESP-IDF HEAD-HASH-NOTFOUND 2nd stage bootloader
I (30) boot: Multicore bootloader
I (31) boot: chip revision: v1.0
I (31) qio_mode: Enabling default flash chip QIO
I (31) boot.esp32: SPI Speed : 80MHz
I (31) boot.esp32: SPI Mode : QIO
I (31) boot.esp32: SPI Flash Size : 16MB
I (32) boot: Enabling RNG early entropy source...
I (32) boot: Partition Table:
I (32) boot: ## Label Usage Type ST Offset Length
I (33) boot: 0 nvs WiFi data 01 02 00009000 00004000
I (33) boot: 1 otadata OTA data 01 00 0000d000 00002000
I (33) boot: 2 phy_init RF data 01 01 0000f000 00001000
I (34) boot: 3 ota_0 OTA app 00 10 00010000 00400000
I (34) boot: 4 ota_1 OTA app 00 11 00410000 00400000
I (35) boot: End of partition table
I (35) boot: No factory image, trying OTA 0
I (35) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=184724h (1591076) map
I (473) esp_image: segment 1: paddr=0019474c vaddr=3ffb0000 size=03bd8h ( 15320) load
I (478) esp_image: segment 2: paddr=0019832c vaddr=40080000 size=07cech ( 31980) load
I (489) esp_image: segment 3: paddr=001a0020 vaddr=400d0020 size=1c804h (116740) map
I (521) esp_image: segment 4: paddr=001bc82c vaddr=40087cec size=08a48h ( 35400) load
I (541) boot: Loaded app from partition at offset 0x10000
I (560) boot: Set actual ota_seq=1 in otadata[0]
I (560) boot: Disabling RNG early entropy source...
I (560) cpu_start: Multicore app
I (561) quad_psram: This chip is ESP32-D0WD
I (561) esp_psram: Found 8MB PSRAM device
I (562) esp_psram: Speed: 80MHz
I (562) esp_psram: PSRAM initialized, cache is in low/high (2-core) mode.
W (562) esp_psram: Virtual address not enough for PSRAM, map as much as we can. 4MB is mapped
I (562) cpu_start: Pro cpu up.
I (562) cpu_start: Starting app cpu, entry point is 0x40081394
0x40081394: call_start_cpu1 at /COMPONENT_ESP_SYSTEM_DIR/port/cpu_start.c:154
I (0) cpu_start: App cpu up.
I (1074) esp_psram: SPI SRAM memory test OK
I (1082) cpu_start: Pro cpu start user code
I (1082) cpu_start: cpu freq: 240000000 Hz
I (1082) cpu_start: Application information:
I (1082) cpu_start: Project name: CT-Webinterface
I (1082) cpu_start: App version: 3.4.2
I (1083) cpu_start: ELF file SHA256: 79a82c3216d78d95...
I (1083) cpu_start: ESP-IDF: HEAD-HASH-NOTFOUND
I (1083) cpu_start: Min chip rev: v1.0
I (1084) cpu_start: Max chip rev: v3.99
I (1084) cpu_start: Chip rev: v1.0
I (1084) heap_init: Initializing. RAM available for dynamic allocation:
I (1085) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (1085) heap_init: At 3FFB44D0 len 0002BB30 (174 KiB): DRAM
I (1085) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (1086) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (1087) heap_init: At 40090734 len 0000F8CC (62 KiB): IRAM
I (1370) esp_psram: Adding pool of 4096K of PSRAM memory to heap allocator
I (1371) spi_flash: detected chip: gd
I (1371) spi_flash: flash io: qio
I (1372) app_start: Starting scheduler on CPU0
I (1373) app_start: Starting scheduler on CPU1
I (1373) main_task: Started on CPU0
I (1383) esp_psram: Reserving pool of 24K of internal memory for DMA/internal allocations
I (1383) main_task: Calling app_main()
I (1383) APP: Reset reason: 1
E (1383) Filesystem / Factory Defaults: Restoring factory default content on SD card
I (1383) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f800888 (size 32768) in archive unpacker task:
I (1413) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f80889c (size 32768) in archive unpacker task:
I (1443) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f8108b0 (size 32768) in archive unpacker task:
I (1463) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f8188c4 (size 32768) in archive unpacker task:
I (1493) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f8208d8 (size 32768) in archive unpacker task:
I (1503) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f8288ec (size 32768) in archive unpacker task:
I (1513) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f830900 (size 32768) in archive unpacker task:
I (1533) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f838914 (size 32768) in archive unpacker task:
I (1543) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f80165c (size 32768) in archive unpacker task:
I (1553) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f840928 (size 32768) in archive unpacker task:
I (1563) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f809670 (size 32768) in archive unpacker task:
I (1573) Filesystem / Factory Defaults: Processing decompressed buffer 0x3f84893c (size 32768) in archive unpacker task:
E (1583) Filesystem / Factory Defaults: Canary bytes of buffer 0x3f822ad4 (size 3535) defective:
E (1583) Filesystem / Factory Defaults: 0x3ffc2418 00 56 ad ba |.V..|
assert failed: archive_entry_unpacking_content deploy_factory_default_image.c:229 (check_canary_bytes_of_buffer(context->js_file_buffer, context->js_file_buffer_size))
Backtrace: 0x40081bc1:0x3ffc22d0 0x40089669:0x3ffc22f0 0x4008f0fe:0x3ffc2310 0x400d9731:0x3ffc2440 0x400da228:0x3ffc2460 0x400da5bb:0x3ffc2490 0x400d8ff9:0x3ffc24c0 0x4008c499:0x3ffc2740
0x40081bc1: panic_abort at /COMPONENT_ESP_SYSTEM_DIR/panic.c:452
0x40089669: esp_system_abort at /COMPONENT_ESP_SYSTEM_DIR/port/esp_system_chip.c:90
0x4008f0fe: __assert_func at /COMPONENT_NEWLIB_DIR/assert.c:81
0x400d9731: archive_entry_unpacking_content at /COMPONENT_MAIN_DIR/filesystem/deploy_factory_default_image.c:229 (discriminator 1)
0x400da228: archive_unpacker_step_fsm at /COMPONENT_MAIN_DIR/filesystem/archive.c:172
0x400da5bb: archive_unpacker_feed at /COMPONENT_MAIN_DIR/filesystem/archive.c:41 (discriminator 2)
0x400d8ff9: archive_unpacker_task_code at /COMPONENT_MAIN_DIR/filesystem/deploy_factory_default_image.c:141 (discriminator 13)
0x4008c499: vPortTaskWrapper at /COMPONENT_FREERTOS_DIR/FreeRTOS-Kernel/portable/xtensa/port.c:162
If you need any additional info, please contact me, I will do my best to help finding this bug.
gcc version:
(crosstool-NG esp-2021r2) 8.4.0
IDF:v4.3.2
build with heap poisoning enabled. compiling with flags:-mlongcalls -mfix-esp32-psram-cache-issue -mfix-esp32-psram-cache-strategy=memw -ggdb -O2 -std=gnu99 -ffunction-sections -fdata-sections -fstrict-volatile-bitfields -fstack-protector
chip:ESP32-PICO-D4 (revision 1)
PlatformAllocateChunk
is a wrapper forheap_caps_malloc
withMALLOC_CAP_SPIRAM
argument.This source:
produces this code:
Notice that a5 and a6 both point to the same address, so
s32i a10, a5, 4
andl32i.n a7, a6, 4
are violating errata issue 3.9. The result is that a10 is not written into memory. a7 contains the original value of memory, which was poisoned to 0xcececece by malloc.So, my guess would be that the bug is probably cased by
fix-esp32-psram-cache-issue
workaround not noticing gcc used two registers to access the same address.