thejpster closed this 12 months ago.
I could also reduce Core 1's stack to, say, 2048 bytes (or 1024 bytes, or even 512 bytes), but that would make Core 0 and Core 1 contend for access to SRAM_REGION_4. There are registers we can read to check how many cycles were stalled due to contention, but I figured it was easier for now to just steal some of the striped RAM.
Idea: move the Core 1 data, such as the text buffer, into SRAM4 and leave the 256 KiB of striped RAM for Core 0.
Re-arranged things somewhat. Now there is just a `RAM` memory region (plus the `RAM_OS` region, which is memory for the OS).

The `RAM` memory region sits at the top of the SRAM address space, using some of the striped memory plus all of SRAM_BLOCK4 and SRAM_BLOCK5. Within this region, `.data` and `.bss` sit at the bottom and the Core 0 stack sits at the top. The Core 1 stack is now just a `static mut` array of 1024 bytes, located within the `.bss` section.
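As a minimal sketch (not the project's actual code) of what a 1024-byte `static mut` Core 1 stack in `.bss` can look like in Rust: a zero-initialised static is typically placed in `.bss` by the linker, and because the stack grows downwards, the initial stack pointer is the address just past the end of the array. The names `CORE1_STACK` and `core1_stack_bounds` are hypothetical, the 8-byte alignment wrapper is an assumption based on the Arm AAPCS rule that the stack pointer be 8-byte aligned at public interfaces, and actually launching Core 1 on this stack is left to the HAL and not shown.

```rust
// Hypothetical sketch: Core 1's stack as a zero-initialised static array.
// All-zero initialisation means it normally ends up in `.bss`.
const CORE1_STACK_SIZE: usize = 1024;

/// Wrapper forcing 8-byte alignment (assumed AAPCS stack alignment).
#[allow(dead_code)]
#[repr(C, align(8))]
struct Core1Stack([u8; CORE1_STACK_SIZE]);

static mut CORE1_STACK: Core1Stack = Core1Stack([0; CORE1_STACK_SIZE]);

/// Returns (bottom, top) of the Core 1 stack. The stack grows downwards,
/// so `top` is the initial stack pointer value to hand to Core 1.
fn core1_stack_bounds() -> (usize, usize) {
    // Take a raw pointer (no reference) to the static mut, then its address.
    let bottom = unsafe { core::ptr::addr_of_mut!(CORE1_STACK) as usize };
    (bottom, bottom + CORE1_STACK_SIZE)
}

fn main() {
    let (bottom, top) = core1_stack_bounds();
    println!("Core 1 stack: {bottom:#x}..{top:#x} ({} bytes)", top - bottom);
}
```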
I benchmarked the performance and it went from 497,040 chars/sec to 497,205 chars/sec: basically no change, but certainly no worse than before.
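To put those two benchmark figures in proportion, here is a small sketch of the relative-change arithmetic; the function name `percent_change` is hypothetical. The difference is 165 chars/sec out of roughly half a million, about +0.033%, which is well within what you would expect from run-to-run noise.

```rust
/// Relative change between two throughput measurements, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    let before = 497_040.0; // chars/sec, original layout
    let after = 497_205.0; // chars/sec, re-arranged layout
    // Prints "+0.033%"
    println!("{:+.3}%", percent_change(before, after));
}
```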
I observed in `probe-run` output that we were using all 4096 bytes of our allocated stack region. Adding the painting and measurement code shows that:

Thus I upgraded Core 0 to an 8 KiB stack, stealing the top 4 KiB of the striped region and moving all the BIOS global variables down, which also reduces the TPA by 4 KiB.
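The painting and measurement code itself isn't included in the thread. As a hedged, host-side illustration of the general stack-painting technique: fill the stack region with a known sentinel before it is used, then later scan upwards from the lowest address to find the deepest word that was overwritten. The sentinel `0xDEAD_BEEF` and the function names are hypothetical; on target the slice would be the real stack region, and you must not paint over frames that are already live.

```rust
/// Sentinel written over the unused stack (hypothetical value).
const PAINT: u32 = 0xDEAD_BEEF;

/// Fill the whole stack region with the sentinel pattern.
fn paint(stack: &mut [u32]) {
    stack.fill(PAINT);
}

/// Peak stack usage in bytes. The stack grows downwards from the end of
/// the slice, so the deepest point reached is the first word (counting
/// up from the lowest address) that no longer holds the sentinel.
fn high_water_mark_bytes(stack: &[u32]) -> usize {
    let untouched = stack.iter().take_while(|&&w| w == PAINT).count();
    (stack.len() - untouched) * core::mem::size_of::<u32>()
}

fn main() {
    // Simulate a 4 KiB stack as 1024 words (index 0 = lowest address).
    let mut stack = vec![0u32; 1024];
    paint(&mut stack);

    // Simulate the program using the top 300 words of the stack.
    let len = stack.len();
    for word in &mut stack[len - 300..] {
        *word = 0x1234_5678;
    }

    let used = high_water_mark_bytes(&stack);
    println!("peak stack usage: {used} of {} bytes", len * 4);
}
```

One caveat of the technique: if the deepest word written happens to equal the sentinel, usage is under-reported by that word, so treat the result as an estimate with a small margin.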
Technically we weren't damaging any globals, because the 24 KiB block allocated for them had about 900 bytes unused, but we were asking for trouble.
This also led me to find a bug in probe-run: it doesn't understand that a stack can span multiple contiguous memory regions. I raised that as https://github.com/knurling-rs/probe-run/issues/415
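This is not probe-run's code, but a sketch of the kind of check the issue is asking for: merge memory regions that abut each other before testing whether the stack lies inside one of them. The `Region` type and function names are hypothetical, and the region list assumes a target description that lists the RP2040's striped SRAM, SRAM4 and SRAM5 as three separate regions. The 8 KiB stack at the very top of SRAM is likewise illustrative.

```rust
/// A half-open memory range `[start, end)`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Region {
    start: u32,
    end: u32,
}

/// Merge regions that touch (or overlap) into single spans.
fn merge_contiguous(mut regions: Vec<Region>) -> Vec<Region> {
    regions.sort_by_key(|r| r.start);
    let mut merged: Vec<Region> = Vec::new();
    for r in regions {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// True if `stack` lies entirely within one merged span.
fn stack_fits(regions: Vec<Region>, stack: Region) -> bool {
    merge_contiguous(regions)
        .iter()
        .any(|m| m.start <= stack.start && stack.end <= m.end)
}

fn main() {
    // Assumed RP2040 SRAM map: 256 KiB striped, then SRAM4 and SRAM5.
    let regions = vec![
        Region { start: 0x2000_0000, end: 0x2004_0000 },
        Region { start: 0x2004_0000, end: 0x2004_1000 },
        Region { start: 0x2004_1000, end: 0x2004_2000 },
    ];
    // Hypothetical 8 KiB stack at the top of SRAM, spanning SRAM4 and SRAM5.
    let stack = Region { start: 0x2004_0000, end: 0x2004_2000 };

    // Checking each region on its own wrongly rejects the stack...
    let naive = regions
        .iter()
        .any(|m| m.start <= stack.start && stack.end <= m.end);
    println!("per-region check: {naive}");
    // ...whereas merging contiguous regions first accepts it.
    println!("merged check: {}", stack_fits(regions, stack));
}
```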