Increase core0 stack to 8K and paint/measure stack usage.

Neotron-Compute / Neotron-Pico-BIOS

BIOS for the Neotron Pico

GNU General Public License v3.0

Increase core0 stack to 8K and paint/measure stack usage. #67

Closed thejpster closed 12 months ago

thejpster commented 1 year ago

I observed in probe-run output that we were using all 4096 bytes of our allocated stack region. Adding the painting and measurement code shows that:

Core 0 uses 4372 bytes peak
Core 1 uses 312 bytes peak
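A minimal sketch of the paint-and-measure technique, for readers unfamiliar with it. It is not the code from this PR: the pattern value and the function names are hypothetical, and the stack is modelled as a plain slice so the logic can be run on a host machine. It assumes a downward-growing stack, so the lowest addresses are the last to be touched.

```rust
/// Pattern written into every unused stack word (an arbitrary, assumed value).
const PAINT_WORD: u32 = 0xCCCC_CCCC;

/// Fill a stack region with the paint pattern.
///
/// On real hardware this must only cover the part of the stack below the
/// current stack pointer, otherwise the running function's own frame is
/// overwritten.
fn paint_stack(stack: &mut [u32]) {
    for word in stack.iter_mut() {
        *word = PAINT_WORD;
    }
}

/// Return the peak stack usage in bytes.
///
/// Index 0 is the lowest address. Every word from the bottom that still holds
/// the pattern was never written; everything above the first changed word
/// counts as used.
fn peak_usage_bytes(stack: &[u32]) -> usize {
    let untouched = stack.iter().take_while(|&&w| w == PAINT_WORD).count();
    (stack.len() - untouched) * core::mem::size_of::<u32>()
}
```

Paint once at start-up, then call the measurement function at any later point. The result is a lower bound: it under-reports if the deepest word written happens to equal the pattern, and it cannot see an overflow that has already run past the bottom of the region.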

So I increased Core 0's stack to 8 KiB, stealing the top 4 KiB of the striped region, moving all the BIOS global variables down, and reducing the TPA by 4 KiB.

Technically we weren't damaging any globals because the 24 KiB block allocated for them had about 900 bytes unused. But we were asking for trouble.
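The arithmetic behind that statement can be checked with a short sketch. The constant and function names are hypothetical. It assumes the unused bytes sat at the top of the globals block, directly beneath the old stack, and it treats "about 900" as exactly 900.

```rust
const OLD_STACK_BYTES: usize = 4096;
const NEW_STACK_BYTES: usize = 8 * 1024;
const CORE0_PEAK_BYTES: usize = 4372;
/// "About 900 bytes" in the text above; treated as exact here (assumption).
const GLOBALS_SLACK_BYTES: usize = 900;

/// Bytes by which the peak usage ran past the bottom of the stack (0 if it fits).
const fn overflow_bytes(peak: usize, stack: usize) -> usize {
    peak.saturating_sub(stack)
}

/// Bytes still free at peak usage (0 if the stack overflowed).
const fn headroom_bytes(peak: usize, stack: usize) -> usize {
    stack.saturating_sub(peak)
}

/// True if an overflow of this size stays inside the unused slack below the stack.
const fn overflow_was_harmless(overflow: usize, slack: usize) -> bool {
    overflow <= slack
}
```

With the figures above, the old stack overflowed by 276 bytes, which fits inside the roughly 900 unused bytes, and the new 8 KiB stack leaves 3820 bytes free at the measured peak.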

This also led to me finding a bug in probe-run, where it doesn't understand that a stack can span across multiple contiguous memory regions. I raised that as https://github.com/knurling-rs/probe-run/issues/415

thejpster commented 1 year ago

I could also reduce Core 1's stack to, say, 2048 bytes (or 1024 bytes, or even 512 bytes), but that would make Core 0 and Core 1 contend for access to SRAM_REGION_4. There are registers we can read to check how many cycles were stalled waiting for contention, but I figured it was easier for now to just steal some of the striped RAM.
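The registers mentioned here are presumably the RP2040 bus fabric performance counters (BUSCTRL `PERFCTRn` / `PERFSELn`). A minimal sketch of how they might be driven follows. The base address, register offsets, event number and 24-bit counter width are taken from a reading of the RP2040 datasheet and should be verified before use; the function names are hypothetical. The register block is passed in as a pointer so the logic can be exercised against an ordinary array on a host.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Assumed BUSCTRL base address on the RP2040 (check the datasheet).
#[allow(dead_code)]
pub const BUSCTRL_BASE: usize = 0x4003_0000;
/// Assumed word offsets of PERFCTR0 (0x08) and PERFSEL0 (0x0c).
const PERFCTR0_WORD: usize = 0x08 / 4;
const PERFSEL0_WORD: usize = 0x0c / 4;
/// Each counter/selector pair is assumed to occupy two consecutive words.
const COUNTER_STRIDE_WORDS: usize = 2;
/// Assumed PERFSEL event number for contested SRAM4 accesses.
pub const EVENT_SRAM4_CONTESTED: u32 = 0x06;
/// The counters are assumed to be 24 bits wide.
const COUNTER_MASK: u32 = 0x00ff_ffff;

/// Select an event for one of the four counters and clear the counter.
///
/// # Safety
/// `regs` must point at the BUSCTRL block (or a stand-in of at least 10 words).
pub unsafe fn start_counter(regs: *mut u32, counter: usize, event: u32) {
    assert!(counter < 4);
    unsafe {
        let sel = regs.add(PERFSEL0_WORD + counter * COUNTER_STRIDE_WORDS);
        let ctr = regs.add(PERFCTR0_WORD + counter * COUNTER_STRIDE_WORDS);
        write_volatile(sel, event);
        // Writing any value is assumed to clear the counter on real hardware.
        write_volatile(ctr, 0);
    }
}

/// Read the current value of one counter.
///
/// # Safety
/// Same requirements as `start_counter`.
pub unsafe fn read_counter(regs: *const u32, counter: usize) -> u32 {
    assert!(counter < 4);
    unsafe { read_volatile(regs.add(PERFCTR0_WORD + counter * COUNTER_STRIDE_WORDS)) & COUNTER_MASK }
}
```

On the target, `regs` would be `BUSCTRL_BASE as *mut u32`. Start the counter, run the workload, then read it back and compare the result for each candidate stack placement.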

thejpster commented 12 months ago

Idea: move the Core 1 data, such as the text buffer, into SRAM4 and leave the 256 KiB of striped RAM for Core 0.

thejpster commented 12 months ago

Re-arranged things somewhat.

Now there is just a RAM memory region (plus the RAM_OS region which is memory for the OS).

The RAM memory region sits at the top of the SRAM address space, using some of the striped memory, plus all of SRAM_BLOCK4 and SRAM_BLOCK5. Within this region, the .data and .bss sit at the bottom, and the Core 0 stack sits at the top. The Core 1 stack is now just a static mut array of 1024 bytes, located within the .bss section.
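A minimal sketch of what a statically allocated Core 1 stack can look like. This is not the BIOS source: the names are hypothetical, and the array is made of `u32` words on the assumption that word alignment is wanted for a stack. Because the array is zero-initialised, the linker would normally place it in `.bss`.

```rust
use core::ptr::addr_of_mut;

const CORE1_STACK_BYTES: usize = 1024;
const CORE1_STACK_WORDS: usize = CORE1_STACK_BYTES / core::mem::size_of::<u32>();

/// Core 1's stack: a zero-initialised static, so it ends up in `.bss`.
static mut CORE1_STACK: [u32; CORE1_STACK_WORDS] = [0; CORE1_STACK_WORDS];

/// Return the lowest address of the Core 1 stack and the address one past its
/// top. The second pointer is the initial stack pointer for a stack that
/// grows downwards.
fn core1_stack_bounds() -> (*mut u32, *mut u32) {
    // SAFETY: only raw pointers are formed, no reference to the static is
    // created, and one-past-the-end of the same array is a valid pointer.
    unsafe {
        let base = addr_of_mut!(CORE1_STACK) as *mut u32;
        (base, base.add(CORE1_STACK_WORDS))
    }
}
```

Taking a raw pointer with `addr_of_mut!` avoids creating a `&mut` reference to a `static mut`, which newer Rust editions reject. The same bounds can be handed to a painting routine so Core 1's usage is measured in the same way as Core 0's.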

I benchmarked the performance and it went from 497,040 chars/sec to 497,205 chars/sec (about 0.03% faster), so basically no change, but certainly not worse than it was.