Open asiekierka opened 5 months ago
The linker issue could be solved by moving the stack top pointer outside the linker script. I can understand why you want to manage everything with the linker script, but I think this is a reasonable workaround.
Another idea is to split DTCM into 2 memory regions in the linker script and assign the stack to the first region and bss to the second.
> The linker issue could be solved by moving the stack top pointer outside the linker script.
It wouldn't. You still can't place the actual user variables at the end of DTCM, because you don't know how big the user variable section is until after it has been allocated - unless you use a two-pass strategy.
> Another idea is to split DTCM into 2 memory regions in the linker script and assign the stack to the first region and bss to the second.
That is along the lines of my plan on how to implement it: the link script gains a symbol named along the lines of `dtcm_data_region_size`, which defines the size of the data region.
Ah, I wrote that without giving it a second thought. If the data could somehow be allocated at runtime on the stack and this DTCM bss region were removed, it would solve the issue.
This would require at least one layer of indirection (or something like runtime relocation - scary!), and thus nullify a chunk of the benefit of using DTCM in the first place.
The first part of this - adding a configurable `__dtcm_data_size` link-time argument - is done in https://github.com/blocksds/sdk/pull/202
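
To make the split concrete, here is a minimal sketch in C of the address arithmetic such a layout implies. It is not taken from the PR: the `dtcm_layout` struct, the `dtcm_split` function, and the assumption that the data region occupies the very end of DTCM with the stack starting directly below it are illustrative only, and the sketch ignores anything else that lives in DTCM. Only the DTCM base and size come from the issue description.

```c
#include <assert.h>
#include <stdint.h>

/* DTCM placement as described in this issue: 16 KB at 0x2FF0000. */
#define DTCM_BASE 0x02FF0000u
#define DTCM_SIZE 0x00004000u

/* Hypothetical model of the split; not the actual BlocksDS linker script. */
typedef struct {
    uint32_t data_start; /* first byte of the fixed-size data region */
    uint32_t data_end;   /* one past the last byte of the data region */
    uint32_t stack_top;  /* initial stack pointer; the stack grows downward */
} dtcm_layout;

/* Reserve the last dtcm_data_size bytes of DTCM for user data. */
static dtcm_layout dtcm_split(uint32_t dtcm_data_size)
{
    assert(dtcm_data_size <= DTCM_SIZE);
    assert(dtcm_data_size % 8 == 0); /* keep the stack top 8-byte aligned */

    dtcm_layout l;
    l.data_end = DTCM_BASE + DTCM_SIZE;
    l.data_start = l.data_end - dtcm_data_size;
    l.stack_top = l.data_start; /* stack starts right below the data */
    return l;
}
```

With a data size of `0x2000`, this model puts the data region at `0x2FF2000..0x2FF3FFF` and the stack top at `0x2FF2000`, so the stack can keep growing below `0x2FF0000` into main memory.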
There's another problem here.
While everything is fine in TWL mode, in NTR mode the mirror at `2C00000..2FFFFFF` that the extended stack would end up using is uncached, which isn't all that great.
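
As a rough illustration of where an overflowing stack ends up in NTR mode, the sketch below maps an address in the `0x2000000..0x2FFFFFF` window onto the four mirrors of the 4 MB main RAM. The function names are made up for this example, and the code only models the mirroring arithmetic; it says nothing about cache configuration.

```c
#include <assert.h>
#include <stdint.h>

#define MAIN_RAM_BASE 0x02000000u
#define MAIN_RAM_END  0x03000000u /* one past 0x2FFFFFF */
#define NTR_RAM_SIZE  0x00400000u /* 4 MB, mirrored four times */

/* Which of the four NTR mirrors (0..3) an address falls into. */
static uint32_t ntr_mirror_index(uint32_t addr)
{
    assert(addr >= MAIN_RAM_BASE && addr < MAIN_RAM_END);
    return (addr - MAIN_RAM_BASE) / NTR_RAM_SIZE;
}

/* Offset into the physical 4 MB that the address aliases. */
static uint32_t ntr_physical_offset(uint32_t addr)
{
    assert(addr >= MAIN_RAM_BASE && addr < MAIN_RAM_END);
    return (addr - MAIN_RAM_BASE) % NTR_RAM_SIZE;
}
```

For example, the word just below DTCM, `0x2FEFFFC`, falls into mirror 3 (the one starting at `0x2C00000`) and aliases physical offset `0x3EFFFC`.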
First, to explain "the DTCM stack problem", I'm going to need to use a simplified model of the ARM9 memory space.
Generally, on ARM9, the "main memory" space stretches from `0x2000000` to `0x2FFFFFF` - sixteen megabytes on DSi, four megabytes mirrored four times on NDS. Between 48 and 64 KB from the end of this space - that is, `0x2FF0000` to `0x2FF3FFF`, DTCM is placed - a 16KB area of fast memory visible to the CPU. This space houses the following things:
`DTCM_BSS` etc.), if any,
Let's look at the memory situation when user variables are not present:
The idea of overlaying DTCM on top of main memory is sound - while it does mean accessing that 16KB of memory becomes trickier, it allows the stack to grow past DTCM into slower (especially on NDS where it is uncached) main memory.
However, what happens if we introduce user variables to the mix?
Oh no! The stack is now bounded, and quite small at that - if we use 8KB of DTCM for user data, for example, that makes our stack limited to slightly over 7.5KB.
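
The bound can be written down as simple arithmetic. The sketch below is only a model: `bounded_stack_bytes` is a made-up helper, and `reserved_bytes` is a placeholder for whatever else occupies DTCM, whose exact size is not stated here.

```c
#include <assert.h>
#include <stdint.h>

#define DTCM_SIZE (16u * 1024u)

/* Bytes left for the stack when it is confined to DTCM.
 * reserved_bytes is a placeholder for the other things stored in DTCM;
 * its real value is an assumption supplied by the caller. */
static uint32_t bounded_stack_bytes(uint32_t user_data_bytes,
                                    uint32_t reserved_bytes)
{
    assert(user_data_bytes <= DTCM_SIZE);
    assert(reserved_bytes <= DTCM_SIZE - user_data_bytes);
    return DTCM_SIZE - user_data_bytes - reserved_bytes;
}
```

With 8KB of user data and nothing else reserved, the upper bound is 8192 bytes; the "slightly over 7.5KB" figure corresponds to a little under 512 bytes being taken by other things.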
There are a few ways to solve, or work around, the DTCM stack problem:
`cothread` mechanism places a variable there. This doesn't solve the problem at all, but it at least means programs which do not otherwise use DTCM will have a less bounded stack. This could be a good workaround for release 1.3.0, though.
The catch is that the GNU linker does not support placing a section at the end of a memory region, only at the beginning - and it provides no (reliable) way to figure out the section's size before allocating it. This solution would thus require a more complex approach to linking than just calling `ld`: the size of the DTCM section would need to be calculated first (perhaps with a "stub" linker script which only allocates DTCM variables), and only then could one do the final link. This, then, necessitates writing a linker wrapper that BlocksDS programs use to link.
As a side-note, while DTCM is being discussed - why not move it to `0x2FF4000`? This space houses the devkitARM-standard bootstub, and is thus reserved and only used when the homebrew application is exiting - the DTCM could be relocated in such a situation to access the bootstub, while programs writing their own bootstubs could simply use an uncached pointer not covered up by DTCM. This would, then, unlock that additional 16 kilobytes of heap for user programs.
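
Coming back to the two-pass link idea above: once a first pass has measured the size of the DTCM section, the second pass only needs a start address that makes the section end exactly at the end of the region. A minimal sketch of that calculation follows; the helper names and the alignment parameter are assumptions for illustration, not part of any existing BlocksDS tool.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to a multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t size, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Pass 2: given the section size measured in pass 1, return the start
 * address that makes the padded section end exactly at region_end. */
static uint32_t end_aligned_origin(uint32_t region_end,
                                   uint32_t measured_size,
                                   uint32_t align)
{
    uint32_t padded = align_up(measured_size, align);
    assert(padded <= region_end);
    return region_end - padded;
}
```

For a region ending at `0x2FF4000`, a measured size of `0x1234` bytes and 8-byte alignment, this yields a start address of `0x2FF2DC8`. A linker wrapper could pass a value like this to the final link, for instance as a symbol definition.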