unknownbrackets closed this 4 years ago
I would like the ability to invalidate or specify shared areas in some way in case you have multiple overlays in a single assembly output file and need to restrict the autoareas to a specific overlay.
It takes a parameter and is specific to the current output file.
So for example, if you have 3 output files, .autoareas for the second file will only be allocated within sharedareas in the second file.
In cases where you need it to be in a certain region of memory, you can use the parameters to force that. For example, maybe you need it to be allocated anywhere within branch range of some source instruction - you could specify that. You'd get an error if no free space was available where you required it.
I did consider trying to name the shared areas, but I think that opens up a lot of complexity. It invites "tagging" the shared areas (see my use case above - maybe your overlay has 8 gaps you've punched out with rewritten functions or deleted data), and that gets into some syntax that feels very foreign among the rest of armips' syntax.
Personally I think specifying a range is the simplest. I assume people would use labels as appropriate.
-[Unknown]
I would very much like to have some sort of tagging functionality. Personally, for a project I'm working on I have a ton of free space in the ARM9 binary and not a whole lot in the actual overlay files themselves. So with the current implementation, it would be difficult if not impossible to use my ARM9 free space from an overlay.
Another reason would be bls in a GBA game. Usually the .text section comes early in the ROM and all of the data blobs (graphics, text, etc.) are at the end. If you have free space near the .text section, you can use bl; otherwise, if you use free space at the end of the ROM, the jump distance is too great and you need to use bx instead. With tagging, you could tag sharedareas as .text and then when you use autoarea, you can specify that you want one of the sharedareas tagged with .text.
I don't think tags are all that foreign from the rest of armips' syntax, seeing as we also have stuff like .definelabel, which takes an undefined symbol as a parameter. You can do the same kind of thing with macros. But if it makes things easier, you could consider using numbers as tags instead; then you could simply .definelabel all the tag numbers you want to use in order to have symbolic names.
In my case, I've defined 22 areas which doesn't even seem like that many. I think all, or at least 20, are in the text section.
I definitely would not want to tag each one. I'd rather say:
TextSegStart equ 0x000209D8
TextSegEnd equ 0x000F9600
Yes, I'd have to specify this for every autoarea, but I'd have to do that either way.
The complexity comes in on multiple tags. Is it .sharedarea 0x000F9600-.,0x00,tag1,tag2,tag3? What happens if I'm not filling? Is it .tagarea tag1 :: .tagarea tag2 :: .tagarea tag3 then? What's the real utility of this over specifying a valid range using equ defines like above? By adding a lot more syntax, are things easier, or is there just more I have to know to use it?
The bl example is really a great counterexample for tagging. Since it can be +/- 4MB from the caller, doing it based on tag is limiting - maybe I have some space that is 1 MB later in the ROM (let's say by moving other data), and I can bl it just fine. I could just say .autoarea 0,0x00400000 and be done.
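To make the range argument concrete, here is a rough Python sketch of the reachability check, not armips code. It assumes the classic Thumb (ARMv4T) encodings, where b carries a signed 11-bit halfword offset and bl a signed 22-bit one, both relative to the branch address plus 4. The function name and the addresses are made up for illustration.

```python
def thumb_branch_reachable(src: int, dst: int, offset_bits: int) -> bool:
    """Return True if a Thumb branch located at `src` can reach `dst`.

    offset_bits is the width of the signed halfword offset in the
    encoding: 11 for `b`, 22 for `bl` (assumed ARMv4T Thumb encodings).
    """
    pc = src + 4                 # Thumb reads PC as instruction address + 4
    delta = dst - pc
    limit = 1 << offset_bits     # 2**(bits-1) halfwords * 2 bytes each
    return delta % 2 == 0 and -limit <= delta < limit


# Hypothetical addresses, chosen only for illustration.
caller = 0x08000000
one_mb_later = caller + 0x00100000
end_of_rom = caller + 0x00800000

bl_near = thumb_branch_reachable(caller, one_mb_later, 22)   # within +/- 4MB
bl_far = thumb_branch_reachable(caller, end_of_rom, 22)      # 8MB away
b_near = thumb_branch_reachable(caller, caller + 4 + 2046, 11)
b_far = thumb_branch_reachable(caller, caller + 4 + 2048, 11)
```

Under these assumptions, space 1 MB later in the ROM is still reachable with bl, which is the case a tag-based restriction would needlessly rule out.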
A related example is b, which has a range of 2 KB. This is actually much more relevant for me, so let's give a real example.
I have a function that draws text. Originally, this function could rely on length of text = number of 8x8 tiles to clear (i.e. to erase old text.) With the VWF, this isn't true - it could easily be 2x as many characters as tiles. In my case, I have a few helpers that look like this:
; Forces clear to 8, which is common.
.func CopyString8x8Clear8
mov r0,8
; Fall through to CopyString8x8ClearR0.
.endfunc
; This allows quick specification of clear width.
.func CopyString8x8ClearR0
ldr r1,=MFontClearSize
; Shorts to clear.
lsl r0,r0,4
strh r0,[r1]
b CopyString8x8ToVRAM
.endfunc
; (elsewhere, other area)
; Clear width in r0, pixel width in r2.
.func CopyString8x8CenterR0
ldr r1,=MFontClearSize
; Multiply by 8 to get pixel clear width.
lsl r3,r0,3
sub r3,r3,r2
lsr r3,r3,1
; Okay, now put that in the x override.
strb r3,[r1,MFontXOffset-MFontClearSize]
ldr r3,=CopyString8x8ClearR0+1
bx r3
.pool
.endfunc
In this case, I ended up using a bx just as you described. But I probably DO have space within 2KB of CopyString8x8ClearR0 and CopyString8x8ToVRAM, I just didn't want to manually find it.
Rather than using an army of tags to solve that (and each b I want), I'd again much rather specify a range: .autoarea CopyString8x8ToVRAM-2048,CopyString8x8ToVRAM+1536. If my assumption that I have sufficient nearby space is wrong, it'll tell me.
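As a sketch of what a range-constrained allocation could look like, here is a small Python model. It is not the armips implementation: the (start, size) representation of free space, the half-open address window, and all addresses are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical model: each free region is (start_address, size_in_bytes).
Region = Tuple[int, int]


def first_fit_in_range(regions: List[Region], size: int,
                       min_addr: int, max_addr: int) -> Optional[int]:
    """Return the first address where `size` bytes fit inside both a free
    region and the window [min_addr, max_addr), or None if nothing fits."""
    for start, length in regions:
        lo = max(start, min_addr)            # clip region to the window
        hi = min(start + length, max_addr)
        if hi - lo >= size:
            return lo
    return None


# Hypothetical addresses; `target` stands in for CopyString8x8ToVRAM.
target = 0x08001000
free = [(0x08000100, 0x40), (0x08000F00, 0x80), (0x08F00000, 0x1000)]

# A 0x20-byte helper fits in the second region, which lies inside the window.
near = first_fit_in_range(free, 0x20, target - 2048, target + 1536)

# A 0x100-byte block only fits in the far region, which is outside the window.
far = first_fit_in_range(free, 0x100, target - 2048, target + 1536)
```

In this model `far` comes back as None, which corresponds to the assembler reporting an error instead of silently placing the code out of reach.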
As for managing overlays, it's more a question of code organization. If it's possible to define helpers/functions that can go in the asm files that populate the ARM9 code, you could still have those auto allocate within ARM9. I'm not that familiar with your use case, so I'm not sure if there's a good reason to group the code into the same source files as writes to the overlays. But I do think it would become confusing if a source file wrote to multiple output files in a non-obvious way.
-[Unknown]
Ah, I didn't realize you can constrain .autoarea to a specific memory range. I guess that's probably fine for my use case. That means the starting/ending address of the autoarea is, effectively, a single tag. And I could .close the overlay, .open the ARM9 binary, write my autoarea, .close the ARM9 binary and (if necessary) re-.open my overlay again. Slightly cumbersome, but not too bad.
I do think tags could still have a benefit here, because it would allow you to create free space in both the overlay and ARM9 file, and use the overlay free space first (since it can only be accessed from that specific overlay) before falling back on the ARM9 free space (which is accessible from any overlay). Otherwise, you would still be managing your .autoareas manually to some degree.
Perhaps this could be realized by allowing the user to specify multiple ranges for .autoarea, and choose the first one that fits? Although this doesn't really solve the problem of the areas being in two separate physical files...
Well, ideally, I'd like my b example above to be automatic, where it would just place it somewhere the content would validate. But I'm also trying to avoid making perfect the enemy of improvement. One change at a time.
I'm not sure if the same is realistic for overlays, since I assume armips has missing information about whether output file A can be known to be loaded while output file B is (my understanding is that these are, essentially, statically linked objects, but dynamically reloadable like .sos/DLLs.) Even if it's obvious to you what is the always-loaded code, I assume armips (currently) has no idea.
Anyway, I'm not sure if anyone has a proposal for a better name than autoarea?
-[Unknown]
Perhaps slightly off-topic, but this particular use case seems risky to me... b should be used when you know for sure your hops are short-range. If you are using b to perform hops that might fail because you are relying on automatically-placed functions that reside elsewhere to be in a particular narrow range, you should probably just use bx. The ARM documentation does discuss using a veneer to make an out-of-reach bl target accessible using bx, and avoiding having to do this is the sort of use case that I assumed was the intent, but I don't think that extends to b. Being able to automatically handle this would probably push armips towards compiler territory (?), which it is not.
Re: overflowing overlay additions into arm9, I'm not sure that's supported by armips, at least not automatically. I believe there is always only one output file at a time. Chances are that for now, you'd want to manually specify that certain code should go into the arm9, and then branch to it from the overlay. The opposite direction is of course right out, as multiple overlays share the same address space, and armips doesn't support knowing context of which one you mean short of actually opening it.
Re: tagging, I think it would probably solve several of these concerns that have to be manually specified for now, but can probably be pushed off to the future. For now probably best to just design the syntax to avoid impeding later implementation of tagging. Not sure on what that syntax should be, but what OP mentioned seemed reasonable.
Re: the name, I have no strong opinion on specific name, but I want to reiterate that .autoarea does an entirely different type of thing than .area, .sharedarea, and .definesharedarea, and this is what motivated me to suggest a name change so as to not be similar to the others. Something like .autoplace, .autorelocate, .autoblock, or even some special syntax involving .org would make more intuitive sense.
.autoarea essentially just changes the memory address, right? So I'd expect the name to contain org, e.g. .autoorg.
I also think it would make sense if .autoarea, .sharedarea and .definesharedarea have a common element in their name. I'm not sure .sharedorg is a good name for .autoarea, though, since the memory address isn't what's being shared - it's the area. What about something like .freeorg, .freearea and .definefreearea? Or .allocorg, .allocarea, .defineallocarea?
Currently it seems to use a First-Fit allocation algorithm. Is this good/appropriate enough or should we use a different one like Best-Fit?
I think the allocation algorithm could be changed in a separate follow up change. Best fit can be anything from simple to complex - could require more passes. Simplest version of that would just be, pick the largest free area that matches requirements.
A more complex version (maybe still no extra passes, not sure) would be to analyze the available subareas/allocs/regions/whatever at the end of the first pass (in this pass, they are only sized.) Main gain would be reducing wasted space if there's leftover.
I feel like reusing org is confusing because it has an end.
Perhaps we can just use a new word like region? .sharedregion and .autoregion? Then someone just has to know that sharedregions are, in some ways, similar to areas (in that they fill, and can have code in them which is not dynamically allocated - it's kept at the start.) Perhaps even just .region and .defineregion could work.
The ARM documentation does discuss using a veneer to make an out-of-reach
Well, if I'm hacking a GBA game to display a longer character name, or a PSP game to load a larger title image - I only care about following the ARM/MIPS published ABI guidelines to the extent that it's visible to other functions in the code (i.e. stack alignment.) If I have limited bytes, I'm more than happy to use tail calls and b in this way. Of course, if the code uses exceptions and might use a stack unwinder, I have to be slightly careful with that.
Luckily, if the b is not reachable, armips will fail to assemble, telling me I've made a mistake. Exactly what I want. I'm sure if it was a compiler (which it is indeed not), it might not allow my innocent violations of the ARM ABI.
-[Unknown]
I think the allocation algorithm could be changed in a separate follow up change. Best fit can be anything from simple to complex - could require more passes. Simplest version of that would just be, pick the largest free area that matches requirements.
I think that would actually be Worst-Fit. Each algorithm has benefits and drawbacks, though that approach can lead to being unable to allocate bigger chunks because all available areas are just barely too small. Using the area with the least available space is less susceptible to that. For now though, using the first available area should be sufficient. Until the next release we should be rather free in changing its implementation without worrying too much about compatibility.
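To illustrate the difference between the strategies being discussed, here is a small Python sketch, not armips code. It models free space as a plain list of region sizes and serves requests in order, with no knowledge of later requests; the function names and sizes are made up for illustration.

```python
from typing import Callable, List, Optional

Strategy = Callable[[List[int], int], Optional[int]]


def first_fit(free_sizes: List[int], size: int) -> Optional[int]:
    """Index of the first region that is large enough, or None."""
    for i, avail in enumerate(free_sizes):
        if avail >= size:
            return i
    return None


def best_fit(free_sizes: List[int], size: int) -> Optional[int]:
    """Index of the smallest region that is still large enough, or None."""
    fits = [i for i, avail in enumerate(free_sizes) if avail >= size]
    return min(fits, key=lambda i: free_sizes[i]) if fits else None


def worst_fit(free_sizes: List[int], size: int) -> Optional[int]:
    """Index of the largest region that is large enough, or None."""
    fits = [i for i, avail in enumerate(free_sizes) if avail >= size]
    return max(fits, key=lambda i: free_sizes[i]) if fits else None


def allocate_all(strategy: Strategy, free_sizes: List[int],
                 requests: List[int]) -> List[Optional[int]]:
    """Serve requests in order; record the chosen region index per request
    (None when the request could not be placed)."""
    remaining = list(free_sizes)
    placed: List[Optional[int]] = []
    for size in requests:
        i = strategy(remaining, size)
        if i is not None:
            remaining[i] -= size
        placed.append(i)
    return placed


regions = [64, 256, 96]      # hypothetical free region sizes in bytes
requests = [60, 90, 200]     # hypothetical block sizes, in source order

results = {s.__name__: allocate_all(s, regions, requests)
           for s in (first_fit, best_fit, worst_fit)}
```

With these particular sizes only best_fit places all three blocks. First-fit and "pick the largest" both use up the 256-byte region before the 200-byte request arrives. Other inputs can favor other strategies, which is why knowing all sizes up front matters for doing it properly.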
I feel like reusing org is confusing because it has an end.
I agree with this.
Perhaps we can just use a new word like region? .sharedregion and .autoregion? Then someone just has to know that sharedregions are, in some ways, similar to areas (in that they fill, and can have code in them which is not dynamically allocated - it's kept at the start.) Perhaps even just .region and .defineregion could work.
I like this idea. It's different enough from area to prevent confusion, but it still makes sense that it conceptually does something similar.
Well, it'd be worst fit depending on the scenario really, assuming they are allocated unordered. As mentioned, it'd need to know the sizes of everything (to do an ordered allocation) to really do it right.
Renamed to regions (rebased to make it easier to see the code, also needed rebasing from recent changes anyway.)
Also added some tests.
-[Unknown]
This adds a few directives:
.region, like .area but set as shared. Also always skips or fills to the end.
.defineregion A,B,C as a shorthand for .org A :: .sharedarea B,C :: .endsharedarea.
.autoregion, which automatically allocates within regions.
Basic usage is explained in the Readme. The basic concept here is, you may have pockets of free space (perhaps because you've deleted old data, removed functions, or shortened existing functions.) You may then need to define a function. Rather than searching for free space manually, and doing the math to see if it'll fit, this allows armips to take care of that for you.
.autoregion can take parameters, which may be necessary if it must be in a certain range (e.g. a Thumb bl range.)
-[Unknown]