Kingcom / armips

An assembler for various ARM and MIPS platforms. Builds available at http://buildbot.orphis.net/armips/
MIT License

Add directives to automatically allocate free space #173

Closed (unknownbrackets closed 4 years ago)

unknownbrackets commented 4 years ago

This adds a few directives:

Basic usage is explained in the Readme. The basic concept is that you may have pockets of free space (perhaps because you've deleted old data, removed functions, or shortened existing functions). You may then need to define a new function. Rather than searching for free space manually and doing the math to see if it'll fit, this lets armips take care of that for you.

.autoregion can take parameters, which may be necessary if the code must land in a certain range (e.g. within a Thumb bl range).

-[Unknown]

sp1187 commented 4 years ago

I would like the ability to invalidate or specify shared areas in some way in case you have multiple overlays in a single assembly output file and need to restrict the autoareas to a specific overlay.

unknownbrackets commented 4 years ago

It takes a parameter and is specific to the current output file.

So for example, if you have 3 output files, .autoareas for the second file will only be allocated within sharedareas in the second file.

In cases where you need it to be in a certain region of memory, you can use the parameters to force that. For example, maybe you need it to be allocated anywhere within branch range of some source instruction - you could specify that. You'd get an error if no free space was available where you required it.
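To make the behavior described above concrete, here is a minimal Python sketch of range-constrained first-fit allocation. It is not armips' actual code: `FreeRegion`, `allocate_first_fit`, and the exclusive `max_addr` bound are names and conventions assumed purely for illustration, and it assumes each free region is filled from its start.

```python
from dataclasses import dataclass


@dataclass
class FreeRegion:
    start: int  # next free address in this region (hypothetical model)
    end: int    # one past the last usable address


def allocate_first_fit(regions, size, min_addr=None, max_addr=None):
    """Place `size` bytes in the first region that fits inside the window.

    The block must lie entirely within [min_addr, max_addr) when those
    bounds are given. Raises ValueError if nothing fits, mirroring the
    "you'd get an error" behavior described in the thread.
    """
    for region in regions:
        block_start = region.start
        block_end = block_start + size
        if block_end > region.end:
            continue  # not enough room left in this region
        if min_addr is not None and block_start < min_addr:
            continue  # block would start before the allowed window
        if max_addr is not None and block_end > max_addr:
            continue  # block would run past the allowed window
        region.start = block_end  # consume the space
        return block_start
    raise ValueError("no free space available in the requested range")


regions = [FreeRegion(0x100, 0x110), FreeRegion(0x200, 0x280), FreeRegion(0x1000, 0x1100)]
unconstrained = allocate_first_fit(regions, 0x20)
constrained = allocate_first_fit(regions, 0x20, min_addr=0x800, max_addr=0x2000)
```

With these sample regions, the unconstrained request skips the first region (too small) and lands in the second, while the constrained request is pushed to the third.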

I did consider trying to name the shared areas, but I think that opens up a lot of complexity. It invites "tagging" the shared areas (see my use case above - maybe your overlay has 8 gaps you've punched out with rewritten functions or deleted data), and that gets into some syntax that feels very foreign among the rest of armips' syntax.

Personally I think specifying a range is the simplest. I assume people would use labels as appropriate.

-[Unknown]

Prof9 commented 4 years ago

I would very much like to have some sort of tagging functionality. Personally, for a project I'm working on I have a ton of free space in the ARM9 binary and not a whole lot in the actual overlay files themselves. So with the current implementation, it would be difficult if not impossible to use my ARM9 free space from an overlay.

Another reason would be bls in a GBA game. Usually the .text section comes early in the ROM and all of the data blobs (graphics, text, etc.) are at the end. If you have free space near the .text section, you can use bl; otherwise, if you use free space at the end of the ROM, the jump distance is too great and you need to use bx instead. With tagging, you could tag sharedareas as .text and then when you use autoarea, you can specify that you want one of the sharedareas tagged with .text.

I don't think tags are all that foreign from the rest of armips' syntax seeing as we also have stuff like .definelabel, which takes an undefined symbol as a parameter. You can do the same kind of thing with macros. But if it makes things easier, you could consider using numbers as tags instead, then you could simply .definelabel all the tag numbers you want to use in order to have symbolic names.

unknownbrackets commented 4 years ago

In my case, I've defined 22 areas which doesn't even seem like that many. I think all, or at least 20, are in the text section.

I definitely would not want to tag each one. I'd rather say:

TextSegStart equ 0x000209D8
TextSegEnd equ 0x000F9600

Yes, I'd have to specify this for every autoarea, but I'd have to do that either way.

The complexity comes in on multiple tags. Is it .sharedarea 0x000F9600-.,0x00,tag1,tag2,tag3? What happens if I'm not filling? Is it .tagarea tag1 :: .tagarea tag2 :: .tagarea tag3 then? What's the real utility of this over specifying a valid range using equ defines like above? By adding a lot more syntax, are things easier, or is there just more I have to know to use it?

The bl example is really a great counterexample for tagging. Since it can be +/- 4MB from the caller, doing it based on tag is limiting - maybe I have some space that is 1 MB later in the ROM (let's say by moving other data), and I can bl it just fine. I could just say .autoarea 0,0x00400000 and be done.

A related example is b, which has a range of 2 KB. This is actually much more relevant for me, so let's give a real example.

I have a function that draws text. Originally, this function could rely on length of text = number of 8x8 tiles to clear (i.e. to erase old text.) With the VWF, this isn't true - it could easily be 2x as many characters as tiles. In my case, I have a few helpers that look like this:

; Forces clear to 8, which is common.
.func CopyString8x8Clear8
  mov r0,8
  ; Fall through to CopyString8x8ClearR0.
.endfunc
; This allows quick specification of clear width.
.func CopyString8x8ClearR0
  ldr r1,=MFontClearSize
  ; Shorts to clear.
  lsl r0,r0,4
  strh r0,[r1]
  b CopyString8x8ToVRAM
.endfunc

; (elsewhere, other area)

; Clear width in r0, pixel width in r2.
.func CopyString8x8CenterR0
  ldr r1,=MFontClearSize
  ; Multiply by 8 to get pixel clear width.
  lsl r3,r0,3
  sub r3,r3,r2
  lsr r3,r3,1
  ; Okay, now put that in the x override.
  strb r3,[r1,MFontXOffset-MFontClearSize]

  ldr r3,=CopyString8x8ClearR0+1
  bx r3
  .pool
.endfunc

In this case, I ended up using a bx just as you described. But I probably DO have space within 2KB of CopyString8x8ClearR0 and CopyString8x8ToVRAM, I just didn't want to manually find it.

Rather than using an army of tags to solve that (and each b I want), I'd again much rather specify a range: .autoarea CopyString8x8ToVRAM-2048,CopyString8x8ToVRAM+1536. If my assumption that I have sufficient nearby space is wrong, it'll tell me.
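The range in that example comes from the reach of the branch instruction. As a rough illustration, the Python sketch below checks whether a Thumb `b` or `bl` at a given address can reach a target. It assumes the classic Thumb encodings (as on the GBA's ARM7TDMI), where the offset is measured from the instruction address plus 4; the function names are hypothetical and not part of armips.

```python
# Signed, halfword-aligned offset limits for classic Thumb branches.
THUMB_B_MIN, THUMB_B_MAX = -2048, 2046          # roughly +/- 2 KB
THUMB_BL_MIN, THUMB_BL_MAX = -4194304, 4194302  # roughly +/- 4 MB


def _reaches(branch_addr, target, min_off, max_off):
    # In Thumb state the PC reads as the instruction address + 4.
    offset = target - (branch_addr + 4)
    return offset % 2 == 0 and min_off <= offset <= max_off


def thumb_b_reaches(branch_addr, target):
    """True if an unconditional Thumb `b` at branch_addr can reach target."""
    return _reaches(branch_addr, target, THUMB_B_MIN, THUMB_B_MAX)


def thumb_bl_reaches(branch_addr, target):
    """True if a Thumb `bl` at branch_addr can reach target."""
    return _reaches(branch_addr, target, THUMB_BL_MIN, THUMB_BL_MAX)


near = thumb_b_reaches(0x08000100, 0x08000100 + 4 + 2046)
too_far = thumb_b_reaches(0x08000100, 0x08000100 + 4 + 2048)
```

An allocator that accepts an address window only needs the window to be a subset of the addresses for which a check like this passes; if no free space exists inside it, assembly fails rather than silently producing an unreachable branch.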

As for managing overlays, it's more a question of code organization. If it's possible to define helpers/functions that can go in the asm files that populate the ARM9 code, you could still have those auto allocate within ARM9. I'm not that familiar with your use case, so I'm not sure if there's a good reason to group the code into the same source files as writes to the overlays. But I do think it would become confusing if a source file wrote to multiple output files in a non-obvious way.

-[Unknown]

Prof9 commented 4 years ago

Ah, I didn't realize you can constrain .autoarea to a specific memory range. I guess that's probably fine for my use case. That means the starting/ending address of the autoarea is, effectively, a single tag. And I could .close the overlay, .open the ARM9 binary, write my autoarea, .close the ARM9 binary and (if necessary) re-.open my overlay again. Slightly cumbersome, but not too bad.

I do think tags could still have a benefit here, because it would allow you to create free space in both the overlay and ARM9 file, and use the overlay free space first (since it can only be accessed from that specific overlay) before falling back on the ARM9 free space (which is accessible from any overlay). Otherwise, you would still be managing your .autoareas manually to some degree.

Perhaps this could be realized by allowing the user to specify multiple ranges for .autoarea, and choose the first one that fits? Although this doesn't really solve the problem of the areas being in two separate physical files...

unknownbrackets commented 4 years ago

Well, ideally, I'd like my b example above to be automatic, where it would just place it somewhere the content would validate. But I'm also trying to avoid making perfect the enemy of improvement. One change at a time.

I'm not sure if the same is realistic for overlays, since I assume armips has missing information about whether output file A can be known to be loaded while output file B is (my understanding is that these are, essentially, statically linked objects, but dynamically reloadable like .sos/DLLs.) Even if it's obvious to you what is the always-loaded code, I assume armips (currently) has no idea.

Anyway, I'm not sure if anyone has a proposal for a better name than autoarea?

-[Unknown]

Blade2187 commented 4 years ago

Perhaps slightly off-topic, but this particular use case seems risky to me... b should be used when you know for sure your hops are short-range. If you are using b to perform hops that might fail because you are relying on automatically-placed functions that reside elsewhere to be in a particular narrow range, you should probably just use bx. The ARM documentation does discuss using a veneer to make an out-of-reach bl target accessible using bx, and avoiding having to do this is the sort of use case that I assumed was the intent, but I don't think that extends to b. Being able to automatically handle this would probably push armips towards compiler territory (?), which it is not.

Re: overflowing overlay additions into arm9, I'm not sure that's supported by armips, at least not automatically. I believe there is always only one output file at a time. Chances are that for now, you'd want to manually specify that certain code should go into the arm9, and then branch to it from the overlay. The opposite direction is of course right out, as multiple overlays share the same address space, and armips doesn't support knowing context of which one you mean short of actually opening it.

Re: tagging, I think it would probably solve several of these concerns that have to be manually specified for now, but can probably be pushed off to the future. For now probably best to just design the syntax to avoid impeding later implementation of tagging. Not sure on what that syntax should be, but what OP mentioned seemed reasonable.

Re: the name, I have no strong opinion on specific name, but I want to reiterate that .autoarea does an entirely different type of thing than .area, .sharedarea, and .definesharedarea, and this is what motivated me to suggest a name change so as to not be similar to the others. Something like .autoplace, .autorelocate, .autoblock, or even some special syntax involving .org would make more intuitive sense.

Prof9 commented 4 years ago

.autoarea essentially just changes the memory address, right? So I'd expect the name to contain org, e.g. .autoorg.

I also think it would make sense if .autoarea, .sharedarea and .definesharedarea have a common element in their name. I'm not sure .sharedorg is a good name for .autoarea, though, since the memory address isn't what's being shared - it's the area. What about something like .freeorg, .freearea and .definefreearea? Or .allocorg, .allocarea, .defineallocarea?

Kingcom commented 4 years ago

Currently it seems to use a First-Fit allocation algorithm. Is this good/appropriate enough or should we use a different one like Best-Fit?

unknownbrackets commented 4 years ago

I think the allocation algorithm could be changed in a separate follow up change. Best fit can be anything from simple to complex - could require more passes. Simplest version of that would just be, pick the largest free area that matches requirements.

A more complex version (maybe still no extra passes, not sure) would be to analyze the available subareas/allocs/regions/whatever at the end of the first pass (in this pass, they are only sized.) Main gain would be reducing wasted space if there's leftover.

I feel like reusing org is confusing because it has an end.

Perhaps we can just use a new word like region? .sharedregion and .autoregion? Then someone just has to know that sharedregions are, in some ways, similar to areas (in that they fill, and can have code in them which is not dynamically allocated - it's kept at the start.) Perhaps even just .region and .defineregion could work.

> The ARM documentation does discuss using a veneer to make an out-of-reach

Well, if I'm hacking a GBA game to display a longer character name, or a PSP game to load a larger title image - I only care about following the ARM/MIPS published ABI guidelines to the extent that it's visible to other functions in the code (i.e. stack alignment.) If I have limited bytes, I'm more than happy to use tail calls and b in this way. Of course, if the code uses exceptions and might use a stack unwinder, I have to be slightly careful with that.

Luckily, if the b is not reachable, armips will fail to assemble, telling me I've made a mistake. Exactly what I want. I'm sure if it was a compiler (which it is indeed not), it might not allow my innocent violations of the ARM ABI.

-[Unknown]

Kingcom commented 4 years ago

> I think the allocation algorithm could be changed in a separate follow up change. Best fit can be anything from simple to complex - could require more passes. Simplest version of that would just be, pick the largest free area that matches requirements.

I think that would actually be Worst-Fit. Each algorithm has benefits and drawbacks, though that approach can lead to being unable to allocate bigger chunks because all available areas are just barely too small. Using the area with the least available space is less susceptible to that. For now though, using the first available area should be sufficient. Until the next release we should be rather free in changing its implementation without worrying too much about compatibility.
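The difference between the three strategies can be shown with a small Python sketch. This is only an illustration of the textbook algorithms, not armips' implementation; `pick_region` and `allocate_all` are hypothetical helpers, and regions are modeled simply as a list of remaining free sizes.

```python
def pick_region(free_sizes, size, strategy):
    """Return the index of the region chosen for `size` bytes, or None."""
    candidates = [i for i, free in enumerate(free_sizes) if free >= size]
    if not candidates:
        return None
    if strategy == "first":
        return candidates[0]                                  # first that fits
    if strategy == "best":
        return min(candidates, key=lambda i: free_sizes[i])   # least space left over
    if strategy == "worst":
        return max(candidates, key=lambda i: free_sizes[i])   # largest region
    raise ValueError(f"unknown strategy: {strategy}")


def allocate_all(free_sizes, requests, strategy):
    """Allocate requests in order; returns the chosen index (or None) for each."""
    free = list(free_sizes)
    placed = []
    for size in requests:
        index = pick_region(free, size, strategy)
        placed.append(index)
        if index is not None:
            free[index] -= size
    return placed


free_sizes = [60, 100]
requests = [40, 90]
worst = allocate_all(free_sizes, requests, "worst")
best = allocate_all(free_sizes, requests, "best")
first = allocate_all(free_sizes, requests, "first")
```

In this sample, worst-fit puts the 40-byte request into the 100-byte region, leaving two 60-byte gaps and no room for the 90-byte request. Best-fit and first-fit both place everything, although first-fit only does so here because of the order in which the regions happen to be listed.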

> I feel like reusing org is confusing because it has an end.

I agree with this.

> Perhaps we can just use a new word like region? .sharedregion and .autoregion? Then someone just has to know that sharedregions are, in some ways, similar to areas (in that they fill, and can have code in them which is not dynamically allocated - it's kept at the start.) Perhaps even just .region and .defineregion could work.

I like this idea. It's different enough from area to prevent confusion, but it still makes sense that it conceptually does something similar.

unknownbrackets commented 4 years ago

Well, it'd be worst fit depending on the scenario really, assuming they are allocated unordered. As mentioned, it'd need to know the sizes of everything (to do an ordered allocation) to really do it right.
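As a sketch of what an ordered allocation might look like once every block's size is known, the Python below sorts requests from largest to smallest before placing each one with best-fit. This is an assumption about one possible approach, not something the PR implements; `allocate_ordered` and the sample names are hypothetical.

```python
def allocate_ordered(free_sizes, requests):
    """Place the largest requests first, each into the tightest region that fits.

    `requests` maps a block name to its size in bytes. Returns a dict mapping
    each name to the index of the chosen region, or None if nothing fit.
    """
    free = list(free_sizes)
    placement = {}
    by_size = sorted(requests.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        fitting = [i for i, space in enumerate(free) if space >= size]
        if not fitting:
            placement[name] = None
            continue
        index = min(fitting, key=lambda i: free[i])  # tightest fit
        free[index] -= size
        placement[name] = index
    return placement


placement = allocate_ordered([60, 100], {"a": 40, "b": 90, "c": 20})
```

Here the 90-byte block claims the 100-byte region first, and the two smaller blocks then share the 60-byte region, so everything fits regardless of the order in which the blocks appear in the source.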

Renamed to regions (rebased to make it easier to see the code, also needed rebasing from recent changes anyway.)

Also added some tests.

-[Unknown]