ARM-software / abi-aa

Application Binary Interface for the Arm® Architecture

[aapcs64] Unclear callee/caller wording in aapcs64.rst #266

Open a74nh opened 2 weeks ago

a74nh commented 2 weeks ago

In aapcs64.rst

z0-z7 are used to pass scalable vector arguments to a subroutine, and to return scalable vector results from a function. If a subroutine takes at least one argument in scalable vector registers or scalable predicate registers, or if it is a function that returns results in such registers, it must ensure that the entire contents of z8-z23 are preserved across the call. In other cases it need only preserve the low 64 bits of z8-z15, as described in SIMD and Floating-Point registers.

p0-p3 are used to pass scalable predicate arguments to a subroutine and to return scalable predicate results from a function. If a subroutine takes at least one argument in scalable vector registers or scalable predicate registers, or if it is a function that returns results in such registers, it must ensure that p4-p15 are preserved across the call. In other cases it need not preserve any scalable predicate register contents.

In both cases, in "it must ensure that", it is not clear whether "it" refers to the caller or the callee.

E.g., if it is the callee, then the wording should be "the subroutine must ensure that".

This wording caused issues when designing SVE support for .NET.

kunalspathak commented 2 weeks ago

.NET issue that describes SVE support: https://github.com/dotnet/runtime/issues/93095

smithp35 commented 2 weeks ago

I agree that the sentence can be clarified.

Assuming the confusion hasn't been resolved already, there are a couple of other parts of the document that may help in parsing the text: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#22terms-and-abbreviations

Routine, subroutine
A fragment of program to which control can be transferred that, on completing its task, returns control to its caller at an instruction following the call. Routine is used for clarity where there are nested calls: a routine is the caller and a subroutine is the callee.

https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#3scope

Obligations on the called routine to preserve the program state of the caller across the call.

Combining this with the original text, the "it" refers to the callee.

a74nh commented 2 weeks ago

Combining this with the original text, the "it" refers to the callee.

Thanks! That was the conclusion we came to after reading around the issue elsewhere. But it would be nice for it to be clearer.

smithp35 commented 2 weeks ago

https://github.com/ARM-software/abi-aa/pull/267 to update wording.

kunalspathak commented 2 weeks ago

Just to be clear, here is my understanding. @rsandifo-arm @smithp35 - please correct if I missed anything.

Terminology

A()
{
prolog:
   save callee-save registers
   ...
   ...
   save caller-save registers
   B();
   restore caller-save registers
   ...   
   ...
epilog:
   restore callee-save registers
}

Float/Scalable registers

| Scenario# | A to B | prolog/epilog of A | before/after call to B |
|---|---|---|---|
| 1 | regular to regular | bottom 64-bits v8-v15 [1] | v0-v7, v16-v31, top 64-bits v8-v15 [1] |
| 2 | regular to sve | bottom 64-bits v8-v15 [1] | z0-z7, z24-z31 |
| 3 | sve to regular | z8-z23 | v0-v7, v16-v31, top 64-bits v8-v15 [1] |
| 4 | sve to sve | z8-z23 | z0-z7, z24-z31 |

[1] This is the same specification we have for NEON, and it applies only when the registers are in use or live.

Predicate registers

| Scenario# | A to B | prolog/epilog of A | before/after call to B |
|---|---|---|---|
| 1 | regular to regular | NA | p0-p15 |
| 2 | regular to sve | NA | p0-p3 |
| 3 | sve to regular | p4-p15 | p0-p15 |
| 4 | sve to sve | p4-p15 | p0-p3 |

smithp35 commented 2 weeks ago

I'm going to use the official terminology of caller-save instead of callee_trash.

Just to be sure, apologies if this was already clear, caller-save and callee-save are more like responsibilities to save than they are requirements to save. For example a callee only needs to save a callee-save register if it uses the register. A caller only needs to save a caller-save register before a call if there is a live value in the register that the caller needs to access after the call.
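As a minimal sketch of that point: the registers a function actually has to spill are the intersection of the relevant responsibility set with what it writes or keeps live. The helper names below are hypothetical and not part of the AAPCS64; the predicate sets of an SVE function, taken from the text quoted at the top of this issue, serve as the example.

```python
# Hypothetical helper names; the register sets follow the AAPCS64 text quoted above.
SVE_CALLEE_SAVE_PREDICATES = {f"p{i}" for i in range(4, 16)}  # p4-p15
SVE_CALLER_SAVE_PREDICATES = {f"p{i}" for i in range(0, 4)}   # p0-p3


def prologue_saves(callee_save, written_by_function):
    """Callee-save registers the function must save: only those it writes."""
    return callee_save & written_by_function


def call_site_saves(caller_save, live_across_call):
    """Caller-save registers to save around one call: only those holding live values."""
    return caller_save & live_across_call


# An SVE function that writes p0, p4 and p5, and keeps p0 and p5 live
# across a call to another SVE function.
print(sorted(prologue_saves(SVE_CALLEE_SAVE_PREDICATES, {"p0", "p4", "p5"})))
print(sorted(call_site_saves(SVE_CALLER_SAVE_PREDICATES, {"p0", "p5"})))
```

Here p5 does not need saving at the call site because the SVE callee preserves it, and p0 does not need saving in the prologue because it is not callee-save.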

This is my reading of the document. I'm not a SVE expert like @rsandifo-arm so if I've got this wrong please go with his answer/correction rather than mine. I'm more of a linker than a compiler person.

I found it easier to describe when not considering the different call scenarios as there is only a caller and a callee and the responsibilities of the caller don't change if the callee is sve or regular.

| Function type | callee-save | caller-save |
|---|---|---|
| regular | bottom 64-bits of v8-v15 | v0-v7, v16-v31, top 64-bits of v8-v15 |
| sve | z8-z23 | v0-v7, v16-v31 (*) |

(*) z16-z23 are extensions of v16-v23 so these are both callee and caller saved.

| Function type | callee-save | caller-save |
|---|---|---|
| regular | - | p0-p3 |
| sve | p4-p15 | p0-p3 |

I do hope I've got this right, if I haven't and it isn't a silly mistake then we may need more clarifications.

kunalspathak commented 2 weeks ago

I found it easier to describe when not considering the different call scenarios

That's how I wanted it to be, but I wanted to be explicit about the situation. For example, in your table, for the "regular" function type, I interpret the "caller-save" column as: if a "regular" function is the caller, which registers does it need to save/restore across a function call? But that depends on the type of function it is calling. If the callee is a "regular" function, the caller needs to save/restore v0-v7, v16-v31, and the top 64 bits of v8-v15; but if the callee is an sve function, the caller only needs to save/restore z0-z7 and z24-z31, because the sve function (the callee in this case) is responsible for preserving z8-z23. The same goes for the other combinations.

Also, for the "regular" function type, if it is calling a "regular" function, then it should save/restore all of p0-p15, while if it is calling an "sve" function, it should preserve just p0-p3, because p4-p15 will be preserved by the "sve" function (the callee in this case).

Note: When I say caller should preserve across function call, I mean only the registers that are live across the call. So, in my table, out of the registers mentioned in "callee-trash" column, only the registers that are live across the call will be preserved by the caller.

I do hope I've got this right

I feel the same :)

kunalspathak commented 2 weeks ago

then we may need more clarifications

Regardless of whether we have this right or not, I think the document needs a clear way of stating these requirements, something equivalent to how we have this information in the tables. A lot of time is being spent by multiple people trying to interpret a couple of lines of the document.

smithp35 commented 2 weeks ago

OK I see where you are coming from. The safest assumption is that what is not callee-saved by the callee must be caller-saved. That would indeed imply that p0~p15 would need saving when calling a regular function.

I'll reopen this as I think more work is needed here.

pmsjt commented 2 weeks ago

Functions without SVE types in the signature don't have to save any SVE state. If they had to, then existing functions would no longer be legal. The only things a function without SVE types in the signature must worry about are:

tannergooding commented 2 weeks ago

the responsibilities of the caller don't change if the callee is sve or regular.

There is a lot of nuance here and it is easy for developers to miss considerations.

A callee x is responsible for saving (typically in the prologue) and restoring (typically in the epilogue) the callee-save set of its own calling convention a.

A caller x is also responsible for saving (typically before the call) and restoring (typically after the call) the caller-save set of the calling convention b for callee y.

Thus, if conventions a and b match (sve x->sve y -or- regular x->regular y), then this is relatively simple as you only have to consider the context of the individual methods x and y because the callee-save for a is the inverse mask to the caller-save for a

However, if conventions a and b do not match (sve x->regular y -or- regular x->sve y), then the caller-save set becomes more interesting, as the callee-save for a will typically not be the inverse of the caller-save for b. Instead, the two sets can overlap on some registers. This means that the caller x must also consider any registers where the two conventions differ.

The simplest example of this is that for a regular call, none of P0-P15 are considered callee-save. Thus a regular method is free to trash any and all predicate registers without consideration. However, P4-P15 are considered callee-save for an sve call, and thus an sve method must save P4-P15 if they are used.

What this means is that for regular x->regular y, x is free to trash any predicate registers. If it has a predicate register that needs to remain "live" across the call to y, it must save/restore that register.

For sve x->sve y, x is free to trash P0-P3, but must save and restore P4-P15 if they are used. It must only save P0-P3 across the call to y if they need to remain live.

However, for regular x->sve y the sets differ and x now only has to save P0-P3 because y must be saving/restoring P4-P15.

It gets very interesting for sve x->regular y however, because the regular call (y) is free to trash any of P0-P15. This means that not only does x need to save the normal set of P0-P3 if it's using them and needs them to remain live across the call, it must also assume that y will trash P4-P15 and is now responsible for saving them across the call boundary (because any prior sve caller could itself be using them and expected x to have saved them).
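The four predicate-register cases above can be sketched as follows. This is an illustration of the reasoning in this comment, not text from the AAPCS64, and all function names are hypothetical. The sketch assumes that what x must protect around a call is whatever y may trash, restricted to values x keeps live plus values x owes to its own caller.

```python
ALL_PREDICATES = frozenset(f"p{i}" for i in range(16))  # p0-p15


def callee_save_predicates(kind):
    """Predicate registers a function of this kind preserves for its caller."""
    if kind == "sve":
        return frozenset(f"p{i}" for i in range(4, 16))  # p4-p15
    if kind == "regular":
        return frozenset()  # a regular function may trash any of p0-p15
    raise ValueError(f"unknown function kind: {kind}")


def may_be_trashed_by_call(callee_kind):
    """Everything the callee does not promise to preserve."""
    return ALL_PREDICATES - callee_save_predicates(callee_kind)


def must_protect_around_call(caller_kind, callee_kind, live_in_caller):
    """Predicates x must keep safe across a call to y."""
    owed_to_own_caller = callee_save_predicates(caller_kind)
    at_risk = may_be_trashed_by_call(callee_kind)
    return at_risk & (frozenset(live_in_caller) | owed_to_own_caller)


def by_number(regs):
    return sorted(regs, key=lambda r: int(r[1:]))


live = {"p1", "p5"}  # values x wants to keep across the call
for caller_kind in ("regular", "sve"):
    for callee_kind in ("regular", "sve"):
        regs = must_protect_around_call(caller_kind, callee_kind, live)
        print(f"{caller_kind} x -> {callee_kind} y: {', '.join(by_number(regs))}")
```

In the sve x->regular y case the result includes all of p4-p15 even when x keeps none of them live. In practice x could cover that part by saving them in its own prologue and restoring them in its epilogue rather than around each call.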

smithp35 commented 2 weeks ago

Thanks for the additional points. This has somewhat spiralled from the meaning of "it" :-) in a couple of sentences. I'll discuss with my colleagues to see if there is a better way of describing this.

kunalspathak commented 2 weeks ago

I have updated https://github.com/ARM-software/abi-aa/issues/266#issuecomment-2177309054 to use the terminology of "caller-save" instead of "callee-trash".

smithp35 commented 2 weeks ago

Looking at the table that you have updated, I think it is best not to try to enumerate the callee-save registers and the caller-save registers in the same table.

The callee-save registers are a requirement for a function to preserve the values of registers across the call, so that the values of these registers on entry to the function are the same as the values on return. This requirement is invariant of the caller, or whether there are any calls at all. This looks right in your table.

The set of caller-save registers is determined per call (a function could call both regular and sve functions). They are the registers that are not guaranteed to be preserved by the function being called (registers not in the callee-saves of the function being called).

| Function Type | Callee-saves |
|---|---|
| regular | bottom 64-bits v8-v15 |
| SVE | z8-z23, p4-p15 |

| Called function type | Caller Save registers for call |
|---|---|
| regular | All registers not in {bottom 64-bits of v8-v15} * |
| sve | All registers not in {z8-z23, p4-p15} |
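A minimal sketch of the "all registers not in the callee-saves" rule, assuming only the Z/V and P registers are modelled. Each vector register is split into named pieces so that a partially preserved register can be expressed; the piece names (`.lo64`, `.hi64`, `.sve_upper`) and the function names are invented for this sketch and do not come from the AAPCS64.

```python
def vector_pieces(i):
    """Pieces of z<i>: low 64 bits, bits 64-127, and the bits above 128."""
    return {f"z{i}.lo64", f"z{i}.hi64", f"z{i}.sve_upper"}


def all_pieces(register_numbers):
    pieces = set()
    for i in register_numbers:
        pieces |= vector_pieces(i)
    return pieces


# Modelled state: z0-z31 (v0-v31 are the low 128 bits) and p0-p15.
ALL_STATE = all_pieces(range(32)) | {f"p{i}" for i in range(16)}

CALLEE_SAVES = {
    # regular: bottom 64 bits of v8-v15
    "regular": {f"z{i}.lo64" for i in range(8, 16)},
    # sve: all of z8-z23, plus p4-p15
    "sve": all_pieces(range(8, 24)) | {f"p{i}" for i in range(4, 16)},
}


def caller_saves_for_call(called_kind):
    """State the called function does not guarantee to preserve."""
    return ALL_STATE - CALLEE_SAVES[called_kind]


for kind in ("regular", "sve"):
    at_risk = caller_saves_for_call(kind)
    print(kind, "p4" in at_risk, "z8.hi64" in at_risk, "z8.lo64" in at_risk)
```

Because the caller-save set is computed from the called function's kind, a caller that makes both regular and sve calls gets a different set at each call site.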

I've got more registers that need to be saved when calling regular functions than your table entries for caller-save.

Hope I haven't made any mistakes, I'm hoping that we can find the right wording to improve the AAPCS over the next few weeks.

kunalspathak commented 2 weeks ago

All registers not in {bottom 64-bits of v8-v15} *

I assume that includes p0-p15 (might be better to clarify)

smithp35 commented 2 weeks ago

I've edited my * comment to "In practice this means all SVE state including predicate registers". Hopefully that should cover it.

kunalspathak commented 1 week ago

I've got more registers that need to be saved when calling regular functions than your table entries for caller-save.

Yes, I realized it and have updated https://github.com/ARM-software/abi-aa/issues/266#issuecomment-2177309054 accordingly.