[SME] Add ZA-compatible interface and routines to save/restore SME state

This implements requests to add a new "ZA-compatible" interface which can be called with ZA state being either 'off', 'active' or 'dormant', and which can preserve any and all state enabled under PSTATE.ZA.

It isn't clear to me whether we expect callers of __arm_sme_state_size to check for zero returns, and skip the save and restore calls for that case. If so (“case A”), I think this makes things unnecessarily complex for the callers. If not (“case B”), the buffer should have nonzero size even if PSTATE.ZA==0, so that it can hold at least the SAVED_ZA and SAVED_ZT0 fields.

And for case B, the save and restore functions should be callable even in threads that don't have access to SME.

The initial thinking was "case A", i.e. the intrinsics are only called when there is something worth saving. But now that I've added SAVED_ZA and SAVED_ZT0, it probably makes sense to remove that assumption.

I wonder how useful the OPTIONS will be in practice, given that __arm_sme_save must leave PSTATE.ZA in a dormant or off state. It seems unlikely that a caller would know enough about its position in the call hierarchy/overall program to know that certain state doesn't need to be preserved, but at the same time not know enough for it to have a specific __arm_in/out/inout/preserves annotation. No objection to keeping OPTIONS if you prefer though.

My reasons for adding OPTIONS was to give an interface that allows the routines to be used more generically, e.g. to support implementing the following case more easily using these intrinsics:

void bar() __arm_inout("zt0");
void foo() __arm_inout("za", "zt0") { bar(); }

where only "za" needs preserving across the call, but not "zt0". This could be done using explicit allocas, calling the __arm_tpidr2_* routines, but this functionality may become more involved in the future should more state be added to PSTATE.ZA.

In a similar vein, how about making __arm_sme_restore decide what to do based only on the SAVED_* fields, rather than passing OPTIONS again? The caller of the save and restore functions is expected to be the same, so it should know what needs to be preserved.

The reasoning for that was that one might only want to restore part of the saved state at one point, and restore the other part at another point. The compiler or user writing asm could use these routines to save/restore any state, e.g.

void bar();
void use_zt0() __arm_inout("zt0");
void use_za() __arm_inout("za");

void foo() __arm_inout("za", "zt0") {
               // preserve both zt0 and za, could use the __arm_sme_save routine for this.
  bar();
               // reload only zt0
  use_zt0();
               // save only zt0 (za was already saved)
  bar();
               // restore za
  use_za();
               // restore zt0
}

I wonder how useful the OPTIONS will be in practice, given that __arm_sme_save must leave PSTATE.ZA in a dormant or off state. It seems unlikely that a caller would know enough about its position in the call hierarchy/overall program to know that certain state doesn't need to be preserved, but at the same time not know enough for it to have a specific __arm_in/out/inout/preserves annotation. No objection to keeping OPTIONS if you prefer though.

My reasons for adding OPTIONS was to give an interface that allows the routines to be used more generically, e.g. to support implementing the following case more easily using these intrinsics:
void bar() __arm_inout("zt0");
void foo() __arm_inout("za", "zt0") { bar(); }
where only "za" needs preserving across the call, but not "zt0". This could be done using explicit allocas, calling the __arm_tpidr2_* routines, but this functionality may become more involved in the future should more state be added to PSTATE.ZA.

I don't think the new routines work for that case though. The routines rely on saving ZA lazily, whereas the above needs to save ZA eagerly. bar is entitled to assume that ZA is active rather than dormant on entry.

I wonder how useful the OPTIONS will be in practice, given that __arm_sme_save must leave PSTATE.ZA in a dormant or off state. It seems unlikely that a caller would know enough about its position in the call hierarchy/overall program to know that certain state doesn't need to be preserved, but at the same time not know enough for it to have a specific __arm_in/out/inout/preserves annotation. No objection to keeping OPTIONS if you prefer though.

My reasons for adding OPTIONS was to give an interface that allows the routines to be used more generically, e.g. to support implementing the following case more easily using these intrinsics:
void bar() __arm_inout("zt0");
void foo() __arm_inout("za", "zt0") { bar(); }
where only "za" needs preserving across the call, but not "zt0". This could be done using explicit allocas, calling the __arm_tpidr2_* routines, but this functionality may become more involved in the future should more state be added to PSTATE.ZA.
I don't think the new routines work for that case though. The routines rely on saving ZA lazily, whereas the above needs to save ZA eagerly. bar is entitled to assume that ZA is active rather than dormant on entry.

Yes, that was the reason I initially added the "lazily" as an option to the struct. I figured it should still be possible to set up the lazy-save using these routines and then commit the lazy-save manually before the call to bar and then restore afterwards, if ZA needs saving eagerly. Or is there some subtlety that I'm missing?

Yes, that was the reason I initially added the "lazily" as an option to the struct. I figured it should still be possible to set up the lazy-save using these routines and then commit the lazy-save manually before the call to bar and then restore afterwards, if ZA needs saving eagerly. Or is there some subtlety that I'm missing?

The difference for me is that the new routines are fundamentally about processing the live ZA state (and any future SME state) such that it's safe to call a normal function. In some cases this will involve turning PSTATE.ZA off. This is different from what the mixed-sharing example needs: there the intention is explicitly to keep ZA on (and active).

The new routines are also about providing an interface that can handle unknown future state. Thus the assumption is that extra state must be stored unless explicitly turned off (via OPTIONS). This is different from the example of moving between different sharing types, since there the compiler (by necessity) knows the state that is live in the caller and the state that is shared with the callee. The compiler only wants to store the difference between the two, and not any future unknown state.

When the callee shares some ZA state, but shares less state than the caller, the caller has to do an eager save and restore of the remaining state. I think we should just treat this as being conceptually like any other caller save. Admittedly the save and restore are quite heavy operations for ZA, but in concept they're not much different from a register save and restore. Treating them like that also helps with your later example involving calls to separate __arm_inout("zt0") and __arm_inout("za") functions.

ARM-software / abi-aa

[SME] Add ZA-compatible interface and routines to save/restore SME state #264