Add a CRelocate instruction to accelerate purecap linkage

jrtc27 commented 2 years ago

Currently, position-independent objects are quite inefficient to load. Whilst the offsets and lengths of capabilities to non-preemptible symbols are known at link time, their base and address are not, since the address at which the object is loaded must be added to them at run time. We have no good way to adjust capabilities like that: you have to start with the authority and re-derive each capability piece by piece.

CRelocate would take a capability-shaped bag of bits and add an integer to its base, address and top, leaving its permissions, length, offset etc. all the same, just sliding it up. Sail for CHERI-RISC-V would be something like:

function clause execute (CRelocate(cd, cs1, rs2)) = {
  let cs1_val = C(cs1);
  let rs2_val = X(rs2);
  let newBase = getCapBaseBits(cs1_val) + rs2_val;
  let newTop = getCapTopBits(cs1_val) + EXTZ(rs2_val);
  let newAddr = cs1_val.address + rs2_val;
  /* CRelocate only accepts an untagged bag of bits; a tagged input faults */
  if cs1_val.tag then {
    handle_cheri_reg_exception(CapEx_TagViolation, cs1);
    RETIRE_FAIL
  } else {
    let (exact, newCapBounds) = setCapBounds(cs1_val, newBase, newTop);
    let (exactWithAddr, newCap) = setCapAddr(newCapBounds, newAddr);
    if not (exact) then {
      handle_cheri_reg_exception(CapEx_InexactBounds, cs1);
      RETIRE_FAIL
    } else {
      assert(exactWithAddr, "Offset should always be representable within relocated bounds?");
      C(cd) = newCap;
      RETIRE_SUCCESS
    }
  }
}
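To make the intended semantics concrete, here is a minimal Python model of the sketch above. It assumes an uncompressed capability (explicit base, top, address and perms; `Cap` and `CheriError` are hypothetical names for illustration), so it never reaches the InexactBounds path. It only shows that base, top and address all move by the slide while length, offset and permissions stay unchanged.

```python
from dataclasses import dataclass, replace

XLEN = 64
ADDR_MASK = (1 << XLEN) - 1        # base/address are XLEN bits
TOP_MASK = (1 << (XLEN + 1)) - 1   # top is XLEN+1 bits, as in the Sail model


@dataclass(frozen=True)
class Cap:
    """Toy, uncompressed capability (assumption): bounds are [base, top)."""
    tag: bool
    base: int
    top: int
    address: int
    perms: int


class CheriError(Exception):
    pass


def crelocate(cs1: Cap, rs2: int) -> Cap:
    """Slide an untagged bag of bits up by rs2 (e.g. the load address).

    Bounds are assumed to stay exactly representable after the slide;
    a real compressed encoding would also need the InexactBounds check.
    """
    if cs1.tag:
        # Mirrors the Sail sketch: a tagged input is rejected.
        raise CheriError("TagViolation")
    return replace(
        cs1,
        base=(cs1.base + rs2) & ADDR_MASK,
        top=(cs1.top + rs2) & TOP_MASK,
        address=(cs1.address + rs2) & ADDR_MASK,
    )


bob = Cap(tag=False, base=0x1000, top=0x1040, address=0x1008, perms=0b11)
moved = crelocate(bob, 0x4000_0000)
print(hex(moved.base), hex(moved.top), hex(moved.address))
# 0x40001000 0x40001040 0x40001008
```

Note that the result is still untagged, which is why a CBuildCap against the authority would still be needed afterwards.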

It is unclear, however, what the semantics should be in a tag-clearing world, since the input and output already have their tags clear. If CRelocate were allowed on a valid (tagged) capability, simply clearing the tag would be fine for that case. The awkward case is when the requested bounds are not representable, as there is no good signalling mechanism beyond doing something to the bounds to make them representable.
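The unrepresentable case arises because compressed bounds can only express suitably aligned base and top values. As an illustration only, the toy model below (an assumption, not the real CHERI-RISC-V encoding; `MW` and the helper names are hypothetical) shows that sliding exact bounds by an amount that is not a multiple of the required alignment yields bounds that cannot be represented exactly.

```python
# Toy exactness model (assumption): with an MW-bit mantissa, bounds of
# length L use exponent E = max(0, bitlen(L) - MW) and are exact only if
# both base and top are multiples of 2**E. Real CHERI Concentrate is subtler.
MW = 14  # hypothetical mantissa width


def exponent_for(length: int) -> int:
    return max(0, length.bit_length() - MW)


def exactly_representable(base: int, top: int) -> bool:
    align = 1 << exponent_for(top - base)
    return base % align == 0 and top % align == 0


def slide_is_exact(base: int, top: int, slide: int) -> bool:
    """Would sliding these (already exact) bounds by `slide` stay exact?"""
    assert exactly_representable(base, top), "input must already be exact"
    return exactly_representable(base + slide, top + slide)


# A 64 KiB object needs 8-byte alignment in this model.
print(slide_is_exact(0x10000, 0x20000, 0x4000_0000))  # True
print(slide_is_exact(0x10000, 0x20000, 0x4000_0004))  # False
# A 64 MiB object needs 8 KiB alignment, so a 4 KiB-aligned slide can fail.
print(slide_is_exact(0, 1 << 26, 0x1000))  # False
```

In this model a page-aligned load address only causes trouble for very large objects, but that is exactly the case where a tag-clearing CRelocate has no obvious way to report failure.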

jonwoodruff commented 2 years ago

In discussion this morning, we had an alternative proposal.

The first operand could be a tagged capability with the desired address, and the second operand could contain an untagged capability to convey the length and permissions.

The destination would have the address of the first operand and the length and permissions from the second operand.

This would remove the need for CBuildCap at the end.
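A minimal Python sketch of this alternative, under stated assumptions: the result's base equals the first operand's address (mirroring CSetBounds), its permissions are the intersection of both operands' permissions (CAndPerms-like), and out-of-bounds requests fault. Bounds compression is ignored, and `Cap`, `CheriError` and `crelocate_from_address` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cap:
    """Toy, uncompressed capability (assumption): bounds are [base, top)."""
    tag: bool
    base: int
    top: int
    address: int
    perms: int


class CheriError(Exception):
    pass


def crelocate_from_address(auth: Cap, bob: Cap) -> Cap:
    """Take the address from the tagged `auth` and the length and
    permissions from the untagged bag of bits `bob`."""
    if not auth.tag:
        raise CheriError("TagViolation")
    new_base = auth.address
    new_top = new_base + (bob.top - bob.base)   # length comes from bob
    if new_base < auth.base or new_top > auth.top:
        raise CheriError("LengthViolation")
    return Cap(tag=True, base=new_base, top=new_top, address=new_base,
               perms=auth.perms & bob.perms)     # intersect permissions


auth = Cap(tag=True, base=0x4000_0000, top=0x4010_0000,
           address=0x4000_2340, perms=0b111)
bob = Cap(tag=False, base=0, top=0x40, address=0, perms=0b011)
r = crelocate_from_address(auth, bob)
print(hex(r.base), hex(r.top), bin(r.perms))  # 0x40002340 0x40002380 0b11
```

Because the result is derived from a tagged authority, no separate CBuildCap step is needed, which is the point of the proposal.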

Micro-architecturally, this would be a variant of CSetBounds that takes the bounds from a capability encoding (combined with a CAndPerms that takes its permissions in capability-encoding format).

One could either consume the bounds and permissions from an untagged capability register loaded as a capability, or possibly consume the operands from an integer register that contains capability metadata. Micro-architecturally, the former should be more convenient, since we would largely be treating the operand as a capability.

nwf commented 2 years ago

I'm not sure why you'd want the address from the authority rather than using that as the slide?

Spit-balling a bit to see if it sticks, how about allowing the bag of bits to have its own base, offset, and length? This way the loader could, for example, have a RW authority to an executable's .data section (with base at its load address and offset 0) and the bags of bits could have symbols' offsets within that section as their base and lengths as, well, lengths. (I don't imagine there are symbols with nonzero offsets relative to their own base address, but maybe?) The result would have to be entirely within the authority's span to be tagged.

In any case, attempting to sail off into the sunset with that idea, as it were, I think we can adjust @jrtc27's code to be...

function clause execute (CRelocate(cd, ca /* authority */, cb /* bag of bits */)) = {
  let ca_val = C(ca);
  let cb_val = C(cb);

  let newBase = getCapAddr(ca_val) + getCapBaseBits(cb_val);
  /* In the dynamic loader's use case, getCapAddr(ca_val) == ca_val.base ? */

  let length = toBits(sizeof(xlen), getCapLength(cb_val));

  let newAddr = newBase + getCapOffsetBits(cb_val);
  /* In the dynamic loader's use case, getCapOffsetBits(cb_val) == 0? */

  if ca_val.tag & isCapSealed(ca_val) then {
    handle_cheri_reg_exception(CapEx_SealViolation, ca);
    RETIRE_FAIL
  } else if not (inCapBounds(ca_val, newBase, unsigned(length))) then {
    handle_cheri_reg_exception(CapEx_LengthViolation, ca);
    RETIRE_FAIL
  } else {
    let newTop : CapLenBits = EXTZ(newBase) + EXTZ(length);
    let (exact, newCapBounds) = setCapBounds(ca_val, newBase, newTop);
    let (exactWithAddr, newCap) = setCapAddr(newCapBounds, newAddr);
    if not (exact) then {
      handle_cheri_reg_exception(CapEx_InexactBounds, ca);
      RETIRE_FAIL
    } else {
      assert(exactWithAddr, "Offset should always be representable within relocated bounds?");
      C(cd) = setCapPerms(newCap, getCapPerms(ca_val) & getCapPerms(cb_val));
      RETIRE_SUCCESS
    }
  }
}
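For experimentation, the same logic can be transliterated into a small Python model. It assumes uncompressed bounds (so the InexactBounds path is omitted) and, as in the Sail, takes the new base from the authority's address plus the bag of bits' base. `Cap` and `CheriError` are illustrative names, not part of the Sail model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cap:
    """Toy, uncompressed capability (assumption): bounds are [base, top)."""
    tag: bool
    base: int
    top: int
    address: int
    perms: int
    sealed: bool = False


class CheriError(Exception):
    pass


def crelocate(ca: Cap, cb: Cap) -> Cap:
    """ca is the authority, cb the untagged bag of bits."""
    new_base = ca.address + cb.base                # authority address + BoB base
    length = cb.top - cb.base                      # BoB length
    new_addr = new_base + (cb.address - cb.base)   # keep the BoB's offset
    if ca.tag and ca.sealed:
        raise CheriError("SealViolation")
    if not (ca.base <= new_base and new_base + length <= ca.top):
        raise CheriError("LengthViolation")
    # Derived from the authority, so the result is tagged only if ca is.
    return Cap(tag=ca.tag, base=new_base, top=new_base + length,
               address=new_addr, perms=ca.perms & cb.perms)


rw = Cap(tag=True, base=0x4000_0000, top=0x4010_0000,
         address=0x4000_0000, perms=0b111)
bob = Cap(tag=False, base=0x2340, top=0x2380, address=0x2348, perms=0b011)
r = crelocate(rw, bob)
print(hex(r.base), hex(r.address), r.tag)  # 0x40002340 0x40002348 True
```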

An interesting question, perhaps, is what we would do in the CHERI+MTE case. Presumably we'd generate a tagged result only if either ca_val and cb_val have identical monochromatic colors, or ca_val is polychromatic?
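Read literally, that rule is a simple predicate over the two operands' colors. The sketch below assumes a hypothetical `Color` representation (a polychromatic flag plus a single color value) purely to pin down the cases in the question; it is not an existing CHERI or MTE interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Color:
    """Hypothetical MTE-style coloring of a capability."""
    polychromatic: bool
    value: int = 0   # only meaningful when monochromatic


def result_may_be_tagged(ca: Color, cb: Color) -> bool:
    """Tag the result only if the authority is polychromatic, or both
    operands share one monochromatic color. CRelocate's bounds and seal
    checks would still apply separately."""
    if ca.polychromatic:
        return True
    return not cb.polychromatic and ca.value == cb.value


print(result_may_be_tagged(Color(False, 3), Color(False, 5)))  # False
```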

jrtc27 commented 2 years ago

The point of applying the +-like operator to the authority rather than to the bag of bits is that you can leave the bag of bits' bounds alone. That avoids the question of what to do with them if the +-like operator's result isn't representable in a tag-clearing world, where clearing the tag signals nothing because the bag of bits already has its tag clear. At least, I assume that's the point.

jrtc27 commented 2 years ago

The downside of that, though, is that in a multi-root situation (the run-time linker holds both RW and RX caps) you need either a relocation per root cap (which means branching) or multiple + operations for each relocation (one per root).
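To illustrate the first option, a loader loop in which each relocation records which root it is relative to might look like the following. The (root name, bag of bits) record layout and the "RW"/"RX" strings are assumptions for illustration, and `crelocate` here is a stripped-down model with no bounds, seal or representability checks.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Cap:
    """Toy, uncompressed capability (assumption): bounds are [base, top)."""
    tag: bool
    base: int
    top: int
    address: int
    perms: int


def crelocate(root: Cap, bob: Cap) -> Cap:
    """Minimal CRelocate model (no checks): place bob relative to root."""
    base = root.address + bob.base
    return Cap(root.tag, base, base + (bob.top - bob.base),
               base + (bob.address - bob.base), root.perms & bob.perms)


def relocate_all(relocs: Iterable[Tuple[str, Cap]], rw_root: Cap,
                 rx_root: Cap) -> List[Cap]:
    """Each relocation names its root; choosing it is a data-dependent branch."""
    out = []
    for root_kind, bob in relocs:
        root = rx_root if root_kind == "RX" else rw_root
        out.append(crelocate(root, bob))
    return out


rw = Cap(True, 0x4010_0000, 0x4020_0000, 0x4010_0000, 0b011)
rx = Cap(True, 0x4000_0000, 0x4010_0000, 0x4000_0000, 0b101)
relocs = [("RW", Cap(False, 0x40, 0x48, 0x40, 0b111)),
          ("RX", Cap(False, 0, 0x10_0000, 0x1a0, 0b111))]
res = relocate_all(relocs, rw, rx)
print(hex(res[0].base), hex(res[1].address))  # 0x40100040 0x400001a0
```

The other option would avoid the branch by performing one + operation per root for every relocation, at the cost of extra work per entry.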

jonwoodruff commented 2 years ago

I didn't quite follow all the issues, but I can summarise here what I understood, in case that's helpful.

My proposal allowed us to eliminate the CBuildCap at the end of a relocation, but did not include the "slide+offset" in the relocate instruction; also, only the bounds and permissions are used from the "bag of bits" (BoB?). Presumably we want to encode each entire relocation as a 128-bit BoB capability, so not having the actual offset in the BoB is suboptimal.

Wes suggests we encode the offset (with respect to the section) as the base of the BoB, as well as encoding the length and permissions. This seems OK to me, though micro-architecturally, getting the base is more painful than just getting the address. I'll let others judge whether it's distasteful architecturally, but micro-architecturally it would be nice to say that the address of the BoB is the offset in the section, the length is the length, and the permissions are ANDed with the permissions of the authority. In all likelihood the offset of the BoB would be zero, so this wouldn't matter, but it could be weird if not.

jrtc27 commented 2 years ago

The offset in the section is almost never zero. (The run-time linker deals with segments, not sections, where that's even less true, and we amalgamate segments into one RW cap and one RX cap, so it's worse even than that.) The offset from the symbol is often 0, however. But recall that functions aren't tightly bounded, so they will almost always have a non-zero offset from their base, the start of the RX cap; and even in a tightly-bounded-function world, C++ exceptions use capabilities that point into the middle of a function while still covering the whole function.
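To make those cases concrete, here are example bag-of-bits values in the base/offset scheme under discussion, with every base expressed relative to the start of the RW or RX cap the run-time linker holds. All addresses and sizes are invented for illustration, and `BoB` is a hypothetical helper type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoB:
    """Untagged bag of bits; base/top/address are relative to a root cap."""
    base: int
    top: int
    address: int

    @property
    def offset(self) -> int:
        return self.address - self.base


# Data symbol: base is its position in the RW region, offset from symbol is 0.
counter = BoB(base=0x2340, top=0x2348, address=0x2340)

# Loosely bounded function pointer: covers the whole RX region, so its
# offset is the function's position within that region.
RX_SIZE = 0x10_0000
func_ptr = BoB(base=0, top=RX_SIZE, address=0x1a0)

# Tightly bounded function at [0x1a0, 0x260) with a C++ landing pad at 0x1f4:
# the capability covers the whole function but points into its middle.
landing_pad = BoB(base=0x1a0, top=0x260, address=0x1f4)

print(hex(counter.base), counter.offset)   # 0x2340 0
print(func_ptr.offset == 0x1a0)            # True
print(hex(landing_pad.offset))             # 0x54
```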

jonwoodruff commented 2 years ago

So that is to say, we want the BoB to supply (for the symbol cap) length, permissions, location in section (or segment? or something?) as the base, and offset as the offset?

The authorising capability would provide the section (or segment? or something?) address and bounds encompassing the final bounds of the symbol cap.

This should be reasonable as a 2-cycle instruction, given that we'll need to extract the BoB's base ("getBase"), add it to the authority's address, and then perform a SetBounds. In an out-of-order processor running a long chain of independent relocations, I guess the extra latency shouldn't matter too much.

rwatson commented 1 year ago

I'm reviving this issue slightly because, with discussions about nailing down encodings for CHERI-RISC-V becoming more prominent, this instruction represents a use case where software is expected to know and persistently embed a specific capability encoding, which affects forward and backward software compatibility.

Are there any design choices we could make here to manage the impact of encoding change -- e.g., does having a CRelocate basically imply that we need a versioned capability format, and some scheme to advertise the capability format(s) of the current processor via (for example) a control register so that software can potentially handle multiple versions in some way?

(I’ll observe for completeness that there are, of course, other pertinent use cases for software embedding of the capability format, including debugging tools, as well as cases where regardless of specific software embedding, the format becomes part of the ABI -- e.g., process or VM migration, distributed shared memory, and so on.)
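One way software might cope is sketched below under purely hypothetical assumptions: the binary records which capability-format version its bags of bits were encoded for, and the processor advertises the versions it supports. Both are assumptions, not existing mechanisms. The loader uses the CRelocate fast path only when the encodings match and otherwise falls back to the slower piece-by-piece re-derivation from the authority.

```python
from typing import Iterable


def pick_relocation_strategy(binary_format: int, cpu_formats: Iterable[int]) -> str:
    """Hypothetical policy: trust pre-encoded bags of bits only when the
    processor advertises the same capability-format version."""
    if binary_format in set(cpu_formats):
        return "crelocate"
    # Encoding mismatch: rebuild each capability from the authority instead.
    return "software-rederive"


print(pick_relocation_strategy(2, [1, 2]))  # crelocate
print(pick_relocation_strategy(3, [1, 2]))  # software-rederive
```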

CTSRD-CHERI / cheri-specification, issue #3