CRRLUP or CSetBoundsRoundUp

davidchisnall commented 1 month ago

The CRRL instruction is intended for allocators. It finds the smallest allocation size that can be accurately bounded to give at least the requested length.

There is an analogous opposite requirement that has shown up in a few places: provide the largest bound that can be expressed up to a limit. This is used, for example, in the TLS stack. The use case is that the caller has a ring buffer and wants to pass a view of a region of it to the callee. This must be a precisely bounded region to avoid leaking data / data corruption (depending on whether it's a read or write call). The caller wants to minimise the number of calls, but can split the ring buffer into arbitrary chunks.

A CRRLUP instruction would need to take two source operands, one for the base address and one for the requested length, and give the largest length, no greater than the requested one, that can be precisely bounded at that base.

The CRRLUP instruction could possibly use a cat emoji as its mnemonic.

Alternatively, a CSetBoundsRoundUp, which always sets the base to the current address and the length to the longest value that is possible to precisely bound up to the requested length, would work.
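To make the requested semantics concrete, here is a minimal sketch in Python of what either instruction would compute, followed by the ring-buffer use case. It assumes a deliberately simplified toy encoding (a 9-bit length mantissa and an exponent, with the base and length both required to be multiples of 2^exponent). This is not CHERIoT's actual encoding, and every name and constant below is hypothetical.

```python
# Toy bounds encoding, for illustration only (NOT the real CHERIoT format):
# a region [base, base + length) is precisely representable if, for some
# exponent e, base and length are multiples of 2**e and (length >> e) fits
# in the mantissa.
MANTISSA_BITS = 9   # hypothetical mantissa width
MAX_EXPONENT = 24   # hypothetical largest exponent


def is_precise(base, length):
    """True if the toy encoding can bound [base, base + length) exactly."""
    for e in range(MAX_EXPONENT + 1):
        low_bits = (1 << e) - 1
        if (base & low_bits) == 0 and (length & low_bits) == 0 \
                and (length >> e) < (1 << MANTISSA_BITS):
            return True
    return False


def largest_precise_length(base, requested):
    """Largest length <= requested that can be precisely bounded at base."""
    best = 0
    for e in range(MAX_EXPONENT + 1):
        low_bits = (1 << e) - 1
        if base & low_bits:
            break  # base is misaligned for this and every larger exponent
        longest_for_e = ((1 << MANTISSA_BITS) - 1) << e
        candidate = min(requested & ~low_bits, longest_for_e)
        best = max(best, candidate)
    return best


def split_into_precise_chunks(base, length):
    """Split a region into consecutive, precisely boundable chunks."""
    chunks = []
    while length > 0:
        n = largest_precise_length(base, length)  # always >= 1 here
        chunks.append((base, n))
        base += n
        length -= n
    return chunks


if __name__ == "__main__":
    for start, size in split_into_precise_chunks(0x1001, 3000):
        print(hex(start), size)
```

In this toy model a poorly aligned base forces a short first chunk, after which the better-aligned remainder can be covered in larger pieces. That is the "minimise the number of calls" behaviour the caller wants, obtained without the caller knowing anything about the encoding.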

nwf commented 1 month ago

Some additional information: Zcheripurecap does not offer CRRL, with the rationale that `CRRL(x) == (x + ~CRAM(x)) & CRAM(x)` (and indeed, the Sail implementation is defined using this identity and QEMU's CRRL asserts that it holds). Peter Rugg also suggested that `CRRL(x) = cheri_length_get(cheri_bounds_set(NULL, x))` might work, but that hasn't been SMT checked for all cases.
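As a rough illustration of why that identity holds, here is a sketch in Python against a toy encoding (a 9-bit mantissa with lengths rounded up to a multiple of 2^exponent). The encoding, the 32-bit address width, and all function names are assumptions made for this example; it does not model Zcheripurecap or CHERIoT exactly.

```python
# Toy model only: CRAM(x) is the alignment mask for the exponent that a
# length x needs, and CRRL(x) is x rounded up to that alignment.
MANTISSA_BITS = 9                      # hypothetical mantissa width
ADDRESS_BITS = 32                      # hypothetical address width
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def exponent_for(length):
    """Smallest e such that length, rounded up to 2**e, fits the mantissa."""
    e = 0
    while ((length + (1 << e) - 1) >> e) >= (1 << MANTISSA_BITS):
        e += 1
    return e


def cram(length):
    """Alignment mask: all ones except the low exponent_for(length) bits."""
    return ~((1 << exponent_for(length)) - 1) & ADDRESS_MASK


def crrl_direct(length):
    """Round length up to a multiple of 2**e, computed from the exponent."""
    e = exponent_for(length)
    return ((length + (1 << e) - 1) >> e) << e


def crrl_from_cram(length):
    """The identity from the comment: (x + ~CRAM(x)) & CRAM(x)."""
    mask = cram(length)
    return (length + (~mask & ADDRESS_MASK)) & mask


if __name__ == "__main__":
    # Check the identity over a range of small lengths in the toy model.
    assert all(crrl_direct(x) == crrl_from_cram(x) for x in range(1 << 16))
    print(crrl_from_cram(1000), crrl_from_cram(1023))
```

Adding `~CRAM(x)` (the low-bit mask) and then clearing those bits is the usual round-up-to-alignment trick, so in this model the identity is true by construction. Nothing comparable falls out for the rounding-down variant, because the answer there also depends on the alignment of the base, which CRAM never sees.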

CRRLUP is, as seen, easily enough done in software for CHERIoT's encoding (which has an external exponent and so doesn't switch mantissa widths) but it relies on knowledge of the encoding, and that's something we've tried to push into the architecture as much as possible. I don't think it's easily derived from CRAM or CSetBounds.

Purely bike-shedding, I think I would prefer the mnemonic CSetBoundsUpTo or ...RoundDown, rather than ...RoundUp, because the current CSetBounds (as opposed to CSetBoundsExact)[^standard-flip] rounds the length up to representation, ensuring that at least all requested bytes are accessible (well, subject to then clipping by the authority's length, to ensure monotonicity). (Similarly, I'm not sure CRRLUP conveys the right intuition.)

[^standard-flip]: Note that Zcheripurecap flips the presumption of exactness, with SCBNDS replacing CHERI-RISC-V ISAv9's CSetBoundsExact and SCBNDSR replacing CSetBounds.

davidchisnall commented 1 month ago

> Note that Zcheripurecap flips the presumption of exactness, with SCBNDS replacing CHERI-RISC-V ISAv9's CSetBoundsExact and SCBNDSR replacing CSetBounds.

I think this is the right choice. We originally had only the inexact version, but I found that we were very often adding a branch afterwards to check, so added the exact one. In hindsight, it should have been the default.

We made the same choice in the C++ wrappers: `foo.bounds() = length` does an exact CSetBoundsExact, and `foo.bounds().set_inexact(length)` does a CSetBounds.

CHERIoT-Platform / cheriot-sail

CRRLUP or CSetBoundsRoundUp #72