Microsoft's CHERIoT-ibex is optimized to be small and so uses a 33-bit memory bus. As a result, capability fetches take two bus cycles (and the two out-of-band bits are ANDed together to form the capability tag, reminiscent of IBM's System/38).
The current capability layout does not allow for much parallel processing or decompression in the 2nd of those two bus cycles: the cursor/address field is not particularly useful unto itself and the bounds metadata fields are not particularly meaningful without some of the cursor/address bits.
A long while ago (how time flies), Kunyan and I sketched a layout in which we shuffle the bits around a bit, such that the two words shake out containing...
the reserved bit, the 4-bit exponent field, the 9-bit T field, the 9-bit B field, and the middle 9 bits of the address (that is, the ones that overlap the shifted T and B fields in decoding).
the 6-bit permissions, the 3-bit otype, and the residual 23 top and bottom bits of the address (the exponent also serves to divide the 23 bits into top and bottom).
This gives all the information needed for decompression in the first word, so that logic can happen in parallel with the 2nd bus cycle.
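A rough sketch of that split, to make the bit accounting concrete. Several things here are assumptions rather than part of the proposal: the fields are packed most-significant-first in the order listed above, the exponent field is used directly as the shift amount e, and only e in 0..14 is handled (any special exponent encoding is left out because the text above does not say how the 23 residual bits would divide in that case). All function names are hypothetical.

```python
FIELD_BITS = 9  # width of T, B, and the middle address slice


def pack(reserved, e, t, b, perms, otype, addr):
    """Pack fields into (word0, word1) under the assumed bit order."""
    if not 0 <= e <= 14:
        raise ValueError("sketch only covers exponents 0..14")
    bottom = addr & ((1 << e) - 1)             # e low address bits
    middle = (addr >> e) & 0x1FF               # 9 bits overlapping T and B
    top = addr >> (e + FIELD_BITS)             # remaining 23 - e high bits
    # word0: reserved(1) | exponent(4) | T(9) | B(9) | middle(9) = 32 bits
    word0 = (reserved << 31) | (e << 27) | (t << 18) | (b << 9) | middle
    # word1: perms(6) | otype(3) | top(23 - e) | bottom(e) = 32 bits
    word1 = (perms << 26) | (otype << 23) | (top << e) | bottom
    return word0, word1


def unpack_first(word0):
    """Everything bounds decompression needs, from the first word alone."""
    return {
        "reserved": word0 >> 31,
        "e": (word0 >> 27) & 0xF,
        "t": (word0 >> 18) & 0x1FF,
        "b": (word0 >> 9) & 0x1FF,
        "middle": word0 & 0x1FF,
    }


def unpack_second(word1, e, middle):
    """Recover perms, otype, and the full address once word1 arrives."""
    perms = word1 >> 26
    otype = (word1 >> 23) & 0x7
    residual = word1 & 0x7FFFFF                # 23 residual address bits
    bottom = residual & ((1 << e) - 1)         # the exponent picks the split
    top = residual >> e
    addr = (top << (e + FIELD_BITS)) | (middle << e) | bottom
    return perms, otype, addr
```

In this sketch `unpack_first` never touches the second word, which is the point of the shuffle: the exponent, T, B, and the overlapping address bits are all in hand while the second bus cycle is still in flight.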
The non-contiguousness of the address bits is exciting and strongly differentiates CGetLow from CGetAddr. See https://github.com/CTSRD-CHERI/cheri-specification/wiki/Tracking-discussion-for-CGetAddr-vs-CToPtr-vs-as-integer-alias-vs-CGetLow