double-width load/store for performance

tariqkurd-repo commented 1 year ago

Given that CHERI doubles the width of the register file, and that the upper half of each register can easily be accessed using CGetHigh/CSetHigh, has anyone considered XLEN*2-width loads/stores that access integer data but fill the whole register (clearing the tag)?

XLEN*2 loads/stores would differ from LQ/SQ, which would notionally access a register pair. Instead they access [XLEN*2-1:0] in the capability register file.

They could be used to accelerate memcpy without the complication of looking like they might want to store tags, which would require the SW PTE permission and could make the OS think that capabilities are stored there. As an extra advantage, they could execute misaligned.

This seems like a neat trick to get more performance out of the upper half of the register file.
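As a rough illustration of the idea, here is a software model in C, assuming XLEN=64. The `cap_reg` type and the `load_double_width`, `store_double_width`, and `copy_double_width` names are hypothetical and only model the proposed semantics (fill both register halves, always clear the tag, tolerate misalignment); they are not instructions or APIs from any CHERI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one capability register on an XLEN=64 core. */
typedef struct {
    uint64_t lo;   /* bits [XLEN-1:0] */
    uint64_t hi;   /* bits [XLEN*2-1:XLEN], reachable via CGetHigh/CSetHigh */
    bool     tag;  /* capability validity tag */
} cap_reg;

/* Model of the proposed double-width integer load: both halves are
 * filled from memory and the tag is always cleared. memcpy is used so
 * that misaligned offsets are fine in this model. */
cap_reg load_double_width(const uint8_t *mem, size_t offset)
{
    cap_reg r;
    memcpy(&r.lo, mem + offset, sizeof r.lo);
    memcpy(&r.hi, mem + offset + sizeof r.lo, sizeof r.hi);
    r.tag = false;
    return r;
}

/* Model of the matching store: writes 16 bytes of data and never a tag. */
void store_double_width(uint8_t *mem, size_t offset, cap_reg r)
{
    memcpy(mem + offset, &r.lo, sizeof r.lo);
    memcpy(mem + offset + sizeof r.lo, &r.hi, sizeof r.hi);
}

/* memcpy-like loop moving 16 bytes per iteration.
 * Assumption for brevity: n is a multiple of 16. */
void copy_double_width(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16)
        store_double_width(dst, i, load_double_width(src, i));
}
```

The point of the model is that each iteration moves twice as much data as an XLEN-wide copy while never producing a tagged value, so the store side never looks like a capability store.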

rwatson commented 1 year ago

I think we’ve always been leery of the opcode footprint :-).

In similar discussions in the past, though, the thinking has been that if you want to strip tags when implementing something memcpy()-like, for example, you’d remove the capability load permission from the base capability, so that you use the same instruction but get the untagged value. This is pretty reasonable for a memcpy()-like thing, where the cost of the register-to-register move will be negligible, and you don’t have to worry about squashing the original capability, but for something more granular that might not be desirable.
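A minimal C model of that approach, for illustration only: the permission bit values, the `cap_value` type, and the `strip_load_cap` and `load_cap_via` helpers are hypothetical names, not an actual CHERI encoding or API. It shows the same capability-load operation returning an untagged copy once the load-capability permission has been removed from the authorizing capability.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits; real encodings differ. */
#define PERM_LOAD      ((uint32_t)1 << 0)
#define PERM_LOAD_CAP  ((uint32_t)1 << 1)

/* Hypothetical model of a capability-sized value in memory. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
    bool     tag;
} cap_value;

/* Derive a weaker authority by removing the capability load permission. */
uint32_t strip_load_cap(uint32_t perms)
{
    return perms & ~PERM_LOAD_CAP;
}

/* Model of a capability load: the data bits are always returned, but
 * the tag survives only if the authority permits loading capabilities. */
cap_value load_cap_via(uint32_t auth_perms, const cap_value *slot)
{
    cap_value v = *slot;
    if ((auth_perms & PERM_LOAD_CAP) == 0)
        v.tag = false;
    return v;
}
```

A memcpy()-like routine would derive the weaker authority once, outside the copy loop, which is why the cost of the extra register-to-register move is negligible there.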

tariqkurd-repo commented 1 year ago

Yes, stripping tags on the loads certainly works, but what about storing the data back? I'm not keen on checking the tag on a store and setting the PTE CD bit only if a tag is actually set, as that makes it a data-dependent exception. This would work better if we go for the suggestion in https://github.com/CTSRD-CHERI/cheri-specification/issues/73, where a capability store without store_cap permission just stores a zero tag. Then the PTE CD bit doesn't need to be updated in a data-dependent way; it just doesn't get updated if the permission is missing.

So the proposal would be:

| CW | CD | Behavior |
|----|----|----------|
| 0 | X | Trap on all capability stores only if store_cap permission is set (exception code 0x1B); do not check the tag when deciding whether to trap |
| 1 | 0 | Capability stores atomically raise CD or fault (as above) only if store_cap permission is set; do not check the tag when deciding whether to trap |
| 1 | 1 | Capability stores permitted regardless of store_cap permission and tag |
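One way to read the CW/CD proposal above is as a small decision function. The C sketch below is an interpretation, not specification text: `store_action`, `store_result`, and `cap_store_outcome` are hypothetical names, and for the CW=1, CD=0 case it assumes the implementation raises CD in hardware rather than faulting (the proposal allows either).

```c
#include <stdbool.h>

typedef enum {
    STORE_OK,         /* store completes, PTE untouched */
    STORE_OK_SET_CD,  /* store completes and CD is raised atomically */
    STORE_TRAP        /* store faults (exception code 0x1B in the proposal) */
} store_action;

typedef struct {
    store_action action;
    bool         tag_written;  /* tag value that lands in memory */
} store_result;

/* The trap decision uses only CW, CD, and the store_cap permission.
 * tag_in affects only the tag written, never whether we trap. */
store_result cap_store_outcome(bool cw, bool cd, bool store_cap_perm, bool tag_in)
{
    store_result r = { STORE_OK, false };

    if (!store_cap_perm)
        return r;              /* stores a zero tag; no trap, CD unchanged */

    if (!cw) {
        r.action = STORE_TRAP; /* CW=0: trap whatever the tag is */
        return r;
    }

    r.action = cd ? STORE_OK : STORE_OK_SET_CD;
    r.tag_written = tag_in;
    return r;
}
```

Because `tag_in` never feeds the trap decision, the exception is no longer data-dependent, which is the property the proposal is after.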

I think that gives an elegant solution without the extra encoding space.

Additionally we can say that if the load_cap/store_cap permission isn't set then we can allow an implementation to relax the alignment constraints on load/store cap. So really the permissions switch them from load/store-double-width to load/store-cap. Then we meet all the objectives of my original request.

What do you think?

nwf commented 1 year ago

A long long time ago I proposed adding a mcisa CSR to RISC-V to let us experiment with data dependence on certain paths, including this one.

It remains an open question how much dirtying on any tag-capable store would hurt, for example, revocation's cap-dirty tracking. [0] While making stores through !permit-store-cap not tag-capable (and so, zeroing tags and not dirtying pages) could further inform or side-step the dependence, I suspect that in practice nearly every memcpy or memmove will be the tag-oblivious flavor that had us wanting the store dependence in the first place.

Also, I suspect there's mileage to be gotten by having the dependence on the value conditional on the TLB result; that is, if the translation comes back as being stores-permissive, there's no need to wait for the data before deciding that the instruction cannot raise a fault here. A plausible middle ground would be to treat tag-capable stores to ...

- cap-clean pages (that is, CW=0) trap regardless of the tag,
- cap-dirty pages (CW=1 CD=1) not trap, and
- cap-dirtyable pages (CW=1 CD=0) dependently trap.

But again we'd want to measure the impact before settling on such a thing.
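The middle ground described above can be sketched as two predicates in C. This is one reading of it, with hypothetical function names; in particular, "dependently trap" is taken here to mean "trap only if the stored value is tagged".

```c
#include <stdbool.h>

/* Does a tag-capable store trap, given the page's CW/CD bits and the
 * tag of the value being stored? CD is a don't-care when CW=0. */
bool tag_capable_store_traps(bool cw, bool cd, bool tag)
{
    if (!cw)
        return true;   /* cap-clean: trap regardless of the tag */
    if (cd)
        return false;  /* cap-dirty: never trap */
    return tag;        /* cap-dirtyable: depends on the data */
}

/* Must the pipeline wait for the store data before it can rule out a
 * fault? Only for cap-dirtyable pages; in the other two cases the TLB
 * result alone decides. */
bool needs_store_data(bool cw, bool cd)
{
    return cw && !cd;
}
```

Under this reading the data dependence is confined to the first tagged store to each cap-dirtyable page, which is the case whose cost would need measuring.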

[0] There are some other uses of CW/CD in the system, but not much that I think is essential; the primary consumer is revocation. And for revocation we'd really rather do something (anything!) else than have this be associated with virtual pages, because aliases are just a nightmare. But that's a big ask.

jrtc27 commented 1 year ago

> cap-clean pages (that is, CW=0) trap regardless of the tag,

This immediately breaks memcpy'ing to an mmap'ed file.

nwf commented 1 year ago

That assumes that we continue to give out permit-cap-store caps to mmap()ed files, doesn't it? If we're revising semantics here, that could perhaps be changed?

jrtc27 commented 1 year ago

There are many ways you can, and need to, end up with that. Partial over-mapping is one such example. Even if you restrict the file mapping to have PROT_*_CAP-less PROT_MAX, the bigger mapping may have PROT_WRITE_CAP in PROT_MAX, and thus Permit_Store_Cap in the capability permissions. Then you would fault if using the strictly-more-permissive capability as your authority, but not if you used the less-permissive one. That's extremely odd and surprising.
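The surprise can be made concrete with a toy C model. It assumes the rule under discussion (a store through an authority holding Permit_Store_Cap is tag-capable, and tag-capable stores to CW=0 pages trap regardless of the tag); the permission bit values and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits; real encodings differ. */
#define PERM_STORE      ((uint32_t)1 << 0)
#define PERM_STORE_CAP  ((uint32_t)1 << 1)

/* True if every permission in a is also present in b. */
bool perms_subset(uint32_t a, uint32_t b)
{
    return (a & ~b) == 0;
}

/* Under the assumed rule, does a plain (untagged) data store fault?
 * It does when the authority makes the store tag-capable and the
 * target page is cap-clean (CW=0). */
bool store_faults(uint32_t auth_perms, bool page_cw)
{
    bool tag_capable = (auth_perms & PERM_STORE_CAP) != 0;
    return tag_capable && !page_cw;
}
```

With a file-backed page (CW=0), `store_faults(PERM_STORE | PERM_STORE_CAP, false)` is true while `store_faults(PERM_STORE, false)` is false, even though the second authority's permissions are a strict subset of the first's.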

CTSRD-CHERI / cheri-specification

double-width load/store for performance #108