Create backward sentries for function returns and add more checks in cjalr

vmurali commented 6 months ago

The following is based on discussions with @davidchisnall (portions of our email exchange have been included as is with perhaps slight modifications).

Background: A typical (non-malicious) function call is as follows:

caller saves callee-saved registers (tmp registers) in the caller stack
caller saves arguments in arg registers, and overflow arguments in the callee stack
caller stores the return address (i.e. address of the instruction next to the call) in the return-address register (or stack in x86)
caller jumps to the entry point of the callee function (stored in a register). (RISC-V stores return address and jumps to the address in the register via a single jalr instruction)
callee executes the function; it might spill registers in the stack as needed
callee jumps back to the caller using the return-address register (jr)

Even if the callee code is bug-free, the caller can violate callee’s CFI in two ways:

Jump to an address different from the entry point of the callee:
- For instance, if the callee’s code internally performs operations in a specific branch which the caller did not have access to originally (because of various checks by the callee), the caller can instead directly jump to that branch of the callee (bypassing the checks) and execute that privileged code. (Note that PMP or MMU will not protect against this attack as a caller that can execute a function will have access to execute any part of the function; RISC-V has, for instance, proposed a “landing pad” to protect against this)
Setting a return address different from the correct return address
- The caller can make a callee jump to an arbitrary location in some other library that the caller has access to. (This is the basis for Return Oriented Programming (ROP) attack, the returned location called a "gadget")
- The return address that is spilled in the stack in overwritten (either with a buffer overflow attack or directly by feeding a wrong return address), and such gadgets chained.

How CHERI prevents CFI:

Cross-compartment and Library functions can be jumped only through sealed capabilities (sentries)
- jalr instruction to sentry ensures that the offset is 0, so the caller can only jump to published entry point sentries that are accessible by the caller, not arbitrary code inside another library
- This prevents jumping to a privileged portion of the callee code bypassing the callee’s checks
- This also prevents most of the ROP attacks using gadgets, as gadgets must be published entry points
Return address can only be manipulated by the caller to contain one of the following:
- Sentry representing the correct return address
- Sentry that the caller has access to already (either a function call entry point that the caller has access to, or the return address sentry of its caller, or some other return address sentry the caller previously had access to)
- Non-sealed capability to its own code
(Note that in all the 3 cases above, the manipulated return address has no effect, because the caller could have just jumped to that return address in the code)

CHERIoT inherits all the above benefits. However, because CHERIoT allows manipulating the status of the interrupt through a function call (and function return), by encoding the interrupt type of the callee in the otype of the calling sentry and the interrupt type of the caller in the otype of the return-to-caller sentry, the following attack can occur: A caller calling an interrupt-disabling callee X can set the return address of the callee to be the same interrupt-disabling sentry to X. This means, the callee, on return will call itself all the while operating with interrupts disabled. This will lead to infinite repeated calls to X with interrupts disabled, violating availability.

This attack can be prevented in CHERIoT by adding two new "backwards-edge" sentries and adding more checks on cjalr:

otype 4: Backwards-edge, interrupts-disabled sentry otype 5: Backwards-edge, interrupts-enabled sentry.

cjalr with a non-cnull link register always creates otype 4 or 5, i.e. backward sentries. (Currently it always creates otype 2 or 3 (forward explicit interrupt control sentries), so this is just a slightly different constant, but in the same place).

cs1	cd	Used for	Valid otypes
cra	cnull	Function return	Return sentries only (4, 5)
not cra	cnull	Tail call to sentry	Unsealed or interrupt inheriting forward sentry (0, 1)
any	not cnull	Function call	Unsealed or interrupt inheriting forward sentry (0, 1)
any	cra	Function call	Unsealed or forward sentries (0,1,2,3)

(Table derived from @rmn30 's)

Note that this disallows tail calls optimization of functions that change interrupt state. (They’re currently disabled and there’s a FIXME in the compiler to enable them. The compiler change is now to remove the FIXME, and then document this for programmers.)

vmurali commented 6 months ago

@davidchisnall

rmn30 commented 6 months ago

It would be useful to add the following table to document valid arguments to cjalr for sentry calls:

cs1	cd	Used for	Valid sentry types
cra	cnull	Function return	Return sentries
not cra	cnull	Tail call to sentry	Interrupt inheriting forward sentry only
any	cra	Function call	Forward sentries
any	not cnull, not cra	Invalid	None

Unsealed destinations are not subject to these restrictions.

Writing this out has made me realise that at the moment your code permits use of ~~backwards~~ interrupt inheriting sentries in the case where cd is not null and not ra. @davidchisnall is that required or should we disallow it?

vmurali commented 6 months ago

It would be useful to add the following table to document valid arguments to cjalr for sentry calls:

cs1 cd Used for Valid sentry types cra cnull Function return Return sentries not cra cnull Tail call to sentry Interrupt inheriting forward sentry only any cra Function call Forward sentries any not cnull, not cra Invalid None Unsealed destinations are not subject to these restrictions.

Very nice!

Writing this out has made me realise that at the moment your code permits use of backwards sentries in the case where cd is not null and not ra. @davidchisnall is that required or should we disallow it?

Hmm, if cs1 is a backward sentry, then cd must be $cnull (the fourth check). Maybe your table doesn't capture it correctly?

davidchisnall commented 6 months ago

Writing this out has made me realise that at the moment your code permits use of backwards sentries in the case where cd is not null and not ra

I don't have very strong opinions here, but I'd lean towards disallowing it. The only use case for cjal[r] with a non-zero and non-link-register cd is in the code-compression things (e.g. spill lots of registers with a function call that uses ct0 as the link register). We don't currently use them, but we potentially could. If we did, we'd want to restrict everything that uses any link register to forward sentries or not sentries.

At some point, it would be good to have variants that must use sentries. For CHERIoT, we could use the unused register bit for this, but official RVI extensions cannot.

Given that sentries cannot use the offset field, it would be nice to use a 16-bit encoding for this.

rmn30 commented 6 months ago

Hmm, if cs1 is a backward sentry, then cd must be $cnull (the fourth check). Maybe your table doesn't capture it correctly?

Yes, I just realised that and updated the comment but there is still one case that slipped through: interrupt inheriting sentry with not null, not ra cd. Not sure whether this should be allowed or not.

rmn30 commented 6 months ago

At some point, it would be good to have variants that must use sentries. For CHERIoT, we could use the unused register bit for this, but official RVI extensions cannot.

Could require that cret (cs1=ra, cd = null) must use a return sentry and is not valid for unsealed cs1?

davidchisnall commented 6 months ago

Could require that cret (cs1=ra, cd = null) must use a return sentry and is not valid for unsealed cs1?

I don't believe there are any use cases that require cret to work with an unsealed thing, so I would be happy with that requirement. It prevents substituting an on-stack return address with an unsealed thing.

vmurali commented 6 months ago

Yes, I just realised that and updated the comment but there is still one case that slipped through: interrupt inheriting sentry with not null, not ra cd. Not sure whether this should be allowed or not.

Hmm, interrupt inheriting sentry should be like any forward sentry (i.e. has to have ra) or it can be null. We can force that.

vmurali commented 6 months ago

Could require that cret (cs1=ra, cd = null) must use a return sentry and is not valid for unsealed cs1?

I don't believe there are any use cases that require cret to work with an unsealed thing, so I would be happy with that requirement. It prevents substituting an on-stack return address with an unsealed thing.

👍

vmurali commented 6 months ago

Writing this out has made me realise that at the moment your code permits use of backwards sentries in the case where cd is not null and not ra

I don't have very strong opinions here, but I'd lean towards disallowing it. The only use case for cjal[r] with a non-zero and non-link-register cd is in the code-compression things (e.g. spill lots of registers with a function call that uses ct0 as the link register). We don't currently use them, but we potentially could. If we did, we'd want to restrict everything that uses any link register to forward sentries or not sentries.

At some point, it would be good to have variants that must use sentries. For CHERIoT, we could use the unused register bit for this, but official RVI extensions cannot.

Given that sentries cannot use the offset field, it would be nice to use a 16-bit encoding for this.

I want to say this discussion is moot because of an error in Robert's table. It's been fixed to say backward sentry requires $cnull link and $cra target.

vmurali commented 6 months ago

I really like the table that @rmn30 made; I will change the checks to match the table exactly (with appropriate negations in the beginning) and fix the other typos/issues.

rmn30 commented 6 months ago

I really like the table that @rmn30 made; I will change the checks to match the table exactly (with appropriate negations in the beginning) and fix the other typos/issues.

I think it is easier (and safer) to think about the allowed cases. Summarising the above conversation and extending the table to include unsealed we have:

cs1	cd	Used for	Valid otypes
cra	cnull	Function return	Return sentries only (4, 5)
not cra	cnull	Tail call to sentry	Unsealed or interrupt inheriting forward sentry (0, 1)
any	cra	Function call	Unsealed or forward sentries (0,1,2,3)
any	not cnull, not cra	Invalid	None

Note that the first row does not allow unsealed. I am not sure whether we should allow unsealed in the last row? Probably not.

cjal should produce a return sentry in link register.

vmurali commented 6 months ago

I really like the table that @rmn30 made; I will change the checks to match the table exactly (with appropriate negations in the beginning) and fix the other typos/issues.

I think it is easier (and safer) to think about the allowed cases. Summarising the above conversation and extending the table to include unsealed we have:

cs1 cd Used for Valid otypes cra cnull Function return Return sentries only (4, 5) not cra cnull Tail call to sentry Unsealed or interrupt inheriting forward sentry (0, 1) any cra Function call Unsealed or forward sentries (0,1,2,3) any not cnull, not cra Invalid None Note that the first row does not allow unsealed. I am not sure whether we should allow unsealed in the last row? Probably not.

cjal should produce a return sentry in link register.

This is exactly how I implemented what I just pushed. I didn't allow unsealed in the last row. cjal produces are return sentry. I am pasting this table in the commit message.

vmurali commented 6 months ago

@rmn30 I added explanations in the spec PDF

davidchisnall commented 6 months ago

This prevents us from doing the spill / reload optimisations as library functions with a custom link register. We aren’t currently doing that, but:

If you use CJAL (one copy of the spill functions per compartment), you can still return via CRA, so you can spill the old cra and then move from your custom link register to cra for the return.
If we made any non-cnull cd follow the same rules as linking to cra, we’d be able to do this with library functions too.

It would be nice if we could have a way of requiring that the target is a valid forward sentry (no unsealed). Currently, you can do auipcc and then arbitrary arithmetic to generate a thing that can be used as a function pointer and points to any code (or read-only data that happens to be valid code) in your compartment.

I’m not sure how much this matters since you can get arbitrary code execution as long as you start with arbitrary code execution, but this may let you go from a constrained set of gadgets to a rich set. I suspect anyone who can do this can also build a Turing-complete weird machine out of real function pointers, but I’m not sure.

Having a way of encoding this would let us move to an ABI where all function pointers came from the import table, without changing the ISA.

rmn30 commented 6 months ago

@davidchisnall are you suggesting that rows 3 and 4 of the table should be the same, permitting cjal to any unsealed cap or forward sentry with any link register? In that case we might as well drop the restriction on row 2 as well, because using the null link register is not much different than trashing one. In this case the only restriction would be that cret (aka cjalr $cnull, $cra or cjr $cra) only works with a return sentry and return sentries only work with cret. I think that would still prevent the attack by turning it into a trap on cret. It's a bit unfortunate that it gives an easy way to cause a trap in a library function but it shouldn't make any difference as it was about to return anyway.

vmurali commented 6 months ago

David's suggestion will not merge rows 3 and 4. It will require that Any forward sentry should have some non-null link address, rather than just $cra.

Row 2 is necessary to allow null link addresses generated only through inherited sentries, otherwise the semantics of structured interrupts becomes messy (the caller and return-target could be in different interrupt states).

davidchisnall commented 6 months ago

Row 2 is necessary to allow null link addresses generated only through inherited sentries, otherwise the semantics of structured interrupts becomes messy (the caller and return-target could be in different interrupt states).

It doesn't fully guarantee structured programming but it does at least mean any interrupt-disabled function can trust its immediate return, which I agree is a very valuable property. Interrupt-disabled leaf functions are guarantee to return to their caller, which means you can then reason about call sites one layer up, inductively.

rmn30 commented 6 months ago

it does at least mean any interrupt-disabled function can trust its immediate return

That makes sense. In that case we can't merge rows 3 and 4. Maybe we should make row 4 the same as 2: unsealed or interrupt inheriting only?

vmurali commented 6 months ago

On Wed, May 22, 2024, 10:20 Robert Norton @.***> wrote:

it does at least mean any interrupt-disabled function can trust its immediate return

That makes sense. In that case we can't merge rows 3 and 4. Maybe we should make row 4 the same as 2: unsealed or interrupt inheriting only?

My suggestion is to change CD of row 3 to non cnull and remove row 4 (anything not in the first three rows are invalid).

—

Reply to this email directly, view it on GitHub https://github.com/microsoft/cheriot-sail/pull/54#issuecomment-2125369924, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZURN6KUTTBP3E4UU3E73ZDTHV3AVCNFSM6AAAAABIAEOC7CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRVGM3DSOJSGQ . You are receiving this because you were mentioned.Message ID: @.***>

vmurali commented 6 months ago

it does at least mean any interrupt-disabled function can trust its immediate return

That makes sense. In that case we can't merge rows 3 and 4. Maybe we should make row 4 the same as 2: unsealed or interrupt inheriting only?

Why allow interrupt inheriting (or unsealed) for Row 4? I would prefer to fail for any case other than when the first three rows are not met. Is there a use case for row 4 with unsealed or interrupt inheriting?

davidchisnall commented 6 months ago

The kind of code size reduction things I mentioned above.

vmurali commented 6 months ago

The kind of code size reduction things I mentioned above.

Sorry I don't get it. The new table I am proposing is as follows:

cs1	cd	Used for	Valid otypes
cra	cnull	Function return	Return sentries only (4, 5)
not cra	cnull	Tail call to sentry	Unsealed or interrupt inheriting forward sentry (0, 1)
any	not cnull	Function call	Unsealed or forward sentries (0,1,2,3)

Any other case is invalid.

davidchisnall commented 6 months ago

Currently, RISC-V uses JALR with a non-ra link register to do things like outlined prolog sequences. We don’t do this currently, but it would be nice to not prevent it. For this to work, we’d need cjalr with a non-cra link register to work with forward sentries.

vmurali commented 6 months ago

My new table allows that @davidchisnall

davidchisnall commented 6 months ago

I think that change loses us the guarantee that interrupt-disabled functions have a valid car to return to. I think we want only interrupt-inheriting sentries for link to not-cra.

vmurali commented 6 months ago

It still requires some register with a backwards sentry to return from. If you think there's no use case for a non cra returning function to change interrupt status, we can add that constraint

vmurali commented 6 months ago

Done. Updated the tables appropriately @davidchisnall @rmn30

vmurali commented 6 months ago

@microsoft-github-policy-service agree company="Google"

vmurali commented 6 months ago

I see that you did a merge commit. Isn't it better to do a squash merge. This gives a not helpful trail of commit messages

davidchisnall commented 6 months ago

Sorry, I should have checked the commit history.

CHERIoT-Platform / cheriot-sail

Create backward sentries for function returns and add more checks in cjalr #54