Privilege escalation if a capability's bounds include the base address of a malloc'd block

ltratt commented 3 years ago

If one has a capability that includes the base pointer returned by malloc, one can convince realloc to upgrade a less privileged capability. The first proof-of-concept is this very simple program which takes a capability with narrow bounds and "tricks" realloc to upgrade it to a capability with wider bounds:

#include <assert.h>
#include <cheriintrin.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#if !defined(__CHERI_PURE_CAPABILITY__)
#  error This example must be run on a CHERI purecap system
#endif

// On CheriBSD, if a capability's bounds include the base pointer to a
// `malloc`d block we can use `realloc` to launder a narrow capability into a
// wider one. In other words if:
//   1. malloc returns a capability C1 to a block 0..n bytes long
//   2. we create a capability C2 with bounds 0..m where m < n
//   3. realloc allows us to launder C2 back into C1

int main() {
    // malloc returns a capability C1 to a block 0..n bytes long
    uint8_t *arr = malloc(16);

    // Create a capability C2 with bounds 0..m where m < n
    arr = cheri_bounds_set(arr, 8);
    assert(cheri_tag_get(arr) && cheri_length_get(arr) == 8);

    // realloc allows us to launder C2 back into C1
    arr = realloc(arr, 16);
    assert(cheri_tag_get(arr) && cheri_length_get(arr) == 16);
}

We can strengthen this attack to turn a read-only capability into a read-write capability:

#include <assert.h>
#include <cheriintrin.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#if !defined(__CHERI_PURE_CAPABILITY__)
#  error This example must be run on a CHERI purecap system
#endif

// On CheriBSD, if a capability's bounds include the base pointer to a
// `malloc`d block we can use `realloc` to launder a lower-privileged
// capability into a higher-privileged one. In other words if:
//   1. malloc returns a capability C1 to a block 0..n bytes long
//   2. we create a capability C2 with: bounds 0..m where m < n; and with write
//      privileges turned off
//   3. realloc allows us to launder C2 back into C1

int main() {
    // malloc returns a capability C1 to a block 0..n bytes long
    uint8_t *arr = malloc(16);
    assert(cheri_perms_get(arr) & (CHERI_PERM_LOAD | CHERI_PERM_STORE));

    // Create a capability C2 with bounds 0..m where m < n
    arr = cheri_bounds_set(arr, 8);
    assert(cheri_tag_get(arr) && cheri_length_get(arr) == 8);
    // Make C2 read-only.
    arr = cheri_perms_and(arr, CHERI_PERM_LOAD);
    assert((cheri_perms_get(arr) & CHERI_PERM_STORE) == 0);

    // realloc allows us to launder C2 back into C1
    arr = realloc(arr, 16);
    assert(cheri_tag_get(arr) && cheri_length_get(arr) == 16);
    assert(cheri_perms_get(arr) & CHERI_PERM_STORE);
}

The root problem is that realloc doesn't fully validate the input capability. Exactly what "fully validate" means is perhaps an open question. My naive suggestion is that it might mean "realloc will only accept a capability exactly equal to that returned by the last malloc/realloc for this base pointer", but that might be too restrictive, and one might need a notion of "compatible capabilities" or something else entirely...
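To make the "exactly equal" suggestion concrete, here is a minimal sketch in portable C. It does not use real capabilities: `struct model_cap` and `cap_exactly_equal` are hypothetical names that only model the fields an allocator would compare. On a purecap system I believe the comparison would instead use an exact-equality intrinsic such as `cheri_is_equal_exact` from cheriintrin.h, but treat that as an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a capability; NOT the CHERI encoding. */
struct model_cap {
    uintptr_t base;   /* lowest address the capability may access */
    size_t length;    /* number of bytes covered by the bounds */
    uint32_t perms;   /* permission bitmask */
    bool tag;         /* validity tag */
};

/* True only if `passed` is field-for-field the capability that was issued. */
static bool cap_exactly_equal(struct model_cap issued, struct model_cap passed) {
    return issued.tag && passed.tag &&
           issued.base == passed.base &&
           issued.length == passed.length &&
           issued.perms == passed.perms;
}
```

Under this rule both proof-of-concept programs above would be rejected, since the narrowed capability differs from the issued one in length (and, in the second program, in permissions).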

jrtc27 commented 3 years ago

There's a more general problem that our jemalloc isn't fully hardened against this kind of thing; you can free capabilities you shouldn't be able to. I think this is just one particular symptom of that known issue. We don't view jemalloc as being a particularly good allocator for CHERI so haven't invested time in hardening it to malicious users (whether deliberate or not).

jrtc27 commented 3 years ago

See #342 for example; the only hardening we have is to check that it's tagged and offset zero.

brooksdavis commented 3 years ago

The current jemalloc implementation does not make any serious attempts to address an adversary that steps outside the C language by using intrinsics to modify pointer permissions. The dlmalloc in the caprevoke branch does store the original capability, which allows it to require that the original capability is passed. I'm not sure if we actually do the check in realloc or only in free.

Hardening jemalloc is more complicated because there isn't an obvious place to store the original capability.

brooksdavis commented 3 years ago

Stepping back a bit, I think there are three sensible responses to a realloc caller requesting non-monotonic behavior:

1. Fail (return NULL)
2. Abort/fault
3. Force copying semantics (make realloc behave like malloc+free)

I’d tend to lean toward the latter two options because realloc failures are rarely handled properly by existing code and the semantics are a bit awkward (leaking ptr is sufficiently common that FreeBSD added reallocf). Abort/fault has the feel of a faulting ISA (e.g., CHERI-RISC-V), which is fine by me, but some people object strongly to the notion of library functions aborting. I find forcing copying semantics interesting as a pragmatic choice.

The tricky thing for all of these is figuring out when to trigger the behavior. For permissions (#2) it’s easy enough to imagine malloc knowing the set of permissions for a new allocation and checking the permissions of the passed capability. For length it’s complicated because there are three lengths involved:

1. The length of the capability bounds
2. The length of the underlying allocation (the allocator knows this)
3. The length of the original bounds (this might be less than (2); lightly modified allocators likely don’t know this)

The problem is that we need to compare (1) and (3). One approach might be to force copying when (1) != (2) and copy at most (1) bytes. This might interact oddly with sub-object bounds.

One way to implement this is to store a copy of the original, malloced capability as our modified dlmalloc does. This isn't always easy and isn't space efficient, but allows any comparisons you might want. Another potential solution is the anti-tamper seals discussed briefly in the Experimental Features and Instructions section of CHERI ISAv8. The latter would almost certainly require compiler changes to avoid optimization-related issues.
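As a rough illustration of the "store a copy of the original capability" approach, the following portable C sketch keeps a side table of issued capabilities and checks a passed-in one against it. This is not how the modified dlmalloc does it; `struct model_cap`, `record_issued`, `check_against_original` and the fixed-size table are all hypothetical, and the struct only models the fields being compared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a capability; NOT the CHERI encoding. */
struct model_cap { uintptr_t base; size_t length; uint32_t perms; bool tag; };

#define MAX_LIVE 64
static struct model_cap issued[MAX_LIVE]; /* side table of originals */
static size_t n_issued;

/* Record the capability handed out by the (modelled) allocator. */
static bool record_issued(struct model_cap c) {
    if (n_issued == MAX_LIVE)
        return false;
    issued[n_issued++] = c;
    return true;
}

/* Accept `passed` for realloc/free only if it matches the stored original. */
static bool check_against_original(struct model_cap passed) {
    for (size_t i = 0; i < n_issued; i++) {
        if (issued[i].base == passed.base)
            return passed.tag &&
                   passed.length == issued[i].length &&
                   passed.perms == issued[i].perms;
    }
    return false; /* base address was never issued: reject */
}
```

The linear scan and the table itself show the space and time cost mentioned above; a real allocator would want to fold this into its existing metadata.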

ltratt commented 3 years ago

Let me state my bias up-front: I am a fan of "abort on error" as a general principle so option 2 works for me!

I think the copying semantics you describe opens up a really fun attack. Imagine something like:

void pass_to_attacker(void *arr2);

void f() {
  void *arr = malloc(16);
  pass_to_attacker(cheri_bounds_set(arr, 8));
}

void pass_to_attacker(void *arr2) {
  arr2 = realloc(arr2, <something big>);
  void *arr = malloc(16);
}

It's quite possible (in many mallocs, even "likely") that the malloc in pass_to_attacker will then use the same base pointer as in the original arr malloc. In other words, pass_to_attacker has a decent chance of getting a capability pointing to the same 16 bytes of memory as f, despite being passed a capability with bounds of 8.

As a side note, the example above also works if the attacker can simply free(arr2). So that suggests that free probably has to do the same checks as realloc.

jrtc27 commented 3 years ago

Yeah, I agree, I don't think you can do anything sensible other than abort or return NULL, and the latter is likely to just crash anyway in a more obscure manner. If you do the copying approach then the old pointer is now freed so will eventually get revoked and the real owner of the allocation will get a surprise that it (legitimately) is not prepared to handle (since it never gave out the full capability and thus should be able to assume its allocation isn't going away unless it says so). That or you don't have revocation and then you have a nice use-after-free attack you can conduct.

rwatson commented 3 years ago

I think abort on error is sensible, not least because we don’t know what idiomatic permission-aware CHERI C code looks like or wants to do yet. The check I suspect we want is a run-time assertion that the permissions (possibly other things) on a pointer returned by realloc() never exceed those of the pointer passed in. If people writing permission-aware CHERI C code start finding this to be a problem, then we can decide if we want other semantics -- or perhaps a new API with those other semantics rather than overloading them onto realloc(). It strikes me that we need to more clearly describe not just the spatial safety properties of realloc(), but also the temporal ones, clearly in a document.
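A minimal sketch of that run-time assertion, in portable C. The permission bits and function names here are hypothetical stand-ins; on a purecap system the two masks would presumably come from `cheri_perms_get` on the pointer passed in and on the pointer about to be returned.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical permission bits for this model; real CHERI permission
 * encodings differ between architectures. */
#define MODEL_PERM_LOAD  (1u << 0)
#define MODEL_PERM_STORE (1u << 1)

/* True if `out` grants no permission bit that is absent from `in`. */
static bool perms_not_amplified(uint32_t in, uint32_t out) {
    return (out & ~in) == 0;
}

/* Abort-on-error policy: refuse to hand back amplified permissions. */
static void assert_perms_not_amplified(uint32_t in, uint32_t out) {
    if (!perms_not_amplified(in, out)) {
        fprintf(stderr, "realloc would amplify permissions\n");
        abort();
    }
}
```

With this check, the read-only proof of concept earlier in the thread would abort, because the result carries a store permission that the argument lacked.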

rwatson commented 3 years ago

On the topic of “other things”, most likely realloc() and free() should both reject capabilities that are not “ordinary” data capabilities -- i.e., they should assert the tag, assert unsealed, and perhaps also assert that there are no surprising permissions that malloc() itself would never return to a caller (VMMAN?).
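Sketched in portable C, such an "ordinary data capability" check might look like the following. The struct, the permission bits and `MODEL_MALLOC_PERMS` are hypothetical; which permissions malloc() actually returns is an assumption made purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; a real check would query the capability itself. */
struct model_cap {
    uint32_t perms;  /* permission bitmask */
    bool tag;        /* validity tag */
    bool sealed;     /* whether the capability is sealed */
};

#define MODEL_PERM_LOAD      (1u << 0)
#define MODEL_PERM_STORE     (1u << 1)
#define MODEL_PERM_LOAD_CAP  (1u << 2)
#define MODEL_PERM_STORE_CAP (1u << 3)
#define MODEL_PERM_EXECUTE   (1u << 4)

/* Assumed set of permissions that malloc() would ever hand to a caller. */
#define MODEL_MALLOC_PERMS (MODEL_PERM_LOAD | MODEL_PERM_STORE | \
                            MODEL_PERM_LOAD_CAP | MODEL_PERM_STORE_CAP)

/* Tagged, unsealed, and carrying no permission malloc() would not grant. */
static bool is_ordinary_data_cap(struct model_cap c) {
    return c.tag && !c.sealed && (c.perms & ~MODEL_MALLOC_PERMS) == 0;
}
```

Note that this accepts capabilities with fewer permissions than malloc() grants; whether those should be accepted is the separate "equal vs. compatible" question.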

rwatson commented 3 years ago

(And when it comes to “specifying” (APIs) rather than “implementing” (code), I am not 100% certain which of the things I’ve described above should be specified behaviour vs. implementation choices. I suspect much should be specified, as we think this is part of the set of security properties we need, rather than simply robustness of the implementation against its own bugs.)

ltratt commented 3 years ago

I think it might be useful to distinguish "equal" from "compatible" capabilities in this regard (my terms! I don't know if the capability literature defines similar/different terms).

I think we probably all agree that one way of solving the problem is to say that realloc and free only succeed if the user passes in a capability that is precisely equal to the last capability returned by malloc or realloc. What I think @rwatson is alluding to is whether one can weaken this to allowing in a capability compatible with the capability most recently returned by malloc or realloc.

The challenge then becomes defining what "compatible" means. One reason I'm a little nervous about it is because future CHERI implementations could define extra/different permissions that mean it's impossible to be robust against future changes -- those sort of problems keep me awake at night (but I am a light sleeper!). That makes me think that it might be best to start with "capabilities must be equal", see what breaks, and if necessary fall back to "capabilities can be compatible"?

jacobbramley commented 3 years ago

Notably, for Morello, we probably want to ignore the "Flags" field (bits 63:56) in that comparison.

jrtc27 commented 3 years ago

I disagree, not least because those are part of the value you see when doing pointer comparison or casting to an integer and so things already will break today if you pass something with different flags because that is not the same pointer, even if it describes the same address range etc.

C says the pointer has to "match" one returned by malloc/calloc/realloc. Different flags definitely don't match in my book, and I interpret match in the context of CHERI as "must be the same pointer as" (which, even for non-CHERI C, doesn't mean "compares equal to").

jrtc27 commented 3 years ago

That is, I view Morello's flags field as analogous to the top byte tag bits in AArch64, which is specified as:

When tagged addressing is enabled, a tag is part of a pointer’s value for the purposes of pointer arithmetic. The result of subtracting or comparing two pointers with different tags is unspecified.

jacobbramley commented 3 years ago

Oh, good point, and that behaviour is quite important in the presence of MTE. It probably does need to be highlighted if we define what we mean by "compatible" somewhere, because the flags are otherwise ignored for Morello.

rwatson commented 3 years ago

Also on ‘equal’ vs ‘compatible’. Sub-object bounds complicate this because it becomes a reasonably likely event that bounds passed to free() are not those originally allocated. Sub-object bounds complicate quite a few things, but they offer some real security advantage -- e.g., when taking a pointer to an array within a structure, which is then overflowed -- so it would be nice to find semantics for the various APIs that tolerate them, to the greatest extent possible.

PeterSewell commented 3 years ago

On Thu, 19 Aug 2021 at 10:50, Robert N. M. Watson @.***> wrote:

Also on ‘equal’ vs ‘compatible’. Sub-object bounds complicate this because it becomes a reasonably likely event that bounds passed to free() are not those originally allocated.

really - why?

Sub-object bounds complicate quite a few things, but they offer some real security advantage -- e.g., when taking a pointer to an array within a structure, which is then overflowed -- so it would be nice to find semantics for the various APIs that tolerates it, to the greatest extent possible.


jacobbramley commented 3 years ago

Regarding permissions, there are use-cases where a caller might want to strip permissions, discarding the original so that undesirable accesses are provably impossible (without outside help at least). For example:

Allocate, write some data, then discard STORE permissions entirely to lock the data.
Allocate, but drop STORE_CAP (etc) to guarantee that the memory can never be used to extend a compartment when used as a communications channel.

I favour the strict approach (either aborting or returning NULL) but with a strict implementation of malloc/realloc/free, use-cases like those above become difficult to implement, and impose significant complexity on user code. I think there's room for a new, CHERI-specific allocation API here, though it's hard, at this stage, to consider everything that it might need to support. I at least foresee a use for something akin to mprotect, so that free can accept a capability with fewer permissions than malloc originally provided.

jrtc27 commented 3 years ago

On Thu, 19 Aug 2021 at 10:50, Robert N. M. Watson @.***> wrote: Also on ‘equal’ vs ‘compatible’. Sub-object bounds complicate this because it becomes a reasonably likely event that bounds passed to free() are not those originally allocated. really - why? …

I think Robert's point is that, without subobject bounds enabled in your compiler, the only way, other than pointer arithmetic (or UB things like copying to an unaligned address and clearing the tag), to get a capability that is derived from the allocation and points to the first byte of the allocation, but does not compare exactly equal to the capability handed out by the allocator, is by using various intrinsics. However, if subobject bounds are enabled, then you implicitly have various CSetBounds instructions inserted for you by the compiler just by performing address-of or array decay operations, and so programs that are a bit lax with their handling of pointers can break. For example, struct foo { struct hdr h; int x; } ... bar(&foo->h); ... free(p); works on existing architectures, and in CHERI C with subobject bounds not generated for that address-of, but would not work if the capability were bounded to sizeof(struct hdr).
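Spelled out as a complete program fragment, the lax pattern looks roughly like this (the field names and function names are made up for illustration). On a conventional architecture the pointer to the leading member has the same address as the allocation, so the free succeeds.

```c
#include <stddef.h>
#include <stdlib.h>

struct hdr { int kind; };
struct foo { struct hdr h; int x; };

/* The header is the first member, so it sits at offset zero. */
_Static_assert(offsetof(struct foo, h) == 0, "hdr must be the first member");

/* Frees a whole object given only a pointer to its leading header. */
static void release_via_header(struct hdr *h) {
    free(h);
}

/* Returns 1 if &p->h and p have the same address, -1 if malloc failed. */
static int demo_free_via_header(void) {
    struct foo *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->x = 42;
    int same_address = ((void *)&p->h == (void *)p);
    release_via_header(&p->h); /* fine without subobject bounds */
    return same_address;
}
```

If the compiler narrowed `&p->h` to `sizeof(struct hdr)`, the capability reaching `free` would have the right address but the wrong bounds, which is exactly the case a strict "must be equal" check would reject.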

jrtc27 commented 3 years ago

Regarding permissions, there are use-cases where a caller might want to strip permissions, discarding the original so that undesirable accesses are provably impossible (without outside help at least). For example:

Allocate, write some data, then discard STORE permissions entirely to lock the data.

Allocate, but drop STORE_CAP (etc) to guarantee that the memory can never be used to extend a compartment when used as a communications channel.

I favour the strict approach (either aborting or returning NULL) but with a strict implementation of malloc/realloc/free, use-cases like those above become difficult to implement, and impose significant complexity on user code. I think there's room for a new, CHERI-specific allocation API here, though it's hard, at this stage, to consider everything that it might need to support. I at least foresee a use for something akin to mprotect, so that free can accept a capability with fewer permissions than malloc originally provided.

You'd need a way to revoke the store-bearing capabilities as they may well have been spilled to the stack behind your back and a read of uninitialised stack memory could alias one. Maybe stack temporal memory safety would be sufficient to ensure you can never read those, but I wouldn't like to say there isn't another way to leak and later be able to retrieve the original capability within the same trust boundary.

brooksdavis commented 3 years ago

One idea I had for an alternative API would be an allocator where you get both a capability and a token that you can use to free it (today I'd probably just seal the capability with a type allocated to the allocator) so you'd get an API like:

void *allocate(size_t, allocation_handle_t *);
void release(allocation_handle_t);

I do see @jrtc27's point that it's hard to provide strong assurances that the higher privilege capability is really gone.
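For what it's worth, the shape of that API can be modelled in portable C as below. This is only a sketch of the calling convention: the handle here is a plain struct, which provides none of the protection a sealed capability would, and the `original` field is a hypothetical stand-in for the sealed copy.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a capability sealed with an allocator-owned type. */
typedef struct {
    void *original; /* hypothetical: the unmodified pointer from malloc */
} allocation_handle_t;

/* Returns usable memory and fills in the handle needed to free it. */
static void *allocate(size_t size, allocation_handle_t *handle) {
    void *p = malloc(size);
    handle->original = p; /* a real design would seal p here */
    return p;             /* the caller may narrow or restrict this freely */
}

/* Frees via the handle, so the working pointer never reaches the allocator. */
static void release(allocation_handle_t handle) {
    free(handle.original);
}
```

A caller would do `allocation_handle_t h; void *p = allocate(16, &h);`, hand out restricted versions of `p`, and later call `release(h)`. Anyone holding only a derived pointer has nothing they can pass to `release`.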

ltratt commented 3 years ago

For fun, I've put together a running example that shows that free alone is enough to recover a capability. This is clearly more fragile than using realloc, but this example runs successfully for me on Morello and RISC-V CheriBSD:

#include <assert.h>
#include <cheriintrin.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#if !defined(__CHERI_PURE_CAPABILITY__)
#  error This example must be run on a CHERI purecap system
#endif

// On CheriBSD, if a capability's bounds include the base pointer to a
// `malloc`d block we can use `free` to recover the original capability. This
// is inherently fragile, and relies on the underlying malloc reusing memory
// (which CheriBSD's jemalloc currently does).

int main() {
    // malloc returns a capability C1 to a block 0..n bytes long
    uint8_t *c1 = malloc(16);
    // Separate out the pointer from the capability so that we can check it
    // later.
    vaddr_t c1_addr = cheri_address_get(c1);

    // Create a capability C2 with bounds 0..m where m < n
    uint8_t *c2 = cheri_bounds_set(c1, 8);
    c1 = NULL; // Be clear that we've lost access to C1.
    assert(cheri_tag_get(c2) && cheri_length_get(c2) == 8);

    // We first free C2...
    free(c2);
    // ...and then immediately allocate a block the same size as C1.
    uint8_t *c3 = malloc(16);
    // We get back a capability C3 that is identical to C1.
    assert(cheri_tag_get(c3) && cheri_length_get(c3) == 16);
    assert(cheri_address_get(c3) == c1_addr);
}

brooksdavis commented 3 years ago

FWIW, if you make the bounds line:

uint8_t *c2 = cheri_bounds_set(c1 + 8, 8);

you almost certainly corrupt the allocator state. I'm not convinced it's practical to defend jemalloc against this.

ltratt commented 3 years ago

Yes, this attack certainly exploits jemalloc's approach to memory allocation.

One possibility would be to swap jemalloc for something like OpenBSD's malloc which goes out of its way to randomise things. It wouldn't make this attack theoretically impossible, but in practice you'd need the patience of a saint to make it work.

nwf commented 3 years ago

On the topic of permissions for free, @LawrenceEsswood's CheriOS occupied a particularly interesting point in the design space: free took an address, rather than a capability, on the grounds that full spatial and temporal safety meant that freeing everything in sight arbitrarily was, at worst, a DoS on the system. In order to provide a measure of availability in the face of such a stark position, CheriOS's malloc offered a reference-counted, attributed claim/release system which could be used to defer frees until all claims were released (or all claimants had exited). I don't know whether this necessarily sheds light on whether a more traditional malloc, without refcounts, should require capabilities for free or not, but it's an interesting extremal position to consider.
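To give a flavour of the deferred-free idea, here is a toy reference-counting sketch in portable C. It is not CheriOS's interface; `struct tracked_block` and the `block_*` functions are hypothetical, and attribution of claims to particular claimants is left out entirely.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for one allocation. */
struct tracked_block {
    void *mem;            /* the allocation, or NULL once reclaimed */
    unsigned claims;      /* number of outstanding claims */
    bool free_requested;  /* someone has asked for the block to be freed */
};

/* Reclaim the memory only when a free is pending and nobody claims it. */
static void block_reclaim_if_unused(struct tracked_block *b) {
    if (b->claims == 0 && b->free_requested && b->mem != NULL) {
        free(b->mem);
        b->mem = NULL;
    }
}

/* Register interest in the block, deferring any later free. */
static void block_claim(struct tracked_block *b) {
    b->claims++;
}

/* Drop one claim; this may complete a deferred free. */
static void block_release(struct tracked_block *b) {
    if (b->claims > 0)
        b->claims--;
    block_reclaim_if_unused(b);
}

/* Request a free; it takes effect only once all claims are gone. */
static void block_free(struct tracked_block *b) {
    b->free_requested = true;
    block_reclaim_if_unused(b);
}
```

In this model a hostile `block_free` on a claimed block does nothing until the claimants let go, which is the availability property described above.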

On a differently experimental note, these kinds of amplification attacks have been in scope for Cornucopia and its successors. We generally defend against them by not re-issuing address space until the freed object has had all non-TCB capabilities to it (i.e., held outside the kernel and memory allocator) revoked. Our implementations to date are imperfect, but we believe they capture the majority of the costs that would be present in a hypothetically perfect implementation. You might want to take a look at https://github.com/CTSRD-CHERI/cheri-exercises/tree/master/src/exercises/pointer-revocation and its bigger brother https://github.com/CTSRD-CHERI/cheri-exercises/tree/master/src/missions/use-after-free-control-flow .

jrtc27 commented 3 years ago

Yes, this attack certainly exploits jemalloc's approach to memory allocation.

One possibility would be to swap jemalloc for something like OpenBSD's malloc which goes out of its way to randomise things. It wouldn't make this attack theoretically impossible, but in practice you'd need the patience of a saint to make it work.

Randomising is a hack, and not a solution (for example, CheriBSD turns ASLR off for pure-capability binaries, since our threat model doesn't regard addresses as needing to be secret), especially in a world where you have a single malloc shared by multiple compartments. Experience says that attackers eventually find ways to exfiltrate the information they need or influence the randomness. If you're going to use a different malloc, use one that's designed for CHERI; e.g. snmalloc is trying to be (re)written with CHERI in mind.

rwatson commented 3 years ago

Probably just to state something known to everyone on the thread, but worth stating for the purposes of future readers: use-after-free is not currently within the threat model of the baseline CheriBSD jemalloc implementation, which provides spatial and not temporal safety. @brettferdosi may be able to comment on the applicability of his temporal safety wrapper, which uses Cornucopia and successors, and can wrap jemalloc, however?

CTSRD-CHERI / cheribsd

Privilege escalation if a capability's bounds include the base address of a malloc'd block #1065