CTSRD-CHERI / cheri-c-programming

CHERI C/C++ Programming Guide

Document when tags are preserved when copying memory #12

Open kevin-brodsky-arm opened 3 years ago

kevin-brodsky-arm commented 3 years ago

By and large, the current approach used by CHERI LLVM is to preserve capability tags when copying memory if it is valid for the source to contain capabilities. While this validity is clear in some situations (e.g. during a struct assignment if the struct contains capability types), it is not obvious in general.

It would be very helpful for the guide to document:

rwatson commented 3 years ago

This sounds good; three further thoughts:

rwatson commented 3 years ago

A further note from the meeting earlier today: we should also document whether memory-mapping APIs produce tag-enabled mappings, either by default or as a result of additional flags, arguments, etc. For example, we probably want tags enabled by default for MAP_ANON mappings created with mmap(2) (as is the case today), but System V shared memory mappings should not be tag-enabled by default (though we probably want an option or flag to enable tags for them).
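As a rough illustration of what "tag-enabled by default" means for an mmap(2) MAP_ANON mapping, here is a minimal sketch. It assumes a pure-capability CHERI target (detected via the `__CHERI_PURE_CAPABILITY__` macro) and the `cheri_tag_get()` helper from `<cheriintrin.h>`; on other platforms it degrades to a plain pointer round trip. The function name `anon_mapping_keeps_pointer` is hypothetical.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANON on glibc */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#ifdef __CHERI_PURE_CAPABILITY__
#include <cheriintrin.h> /* assumption: provides cheri_tag_get() */
#endif

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper: store a pointer into a fresh anonymous mapping,
 * load it back, and report whether it is still usable. On pure-capability
 * CHERI this only succeeds if the mapping is tag-enabled; elsewhere it is
 * just a pointer store/load.
 */
static bool anon_mapping_keeps_pointer(void)
{
    static int target = 42;
    size_t len = 4096;
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    if (page == MAP_FAILED)
        return false;

    int **slot = page; /* page-aligned, so suitably aligned for a pointer */
    *slot = &target;
    int *loaded = *slot;

    bool ok = (loaded == &target);
#ifdef __CHERI_PURE_CAPABILITY__
    /* Check the tag before dereferencing: an untagged capability would trap. */
    ok = ok && cheri_tag_get(loaded);
#endif
    ok = ok && *loaded == 42;
    munmap(page, len);
    return ok;
}
```

The same check against a System V shared memory segment would be expected to fail the tag test if such mappings are not tag-enabled by default.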

rwatson commented 3 years ago

Tagging @bsdjhb @brooksdavis @arichardson @jrtc27.

brooksdavis commented 3 years ago

Attempting to answer one part of the question: memcpy and memmove must preserve tags for every sizeof(void * __capability)-byte chunk of the copy that is capability-aligned in both the source and the destination, even if the copy as a whole does not start at an aligned address. For example, this needs to work:

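#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct s {
    uint64_t a;
    uint64_t b;
    void * __capability c;
};

/* Copy b and c from src to dst. The copy starts 8 bytes into the struct, so
   (assuming 16-byte capabilities) it is not capability-aligned, but c is,
   and c's tag must be preserved. */
void init_from_other(struct s *dst, const struct s *src)
{
    memcpy(&dst->b, &src->b, sizeof(struct s) - offsetof(struct s, b));
}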

One could imagine a more restrictive C implementation (e.g. one with strict sub-object bounds) that didn't preserve tags when the copy starts at an unaligned address, but for existing systems code this probably must work.
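The rule can be illustrated with a simplified, hypothetical copy routine (`sketch_memcpy` is not the CheriBSD libc implementation). It assumes that sizeof(void *) is a power of two equal to the alignment a capability needs, as on pure-capability CHERI where `void *` is a capability: pointer-sized loads and stores carry the tag, while byte stores do not. It also relies on the GCC/Clang `may_alias` attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* A pointer-sized chunk that may alias any object (GCC/Clang extension). */
typedef void *__attribute__((may_alias)) ptr_chunk;

/* Address of p as an integer (on CHERI this is the address, not the bounds). */
static size_t addr_of(const void *p)
{
    return (size_t)(uintptr_t)p;
}

/*
 * Hypothetical sketch: copy bytes until the destination is pointer-aligned,
 * then move whole pointer-sized chunks with pointer loads/stores (on CHERI
 * these preserve tags), then copy the tail. Chunks can be aligned in both
 * buffers only if src and dst share the same misalignment; otherwise no tag
 * can survive and everything is copied bytewise.
 * Assumes non-overlapping buffers, like memcpy.
 */
static void *sketch_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t mask = sizeof(void *) - 1;

    if (((addr_of(d) ^ addr_of(s)) & mask) == 0) {
        /* Head: aligning d also aligns s, since they share the misalignment. */
        while (n > 0 && (addr_of(d) & mask) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: every chunk here is aligned in both buffers. */
        while (n >= sizeof(void *)) {
            *(ptr_chunk *)d = *(const ptr_chunk *)s;
            d += sizeof(void *);
            s += sizeof(void *);
            n -= sizeof(void *);
        }
    }
    /* Tail, or the whole copy if the misalignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A copy like the init_from_other example starts unaligned but shares its misalignment with the source, so c still moves as a whole chunk; a copy between buffers with different misalignments can never preserve a tag.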

I think I've convinced myself that the *sort functions (qsort(3) and friends) need only preserve tags for objects that are aligned to sizeof(void * __capability) and whose size is a multiple of sizeof(void * __capability) bytes, but IIRC they currently preserve tags just as memcpy does, so you can do absurd things if you really want to.
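A typical case that the narrower *sort guarantee would still cover is sketched below: an array whose element type is pointer-aligned and a whole number of pointers in size, sorted with qsort(3). On pure-capability CHERI, `name` is a capability, so dereferencing it after the sort only works if qsort preserved its tag. The `struct entry` type and helper names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Element type: pointer-aligned and a whole number of pointers in size. */
struct entry {
    int key;
    const char *name; /* a capability on pure-capability CHERI */
};

_Static_assert(sizeof(struct entry) % sizeof(void *) == 0,
               "entries must be a whole number of pointers in size");
_Static_assert(_Alignof(struct entry) >= _Alignof(void *),
               "entries must be pointer-aligned");

static int cmp_entry(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return (x->key > y->key) - (x->key < y->key);
}

static void sort_entries(struct entry *v, size_t n)
{
    qsort(v, n, sizeof(*v), cmp_entry);
}
```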

sbaranga-arm commented 3 years ago

A lot of memcpy calls are emitted by the compiler (e.g. for assignments) and those would copy the entire object. For these cases it would make sense to emit a call to a memcpy variant that doesn't preserve tags on unaligned starts.

brooksdavis commented 3 years ago

@arichardson has done some work looking at compiler-generated copies in the context of improved inlining (https://github.com/CTSRD-CHERI/llvm-project/pull/506). We probably do want many of them to be tag-clearing, but de facto C requires tag-preserving copies in all sorts of awkward places. For example:

#include <stdint.h>

struct s {
    uint64_t a;
    uint64_t b;
    char c[16]; /* 16-byte aligned: could hold a 16-byte capability on 64-bit CHERI */
} __attribute__((aligned(16)));

requires a tag-preserving memcpy, because we cannot know what is actually stored in c: the C language cannot distinguish between a string and a bag of bytes. (We likely want an annotation to say that a string is actually a string, or to push for a byte type, which I believe is being discussed.) One could implement a C dialect that restricted tag preservation further, but the cost of adaptation would start to climb, so I believe it would need to be optional.
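To make the problem concrete, here is a small sketch of code that stashes a pointer in the byte array c and later copies the whole struct by assignment. On pure-capability CHERI (where `void *` is a capability) the final dereference only works if both the memcpy into c and the compiler-generated copy for `copy = orig` preserve the tag. The helpers `stash_ptr` and `unstash_ptr` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct s {
    uint64_t a;
    uint64_t b;
    char c[16];
} __attribute__((aligned(16)));

_Static_assert(sizeof(void *) <= 16, "a pointer must fit in c");

/* Hypothetical: store the bytes (and, on CHERI, the tag) of p in c. */
static void stash_ptr(struct s *obj, void *p)
{
    memcpy(obj->c, &p, sizeof(p));
}

/* Hypothetical: read the pointer back out of c. */
static void *unstash_ptr(const struct s *obj)
{
    void *p;
    memcpy(&p, obj->c, sizeof(p));
    return p;
}
```

Usage: after `stash_ptr(&orig, &v); copy = orig;`, the pointer returned by `unstash_ptr(&copy)` must still be dereferenceable, even though nothing in the type of c says it holds a pointer.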

arichardson commented 3 years ago

We could try to make this distinction for C++17 and later code (std::byte was introduced in C++17) by only treating std::byte as potentially tag-bearing and assuming that char is actually a string. However, I feel this could be rather risky, and it is safer to assume that any of signed char, unsigned char, char, and std::byte can potentially hold tags.

kevin-brodsky-arm commented 3 years ago

As long as a char * is allowed to alias an object of any type (which I presume is still the case in C++17/20), I think we should preserve the assumption that an array of char may hold capabilities; otherwise the departure from C/C++ feels too great and could break quite a lot of software. Of course, an optional compiler flag to remove that assumption wouldn't hurt either.