Document when tags are preserved when copying memory

kevin-brodsky-arm commented 3 years ago

By and large, the current approach used by CHERI LLVM is to preserve capability tags when copying memory if it is valid for the source to contain capabilities. While it is clearly the case in some situations (e.g. during a struct assignment if the struct contains capability types), this is not obvious in general.

It would be very helpful for the guide to document:

What C/C++ objects may hold capabilities from a compiler point of view (considering types, alignment, etc.).
Consequently, when capability tags are preserved during any form of copy (explicit memcpy(), struct assignment, implicit copy constructor, etc.).
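
As a concrete illustration of the copy forms listed above, the following sketch copies a struct holding a pointer first by assignment and then by an explicit memcpy(). It is written in plain C so that it also builds on a non-CHERI host, where ordinary pointers stand in for capabilities. The struct and function names are hypothetical. On a pure-capability CHERI target the final dereferences would only succeed if both copies preserved the tag.

```c
#include <string.h>

/* Hypothetical record type: it contains a pointer, so copies of it must carry tags. */
struct record {
    long id;
    int *value;   /* a capability in pure-capability CHERI C */
};

/* Copies src twice (assignment, then memcpy) and dereferences both copies. */
int sum_via_copies(const struct record *src)
{
    struct record by_assignment = *src;          /* compiler-generated copy */
    struct record by_memcpy;

    memcpy(&by_memcpy, src, sizeof(by_memcpy));  /* explicit copy */

    /* Both dereferences would trap on CHERI if either copy had cleared the tag. */
    return *by_assignment.value + *by_memcpy.value;
}
```

Calling sum_via_copies() on a record whose value points at an int holding 21 returns 42 when both copies keep the pointer usable.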

rwatson commented 3 years ago

This sounds good; three further thoughts:

We should also define an API for non-capability-preserving memcpy(), possibly memcpy_nocap(), and its expected behaviour. We should explicitly identify some use cases -- such as when we intentionally don't want to preserve tags (such as in copyin()/copyout()-style use cases).
We should iterate through the standard C library APIs, and likely also POSIX APIs, and identify memcpy() synonyms/wrappers, and indicate for each whether they are expected to preserve tags. For example, we might define that strcpy() doesn't preserve tags, but that memmove() and sort() do (subject to suitable alignment/etc).
Where there is some ambiguity, or where the compiler may have to do different things to get useful access to optimisations, etc., we should also identify those cases. It's not clear to me what the scope of these cases is, but the impact may be clearer: the surprising stripping of tags, the surprising preservation of tags, and worse, behaviour that depends on optimisation level. I think it is fine (necessary?) for such cases to exist, but we should constrain them as much as makes sense.
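
To make the first point concrete, here is a minimal sketch of what a non-capability-preserving copy might look like. The name memcpy_nocap() comes from the suggestion above; the signature and the byte-wise implementation are assumptions, not an existing API. The sketch relies on the CHERI property that a tag is only carried by a capability-sized, capability-aligned load and store, so single-byte stores leave the destination untagged. The volatile qualifiers are there to discourage the compiler from turning the loop back into a memcpy() call.

```c
#include <stddef.h>

/*
 * Hypothetical tag-stripping copy (assumed signature, mirroring memcpy).
 * Every access is a single byte, so no capability is ever loaded or stored.
 * Like memcpy, the source and destination must not overlap.
 */
void *memcpy_nocap(void *dst, const void *src, size_t len)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
    return dst;
}
```

A real implementation would presumably copy in larger non-capability units for speed; the byte loop is only the simplest form that shows the intended behaviour for copyin()/copyout()-style callers.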

rwatson commented 3 years ago

A further note from the meeting earlier today: We should also be documenting whether memory-mapping APIs produce tag-enabled mappings, whether by default or as a result of additional flags/arguments/etc. For example, we probably want tags enabled for MAP_ANON mappings by default with mmap(2) (as is the case today), but System V shared memory mappings should not have tags enabled by default (although we probably want an option/flag to enable them).

rwatson commented 3 years ago

Tagging @bsdjhb @brooksdavis @arichardson @jrtc27.

brooksdavis commented 3 years ago

Attempting to answer one part of the question: memcpy and memmove must preserve tags any time they copy sizeof(void * __capability) bytes where the source and destination are aligned. E.g., this needs to work:

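#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* uint64_t */
#include <string.h>  /* memcpy */

struct s {
    uint64_t a;
    uint64_t b;
    void * __capability c;
};

/* Copy everything from b onwards: the start is not capability-aligned, but the copy covers c. */
void init_from_other(struct s *dst, struct s *src)
{
    memcpy(&dst->b, &src->b, sizeof(struct s) - offsetof(struct s, b));
}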

One could imagine a more restricted C implementation (e.g. with strict sub-object bounds) that didn't preserve tags with unaligned starts, but for existing systems code this probably must work.
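
The rule stated above can be written down as a small predicate. This is only a sketch of one reading of that rule, not code from any CHERI library: the function name is hypothetical, CAP_SIZE is an assumed 16 bytes (128-bit capabilities), and addresses are passed as plain integers so the check can be exercised on any host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed capability size and alignment in bytes (16 for 128-bit capabilities). */
#define CAP_SIZE ((uint64_t)16)

/*
 * Hypothetical predicate: true if a copy of len bytes from address src to
 * address dst covers at least one whole capability-aligned slot in both
 * buffers, i.e. if a tag-preserving memcpy could carry a tag across.
 */
bool copy_may_carry_tag(uint64_t dst, uint64_t src, size_t len)
{
    /* Aligned slots in src land on aligned slots in dst only if the offsets match. */
    if ((dst % CAP_SIZE) != (src % CAP_SIZE))
        return false;

    /* First capability-aligned address at or after src. */
    uint64_t first_slot = (src + CAP_SIZE - 1) & ~(CAP_SIZE - 1);
    uint64_t skipped = first_slot - src;   /* unaligned bytes before that slot */

    return len >= skipped && len - skipped >= CAP_SIZE;
}
```

For init_from_other() above, with 16-byte capabilities, the copy starts 8 bytes past an aligned address and is 24 bytes long, so it skips 8 bytes and then covers exactly one aligned slot; the predicate returns true for it.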

I think I've convinced myself that *sort need only preserve tags for objects aligned to sizeof(void * __capability) and which are a multiple of sizeof(void * __capability) bytes, but IIRC we preserve as with memcpy today so you can do absurd things if you really want to.
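
For the sort case, the common pattern that depends on tag preservation is sorting an array of structs that embed pointers. The sketch below uses the standard qsort(); the element type and function names are hypothetical. Each element is naturally pointer-aligned and a multiple of the pointer size, which is the case described above as the one that must keep working.

```c
#include <stdlib.h>

/* Hypothetical element type; each element embeds a pointer. */
struct entry {
    int key;
    const char *name;   /* must still be dereferenceable after sorting */
};

/* Orders entries by ascending key without risking integer overflow. */
static int compare_by_key(const void *a, const void *b)
{
    const struct entry *ea = a;
    const struct entry *eb = b;

    return (ea->key > eb->key) - (ea->key < eb->key);
}

/* Sorts entries by key and returns the name of the smallest one (count must be > 0). */
const char *name_of_smallest(struct entry *entries, size_t count)
{
    qsort(entries, count, sizeof(entries[0]), compare_by_key);
    return entries[0].name;
}
```

On CHERI, reading through the returned name only works if qsort()'s internal element moves preserved the tag of each embedded pointer.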

sbaranga-arm commented 3 years ago

A lot of memcpy calls are emitted by the compiler (e.g. for assignments) and those would copy the entire object. For these cases it would make sense to emit a call to a memcpy variant that doesn't preserve tags on unaligned starts.

brooksdavis commented 3 years ago

@arichardson has done some work looking at compiler-generated copies in the context of improved inlining (https://github.com/CTSRD-CHERI/llvm-project/pull/506). We probably do want many of them to be tag-clearing, but de facto C requires copying in all sorts of awkward places. For example:

struct s {
    uint64_t a;
    uint64_t b;
    char c[16];
} __attribute__((aligned(16)));

requires a tag-preserving memcpy because we can't know what's actually being stored in c since the C language can't differentiate between a string and a bag of bytes. (We likely want an annotation to say a string is actually a string or to push for a byte type as I believe is being discussed). One could implement a C dialect that restricted tag preservation further, but the cost of adaptation would start to climb so I believe it would need to be optional.
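
A sketch of the kind of code that forces this: a pointer is stashed in the char array with memcpy(), the struct is copied by assignment, and the pointer is read back out and dereferenced. The function name is hypothetical, and the code assumes a GCC/Clang-style compiler for the aligned attribute already used above. On a non-CHERI host it always works; on CHERI it only works if the struct assignment preserved the tag.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the struct above: nothing in the types says c holds a pointer. */
struct s {
    uint64_t a;
    uint64_t b;
    char c[16];
} __attribute__((aligned(16)));

_Static_assert(sizeof(int *) <= sizeof(((struct s *)0)->c),
               "pointer must fit in c");

/* Hides a pointer in src.c, copies the struct by assignment, and reads it back. */
int read_through_copied_bytes(int *target)
{
    struct s src = {0};
    struct s dst;
    int *recovered;

    memcpy(src.c, &target, sizeof(target));        /* pointer stored as "bytes" */
    dst = src;                                     /* compiler-generated copy */
    memcpy(&recovered, dst.c, sizeof(recovered));  /* pointer loaded back out */

    return *recovered;   /* would trap on CHERI if the assignment cleared the tag */
}
```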

arichardson commented 3 years ago

We could try to make this distinction for C++20 (or maybe it's 17) code by only treating std::byte as potentially tag-bearing and assuming that char is actually a string. However, I feel this could be rather risky and it's safer to assume that all of signed char/unsigned char/char/std::byte can potentially hold tags.

kevin-brodsky-arm commented 3 years ago

As long as a char* is allowed to alias any other pointer (which I presume is still the case in C++17/20), I think we should preserve the assumption that an array of char may hold capabilities, because otherwise it feels like the departure from C/C++ is too great and it could break quite a lot of software. Of course having an optional compiler flag to remove that assumption wouldn't hurt either.

CTSRD-CHERI / cheri-c-programming

Document when tags are preserved when copying memory #12