CTSRD-CHERI / cheri-c-programming

CHERI C/C++ Programming Guide

Requirements for standard library routines #9

Open ruben-arm opened 4 years ago

ruben-arm commented 4 years ago

It might be useful to describe requirements for values taken/returned by C standard library (bounds, permissions, etc.)

rwatson commented 4 years ago

Yes, this sounds good. What immediately come to mind are memory allocation functions, such as malloc(), realloc(), and so on. Do you have other things in mind as well?

It would probably also be desirable to document expectations on other bounds, permissions, etc., such as for locals, globals, alloca(), etc. (although presumably we should have another issue for this)

ruben-arm commented 4 years ago

> Do you have other things in mind as well?

I think it would be useful to categorise the functions which return pointers into:

> It would probably also be desirable to document expectations on other bounds, permissions, etc., such as for locals, globals, alloca(), etc. (although presumably we should have another issue for this)

That would be nice. In particular, argv might be worth specifying.

PeterSewell commented 4 years ago

It's not doing exactly the same thing - provenance not capabilities - but one of our WG14 colleagues has produced a diff to the standard to add provenance (just per-allocation, no subobject provenance yet). It might be worth looking at all the places he had to touch...

Peter

On Thu, 23 Jul 2020 at 09:55, Robert N. M. Watson notifications@github.com wrote:

> Yes, this sounds good. What immediately come to mind are memory allocation functions, such as malloc(), realloc(), and so on. Do you have other things in mind as well?
>
> It would probably also be desirable to document expectations on other bounds, permissions, etc., such as for locals, globals, alloca(), etc.

yury-khrustalev commented 4 years ago

Another case is functions that work with 0-terminated strings, like strncpy: should we amend the bounds of the returned capability?

brooksdavis commented 4 years ago

The malloc family of allocation functions returns a bounded capability to an allocation at least as large as requested, with an offset of 0 and a length less than or equal to the length of the allocation. For precisely representable sizes, the allocation may be larger than the bounds; otherwise they will likely be the same, but that's up to the allocator.
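As a rough illustration of the bounds property described above, here is a minimal sketch that inspects the pointer returned by malloc(). It assumes the `<cheriintrin.h>` header from CHERI LLVM when built for a pure-capability target, and falls back to a plain usability check elsewhere. The helper name `malloc_bounds_cover_request` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef __CHERI_PURE_CAPABILITY__
#include <cheriintrin.h>
#endif

/*
 * Hypothetical helper: returns 1 if malloc(size) yields a pointer that
 * can reach at least `size` bytes, 0 otherwise. `size` must be non-zero.
 */
int malloc_bounds_cover_request(size_t size)
{
    assert(size > 0);
    unsigned char *p = malloc(size);
    if (p == NULL)
        return 0;

    int ok = 1;
#ifdef __CHERI_PURE_CAPABILITY__
    /* The capability must be tagged (valid). */
    ok = ok && cheri_tag_get(p);
    /* Bytes reachable from the current address must cover the request. */
    ok = ok && (cheri_length_get(p) - cheri_offset_get(p)) >= size;
#endif
    /* Every requested byte must be writable and readable. */
    memset(p, 0x5A, size);
    ok = ok && p[0] == 0x5A && p[size - 1] == 0x5A;

    free(p);
    return ok;
}
```

The check deliberately tests only "length at least as large as requested", since the exact length depends on the allocator and on whether the size is precisely representable.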

Things other than malloc (and internals like dl_iterate_phdr) that alter bounds are rare. The one exception in CheriBSD's libc is the rather weird BSD interface fgetln where we do set (potentially imprecise) bounds on the returned pointer to the internal stream buffer.

We likewise don't do much with permissions except that you can't use the malloc family to allocate an executable capability and we strip the CHERI_PERM_CHERIABI_VMMAP user permission from allocations so you can't use anything in the mmap family to alter the pages later (we also use this to identify capabilities that can be revoked in our temporal safety work.)
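The "no executable capability from malloc" point could be checked along these lines. This is a sketch only: it assumes `cheri_perms_get()` and `CHERI_PERM_EXECUTE` from `<cheriintrin.h>` on a pure-capability build, and it does not test the CheriBSD-specific user permission mentioned above. On other targets the check is vacuous. The function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef __CHERI_PURE_CAPABILITY__
#include <cheriintrin.h>
#endif

/*
 * Hypothetical helper: returns 1 if a heap pointer carries no execute
 * permission (or if permissions cannot be inspected on this target).
 */
int heap_pointer_is_not_executable(void)
{
    void *p = malloc(64);
    if (p == NULL)
        return 0;

    int ok = 1;
#ifdef __CHERI_PURE_CAPABILITY__
    /* A heap allocation should never be usable as a code capability. */
    ok = (cheri_perms_get(p) & CHERI_PERM_EXECUTE) == 0;
#endif
    free(p);
    return ok;
}
```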

It may well be the case that there are places where we want to do things like enforce read-only access to things in otherwise writable pages (e.g. stuff in string space), but so far we've not really explored that space.

brooksdavis commented 4 years ago

Outside the C standard library, but somewhat in libc, the mmap family of APIs have undergone a number of quite subtle changes. We've got some minor updates in progress on the CheriBSD side around address hints and then I should probably sit down and update the manpages again to reflect the ways the API has changed.

rwatson commented 4 years ago

Experience thus far suggests that consumers of interfaces such as sort(), strpos(), and strtok() sometimes expect the returned string pointer to continue to allow access to other bits of the containing string, rather than simply being pointers to a shorter string. On the whole we've therefore opted not to set bounds in those APIs, after a few misfires. A more complete study might show that it's feasible and useful to do so. (E.g., I can imagine that if it were safe with parsing functions, it would be quite beneficial?)
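A small example of the kind of caller this describes, using strchr() as a stand-in (an assumption, it is not one of the functions named above): the code finds a delimiter and then reads bytes before the returned pointer. If the library narrowed the returned capability to the suffix starting at the delimiter, the `end[-1]` access would fault on a CHERI system. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: for input like "name   = value", returns the
 * length of the key with trailing spaces trimmed, or 0 if no '=' exists.
 */
size_t trimmed_key_length(const char *s)
{
    const char *eq = strchr(s, '=');
    if (eq == NULL)
        return 0;

    /* Walk backwards from the pointer strchr() returned. */
    const char *end = eq;
    while (end > s && end[-1] == ' ')
        end--;

    return (size_t)(end - s);
}
```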

brooksdavis commented 4 years ago

It's worth remembering that any time you explicitly set bounds, you will likely need to get back to the original pointer later so you need the equivalent of a free to malloc for whatever function you're calling. (fgetln is odd because the lifetime is implicit).
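One way to picture this, as a sketch rather than a recommended design: whoever narrows the bounds keeps the original pointer next to the narrowed one, and a matching release function uses the original. The struct and function names are hypothetical. `cheri_bounds_set()` is assumed from `<cheriintrin.h>` and may round the bounds when the length is not precisely representable. On non-CHERI targets the macro is a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef __CHERI_PURE_CAPABILITY__
#include <cheriintrin.h>
#define NARROW_BOUNDS(p, len) cheri_bounds_set((p), (len))
#else
#define NARROW_BOUNDS(p, len) (p)   /* no bounds to narrow */
#endif

struct bounded_view {
    char  *original;  /* full allocation, needed to release it */
    char  *view;      /* narrowed pointer handed to consumers */
    size_t view_len;  /* number of bytes the view should expose */
};

/* Allocate `capacity` bytes and expose only the first `used` bytes. */
int view_acquire(struct bounded_view *bv, size_t capacity, size_t used)
{
    if (capacity == 0 || used > capacity)
        return -1;
    bv->original = malloc(capacity);
    if (bv->original == NULL)
        return -1;
    memset(bv->original, 0, capacity);
    bv->view = NARROW_BOUNDS(bv->original, used);
    bv->view_len = used;
    return 0;
}

/* The matching release: frees via the original, not the view. */
void view_release(struct bounded_view *bv)
{
    free(bv->original);
    bv->original = NULL;
    bv->view = NULL;
    bv->view_len = 0;
}
```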

With string functions, you certainly can't assume that bytes beyond the NUL won't be accessed later.
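A common idiom that relies on this, shown as an illustrative sketch with a hypothetical function name: after strtok() has written a NUL over the delimiter, the caller steps past that NUL to reach the rest of the buffer. Bounds trimmed to the token would break the final dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: splits `buf` at the first ':' and returns a
 * pointer to whatever follows the first token, or NULL if nothing does.
 * `buf_size` is the size of the buffer including its terminating NUL.
 */
char *rest_after_first_token(char *buf, size_t buf_size)
{
    char *tok = strtok(buf, ":");
    if (tok == NULL)
        return NULL;

    /* Step over the token and the NUL that terminates it. */
    char *rest = tok + strlen(tok) + 1;
    if (rest >= buf + buf_size)
        return NULL;   /* the token ran to the end of the buffer */

    return rest;
}
```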

Thinking aloud: asprintf() might be an interesting case. Depending on the implementation you might end up with a somewhat oversized allocation and want to slap a bound on it. I'm not sure how that would play out in practice though.

It's sometimes useful to set bounds internally as a debugging aid/assertion, but you need to take care that they don't leak out.

rwatson commented 4 years ago

I think it is safe to say that we have a pretty good understanding of bounds behaviour at the granularity of the allocation, but that experiments with sub-object bounds generally prove more challenging: structure fields, array entries, etc. In the case of structure fields, we have experimental results that are useful regarding the use of containerof patterns. For array entries, we've generally found that software (as with the language) is quite sloppy regarding whether a pointer is to the array itself, or just to an element of an array. The more conservative route, simply doing allocation bounds, seems a sensible route, and larger-scale experiments with structure sub-object bounds are proving productive (e.g., with our pure-capability kernel).
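For readers unfamiliar with the pattern, here is a minimal container-of sketch in plain C. The type and macro names are hypothetical. With allocation-granularity bounds it works as written. If `&it.node` were narrowed to the `node` field alone, the pointer computed by the macro would fall outside those bounds and the `->id` access would be expected to fault.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *next;
};

struct item {
    int id;
    struct list_node node;   /* embedded link */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Reads a field of the enclosing struct item given only its link. */
int item_id_from_node(struct list_node *n)
{
    return CONTAINER_OF(n, struct item, node)->id;
}
```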

rwatson commented 4 years ago

(But, to be clear, we've not reviewed POSIX (for example) for other opportunities in any detail. Perhaps there are cases where arrays are indexed and a pointer returned, and yet the caller should not be able to get back to the array -- such as strerror() -- however, I'm not sure how much actual utility will be found in restricting those cases.)

rwatson commented 4 years ago

#10 created to address C-language definitions and their bounds/permissions, as distinct from standard library APIs.