Open ruben-arm opened 4 years ago
Yes, this sounds good. What immediately comes to mind are memory allocation functions, such as malloc(), realloc(), and so on. Do you have other things in mind as well?
It would probably also be desirable to document expectations on other bounds, permissions, etc., such as for locals, globals, alloca(), etc. (although presumably we should have another issue for this)
Do you have other things in mind as well?
I think it would be useful to categorise the functions which return pointers into:
strtok, strstr)?
It would probably also be desirable to document expectations on other bounds, permissions, etc., such as for locals, globals, alloca(), etc. (although presumably we should have another issue for this)
That would be nice. In particular, argv might be worth specifying.
It's not doing exactly the same thing - provenance not capabilities - but one of our WG14 colleagues has produced a diff to the standard to add provenance (just per-allocation, no subobject provenance yet). It might be worth looking at all the places he had to touch...
Peter
On Thu, 23 Jul 2020 at 09:55, Robert N. M. Watson notifications@github.com wrote:
Yes, this sounds good. What immediately come to mind are memory allocation functions, such as malloc(), realloc(), and so on. Do you have other things in mind as well?
It would probably also be desirable to document expectations on other bounds, permissions, etc., such as for locals, globals, alloca(), etc.
Another case is functions that work with 0-terminated strings, like strncpy: should we amend the bounds of the returned capability?
The malloc family of allocation functions returns a bounded capability to an allocation at least as large as requested, with an offset of 0 and a length less than or equal to the length of the allocation. For precisely representable sizes, the allocation may be larger than the bounds; otherwise they will likely be the same, but that's up to the allocator.
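As a rough illustration of the expectation described above, here is a minimal sketch of a check one might run on a freshly allocated pointer. The helper name allocation_bounds_ok is hypothetical. The CHERI branch assumes a pure-capability toolchain that provides cheriintrin.h; on any other target the function cannot inspect bounds and simply reports success.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#if defined(__CHERI_PURE_CAPABILITY__)
#include <cheriintrin.h>   /* CHERI accessors; only present on CHERI toolchains */
#endif

/*
 * Hypothetical helper (not part of any libc): returns 1 if p looks like a
 * pointer freshly returned by malloc(requested).
 */
int allocation_bounds_ok(const void *p, size_t requested)
{
    assert(p != NULL);
#if defined(__CHERI_PURE_CAPABILITY__)
    /* Valid tag, pointer at the start of its bounds, bounds cover the request. */
    return cheri_tag_get(p) &&
           cheri_offset_get(p) == 0 &&
           cheri_length_get(p) >= requested;
#else
    /* No capability metadata to inspect on a conventional target. */
    (void)requested;
    return 1;
#endif
}
```

Note that the check deliberately uses >= rather than ==: whether the bounds exactly match the request depends on representability and on the allocator.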
Things other than malloc (and internals like dl_iterate_phdr) that alter bounds are rare. The one exception in CheriBSD's libc is the rather weird BSD interface fgetln, where we do set (potentially imprecise) bounds on the returned pointer to the internal stream buffer.
We likewise don't do much with permissions, except that you can't use the malloc family to allocate an executable capability, and we strip the CHERI_PERM_CHERIABI_VMMAP user permission from allocations so you can't use anything in the mmap family to alter the pages later (we also use this to identify capabilities that can be revoked in our temporal safety work).
It may well be the case that there are places where we want to do things like enforce read-only access to things in otherwise writable pages (e.g. stuff in string space), but so far we've not really explored that space.
Outside the C standard library, but somewhat in libc, the mmap family of APIs has undergone a number of quite subtle changes. We've got some minor updates in progress on the CheriBSD side around address hints and then I should probably sit down and update the manpages again to reflect the ways the API has changed.
Experience thus far suggests that consumers of interfaces such as sort(), strpos(), and strtok() sometimes expect the returned string pointer to continue to allow access to other bits of the containing string, rather than simply being pointers to a shorter string. On the whole we've therefore opted not to set bounds in those APIs, after a few misfires. A more complete study might show that it's feasible and useful to do so. (E.g., I can imagine that if it were safe with parsing functions, it would be quite beneficial?)
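To make the concern concrete, here is a small sketch of the kind of caller that would break if a search function narrowed the bounds of its result. It uses standard strstr() as the stand-in; is_whole_word is a made-up helper for illustration, and it only inspects the first occurrence.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical caller: returns 1 if the first occurrence of `word` in `text`
 * is delimited by spaces or by the ends of the string, 0 otherwise.
 */
int is_whole_word(const char *text, const char *word)
{
    assert(text != NULL && word != NULL);

    const char *hit = strstr(text, word);
    if (hit == NULL)
        return 0;

    /* Reads one byte BEFORE the returned pointer: needs the original bounds. */
    int starts_ok = (hit == text) || (hit[-1] == ' ');

    /* Reads one byte AFTER the match: fails if bounds covered only the match. */
    char after = hit[strlen(word)];
    int ends_ok = (after == '\0') || (after == ' ');

    return starts_ok && ends_ok;
}
```

If the returned capability were bounded to start at the match, the hit[-1] access would fault; if it were bounded to just the match, the read past the end would fault as well.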
It's worth remembering that any time you explicitly set bounds, you will likely need to get back to the original pointer later so you need the equivalent of a free to malloc for whatever function you're calling. (fgetln is odd because the lifetime is implicit.)
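One way to picture the "equivalent of a free" is a borrow/return pair in which the owner keeps its own wide pointer and only hands out a narrowed view. This is a sketch under assumptions: struct line_buffer, line_borrow and line_return are invented names, and the bounds are only actually narrowed when building for a pure-capability CHERI target with cheriintrin.h available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__CHERI_PURE_CAPABILITY__)
#include <cheriintrin.h>
#endif

/* Hypothetical owner of a buffer that lends out one bounded view at a time. */
struct line_buffer {
    char  storage[128];  /* the owner keeps access to the whole buffer */
    char *lent;          /* view currently handed out, or NULL */
};

/* Hand out a view of [off, off + len); returns NULL if busy or out of range. */
char *line_borrow(struct line_buffer *lb, size_t off, size_t len)
{
    assert(lb != NULL);
    if (lb->lent != NULL || off > sizeof lb->storage ||
        len > sizeof lb->storage - off)
        return NULL;

    char *view = lb->storage + off;
#if defined(__CHERI_PURE_CAPABILITY__)
    view = cheri_bounds_set(view, len);   /* caller cannot reach outside */
#endif
    lb->lent = view;
    return view;
}

/* Counterpart to line_borrow(): the owner resumes using its wide pointer. */
int line_return(struct line_buffer *lb, char *view)
{
    assert(lb != NULL);
    if (view == NULL || lb->lent != view)
        return -1;
    lb->lent = NULL;
    return 0;
}
```

The point of the design is that the narrowed pointer is never asked to widen itself again; the caller returns it, and the owner continues with the capability it never gave away.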
With string functions, you certainly can't assume that bytes beyond the NUL won't be accessed later.
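A common pattern that relies on bytes past the terminator is in-place splitting with strtok(). The sketch below uses a hypothetical helper, split_key_value, to show it: the caller steps over the NUL that strtok() wrote and keeps reading from the same buffer. (strtok() keeps static state, so this is not thread-safe.)

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: splits "key=value" in place. After the call, `line`
 * reads as the key, and the returned pointer is the value, which lives just
 * past the NUL that strtok() wrote. Returns NULL if there is no '=' after a
 * non-empty key.
 */
char *split_key_value(char *line)
{
    assert(line != NULL);

    size_t full_len = strlen(line);       /* length before strtok() edits it */
    char *key = strtok(line, "=");
    if (key == NULL)
        return NULL;

    size_t key_len = strlen(key);
    if ((size_t)(key - line) + key_len >= full_len)
        return NULL;                      /* key ran to the end: no '=' */

    /* Step over the terminator and keep using the original buffer. */
    return key + key_len + 1;
}
```

If the token returned by strtok() were bounded to end at its NUL, the final pointer arithmetic would produce a pointer that cannot be dereferenced.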
Thinking aloud: asprintf() might be an interesting case. Depending on the implementation you might end up with a somewhat oversized allocation and want to slap a bound on it. I'm not sure how that would play out in practice though.
It's sometimes useful to set bounds internally as a debugging aid/assertion, but you need to take care that they don't leak out.
I think it is safe to say that we have a pretty good understanding of bounds behaviour at the granularity of the allocation, but that experiments with sub-object bounds generally prove more challenging: structure fields, array entries, etc. In the case of structure fields, we have experimental results that are useful regarding the use of containerof patterns. For array entries, we've generally found that software (as with the language) is quite sloppy regarding whether a pointer is to the array itself, or just to an element of an array. The more conservative route, simply doing allocation bounds, seems sensible, and larger-scale experiments with structure sub-object bounds are proving productive (e.g., with our pure-capability kernel).
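For readers unfamiliar with the pattern, here is a minimal sketch of a containerof-style macro and why it interacts badly with sub-object bounds. The struct names and task_id_from_link are illustrative only; the macro is the conventional offsetof-based form, not any particular project's definition.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *next;
};

struct task {
    int              id;
    struct list_node link;   /* embedded member that gets linked into lists */
};

/* Conventional form: step back from a member to its enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recovers the enclosing task from a pointer to its embedded list node. */
int task_id_from_link(struct list_node *node)
{
    assert(node != NULL);
    struct task *t = container_of(node, struct task, link);
    /*
     * With allocation-granularity bounds this read is fine. If &task.link had
     * been bounded to just the `link` member, `t` would point outside those
     * bounds and the access to t->id would fault.
     */
    return t->id;
}
```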
(But, to be clear, we've not reviewed POSIX (for example) for other opportunities in any detail. Perhaps there are cases where arrays are indexed and a pointer returned, and yet the caller should not be able to get back to the array -- such as strerror() -- however, I'm not sure how much actual utility will be found in restricting those cases.)
It might be useful to describe requirements for values taken/returned by C standard library (bounds, permissions, etc.)