checkedc / checkedc-llvm-project

This was a fork of Checked C clang used from 2021-2024. The changes have been merged into the original Checked C clang repo, which is now at https://github.com/checkedc/checkedc-clang.
https://www.checkedc.org
13 stars 19 forks source link

Check return value bounds #1149

Closed secure-sw-dev-bot closed 2 years ago

secure-sw-dev-bot commented 2 years ago

This issue was copied from https://github.com/microsoft/checkedc-clang/issues/1150


This PR checks that the inferred bounds of a return value expression imply the declared return bounds (if any) for the enclosing function.

This PR does not check modifications to variables or other lvalue expressions that are used in return bounds. For example:

_Array_ptr<int> f(int size) : count(size) {
  size = 3; // modify the size variable used in the declared return bounds
  return 0;
}

At the assignment size = 3, the bounds checker will not emit any diagnostic messages even though the modified variable size is used in the declared return bounds. This can be done in a separate PR.

Test updates:

  1. Disable two 3C tests: functionDeclEnd.c and itype_nt_arr_cast due to functions returning expressions with invalid bounds (see issue #1147).
  2. CheckedC/dump-dataflow-facts.c: two functions returned expressions with invalid bounds. These functions now return 0.
  3. CheckedC/static-checking/bounds-decl-checking.c: four functions returned expressions with invalid bounds. These functions now return expressions with valid bounds.
  4. CheckedC/static-checking/return-bounds.c: add a new test file that tests the return bounds checking behavior introduced in this PR.
  5. checkedc/465: add expected errors to thirteen function in checkedc tests that return expressions with invalid bounds.
  6. checkedc-llvm-test-suite/114: update one function in the LLVM test suite that returned an expression with unknown bounds.
secure-sw-dev-bot commented 2 years ago

Comment from @mattmccutchen-cci:

Is it intentional that this change applies to return values with bounds-safe interfaces? Consider the following example:

_Array_ptr<char> test(void) : count(1) {
  char *x;
  return x;
}

char *test_itype(void) : count(1) {
  char *x;
  return x;
}

Both functions compile without error on master. With this PR, both functions generate errors:

itype_return.c:3:10: error: return value has unknown bounds, bounds expected because the function 'test' has bounds
  return x;
  ~~~~~~~^
itype_return.c:1:18: note: (expanded) declared return bounds are 'bounds(_Return_value, _Return_value + 1)'
_Array_ptr<char> test(void) : count(1) {
                 ^
itype_return.c:8:10: error: return value has unknown bounds, bounds expected because the function 'test_itype' has bounds
  return x;
  ~~~~~~~^
itype_return.c:6:7: note: (expanded) declared return bounds are 'bounds(_Return_value, _Return_value + 1)'
char *test_itype(void) : count(1) {
      ^

For test, this makes sense: the lack of bounds checking at a function return was inconsistent with all other contexts, which IIUC is the bug being fixed in this PR. However, for test_itype, the error seems to violate this statement from section 6.3.8 of the specification:

The checking of return statements (Section 4.9) includes return statements where there is a return bounds-safe interface for the enclosing function:

  • In checked contexts.
  • In unchecked contexts, when the return statement expression has been converted implicitly to an unchecked pointer type.

Since x already has an unchecked pointer type, it does not get an implicit conversion inserted per section 6.3.7, so the return statement does not meet the condition to be included in bounds checking.

3C has been assuming that a function whose return type has a bounds-safe interface can return an arbitrary unchecked pointer, so if that design is changing, we'll need to find a way to adjust the design of 3C correspondingly. That is, in essence, why the itype_nt_arr_cast.c test is failing. In functionDeclEnd.c, bounds checking of return values is not the focus of the test, so if need be, we can probably find a way to change that test to avoid triggering the new error.

secure-sw-dev-bot commented 2 years ago

Comment from @kkjeer:

The behavior in this PR (regarding bounds checking unchecked pointers in unchecked contexts with bounds-safe interfaces) is intentional with respect to the latest Checked C discussion as far as I understand. @dtarditi @sulekhark should we change this behavior so it is more consistent with the 3C design?

secure-sw-dev-bot commented 2 years ago

Comment from @mattmccutchen-cci:

Perhaps more to the point: if I'm right that the "latest Checked C discussion" as reflected by this PR is no longer consistent with the specification, can you please update the specification or otherwise point us to where we can get the latest information?

secure-sw-dev-bot commented 2 years ago

Comment from @sulekhark:

Perhaps more to the point: if I'm right that the "latest Checked C discussion" as reflected by this PR is no longer consistent with the specification, can you please update the specification or otherwise point us to where we can get the latest information?

@mattmccutchen-cci
As you pointed out, Section 6.3.8 (middle of page 116)

The checking of return statements (Section 4.9) includes return statements where there is return bounds-safe interface for the enclosing function: In checked contexts, In unchecked contexts, when the return statement expression has been converted implicitly to an unchecked pointer type

While our current implementation is consistent with Section 4.9, I think this part of the spec (Section 6.3.8 - middle of page 116) is unclear and we need to modify the spec and add one more scenario to the above statement as shown below. This will make this part of the spec consistent with the previous part.

In unchecked contexts, when an unchecked pointer is returned

However, we will discuss this matter once again with David tomorrow, and resolve it (i.e. make the spec and the implementation consistent with each other).

secure-sw-dev-bot commented 2 years ago

Comment from @mattmccutchen-cci:

I guess I'm still unclear on the overall design intent of bounds-safe interfaces. Clearly, one important use case is to allow both unmodified C code and fully Checked C code to call external library functions that have bounds-safe interfaces (as in the Checked C system headers). Is it also a goal to be able to add bounds-safe interfaces to plain C functions in the user's code without having to modify those implementations right away? This is essentially what 3C is trying to do.

If the answer is indeed "no", I noticed two other parts of the spec that could use clarification. In Section 6.3.1:

A function that has a bounds-safe interface and whose body does not use checked pointer or array types is compiled as though the bounds-safe interface has been stripped from its source code.

That sentence appears in a discussion of what happens when unchecked code uses a program element with a bounds-safe interface. But read in isolation, it does seem to suggest that my test_itype example above should be allowed.

Also, an assignment like the following is already rejected by the compiler before this PR (similar to the restriction this PR is adding for return values):

char *global : count(1);

void assign_global(void) {
  char *x;
  global = x;
}

But Section 6.3.8 says:

The checking that an expression statement, declaration, or bundled block implies the declared bounds of variables includes:

  • In checked contexts, all variables with bounds-safe interfaces.
  • In unchecked contexts, all variables with bounds-safe interfaces that are modified by an assignment within the statement, declaration, or bundled block, where the right-hand side expression is converted implicitly from a checked pointer type to an unchecked pointer type.

Does another scenario for an unchecked right-hand side need to be added here as well?

secure-sw-dev-bot commented 2 years ago

Comment from @mwhicks1:

We've been discussing this issue over at Correct Computation and the summary is:

One thing I find confusing is that itypes currently used for standard library functions could not be applied to those functions themselves, in their original C. For example, in the header string_checked.h I have

char *strtok(char * restrict s1 : itype(restrict _Nt_array_ptr<char>),
             const char * restrict s2 : itype(restrict _Nt_array_ptr<const char>)) :
  itype(_Nt_array_ptr<char>);

The standard library was compiled from C code that implements this function, e.g.,

char *strtok(char * restrict s, const char * restrict sep) {
 {
    static char *p;
    if (!s && !(s = p)) return NULL;
    s += strspn(s, sep);
    if (!*s) return p = 0;
    p = s + strcspn(s, sep);
    if (*p) *p++ = 0;
    else p = 0;
    return s;
} 

I would have thought that if I can advertise in string_checked.h that this code as an itype, I could just as well have attached the itype directly to the code itself, i.e.,

char *strtok(char * restrict s : itype(restrict _Nt_array_ptr<char>),
             const char * restrict sep : itype(restrict _Nt_array_ptr<const char>)) :
  itype(_Nt_array_ptr<char>) {
    static char *p;
    if (!s && !(s = p)) return NULL;
    s += strspn(s, sep);
    if (!*s) return p = 0;
    p = s + strcspn(s, sep);
    if (*p) *p++ = 0;
    else p = 0;
    return s;
}

It feels strange to me that this does not work, but typechecking this code with its original C type and the prototype also present with an itype does work. What principle justifies the difference in treatment?

secure-sw-dev-bot commented 2 years ago

Comment from @sulekhark:

As per the discussion we had with David today, we agree that if the itypes used in a function declaration are applied to the function definition, and the function definition is compiled in an unchecked scope, and the function body does not use any checked pointers, then the compiler must not issue any bounds-checking errors or warnings.

For the strtok code show below, compiling it with the master branch :

#include <string.h>
char *strtok(char * restrict s : itype(restrict _Nt_array_ptr<char>),
             const char * restrict sep : itype(restrict _Nt_array_ptr<const char>)) :
  itype(_Nt_array_ptr<char>) {
    static char *p;
    if (!s && !(s = p)) return NULL;
    s += strspn(s, sep);
    if (!*s) return p = 0;
    p = s + strcspn(s, sep);
    if (*p) *p++ = 0;
    else p = 0;
    return s;
}

currently gives two errors (I have elided the notes):

 t.c:7:14: error: inferred bounds for 's' are unknown after assignment
        if (!s && !(s = p)) return 0;

t.c:8:2: error: inferred bounds for 's' are unknown after assignment
        s += strspn(s, sep);

Going forward, these errors will be issued when the above function is analyzed in checked scope but not when analyzed in an unchecked scope. Additionally, in unchecked scope the return bounds checking (as per this PR) on the returned pointer s in the above example will not be performed.

We will update the Checked C spec and the compiler code to be consistent with this behavior.

secure-sw-dev-bot commented 2 years ago

Comment from @mwhicks1:

Thanks for the update. When the new spec is available we will reread with a mindset toward consistency/principle, to see if we have any further thoughts about itypes; if so, we'll comment on a separate thread.

secure-sw-dev-bot commented 2 years ago

Comment from @mattmccutchen-cci:

we agree that if the itypes used in a function declaration are applied to the function definition, and the function definition is compiled in an unchecked scope, and the function body does not use any checked pointers, then the compiler must not issue any bounds-checking errors or warnings

For the record: it looks like #1157, #1158, and #1159 have been filed for this work. Thanks!