checkedc / checkedc-llvm-project

This was a fork of Checked C clang used from 2021-2024. The changes have been merged into the original Checked C clang repo, which is now at https://github.com/checkedc/checkedc-clang.
https://www.checkedc.org
13 stars 19 forks source link

Introduce temporaries for argument expressions that have side-effects and are used in parameter or return bounds #1009

Open secure-sw-dev-bot opened 2 years ago

secure-sw-dev-bot commented 2 years ago

This issue was copied from https://github.com/microsoft/checkedc-clang/issues/1009


I noticed that the original C source code of one of CCI's 3C benchmark programs (icecast) builds successfully with the Checked C compiler, but after I update the #include directives to reference the checked headers, a Checked C compile error occurs. Here's a simplified test case. The following program compiles:

#include <stdlib.h>

void test() {
  int len = 5;
  char *p = malloc(++len);
}

But if I change the first line to #include <stdlib_checked.h>, then I get the following error:

malloc.c:5:20: error: increment expression not allowed in argument for parameter used in function return bounds expression
  char *p = malloc(++len);
                   ^~~~~

I believe it's a design goal that the Checked C compiler should accept plain C code even when the checked headers are included so that code can be converted gradually even within a single file. If so, the above behavior violates that design goal, and it will become a bigger problem with implicit inclusion of checked headers (microsoft/checkedc#440 et al.), hence my interest in reporting this issue now.

There may be other bugs of this nature. If I can spare the time, I'll run the Checked C compiler on all of our benchmarks after updating the #includes but before adding any actual Checked C annotations, since if that fails, implicit header inclusion would block our workflow unless we opt out of it.

secure-sw-dev-bot commented 2 years ago

Comment from @mattmccutchen-cci:

I ran our benchmarks and saw only one other type of error, with a call instead of an increment:

#include <stdlib_checked.h>
#include <string_checked.h>

const char msg[] = "hello";

void test() {
  char *p = malloc(strlen(msg));
}
malloc2.c:7:20: error: call expression not allowed in argument for parameter used in function return bounds expression
  char *p = malloc(strlen(msg));
                   ^~~~~~~~~~~

We have examples of this with various functions in the Checked C headers, including malloc, realloc, memset, memcpy, and memmove.

secure-sw-dev-bot commented 2 years ago

Comment from @dtarditi:

Yes, this is an issue with the implementation of checking of bounds declarations. We're trying to compute the bounds of the value returned by the call, at the location of the call. To do this, we substitute the actual parameters for the formal parameters in the declared return bounds for the function. We don't allow modifying expressions (calls or other side-effecting expressions) in the bounds expression, so we issue an error.

We do a similar substitution for parameters to check that actual parameters have the bounds declared by the function. We substitute the actual parameters into the declared bounds for the parameter, and then check that the bounds of the actual parameter expression imply the declared bounds (after substitution). We don't hit this issue for parameters because we skip the check if the actual parameter has an unchecked pointer type and there isn't a compiler-introduced bounds-safe interface cast for the parameter (that is, the call does not pass an actual parameter with a checked pointer type to a formal parameter with an unchecked pointer type).

Unfortunately, this approach doesn't work for return values from calls. We don't know whether the bounds for the return value are needed. Computing the "is needed" information would be a fair amount of work, and would be difficult to explain formally.

There are two approaches that could be taken to address this issue.

secure-sw-dev-bot commented 2 years ago

Comment from @dtarditi:

The return value check is in CallExprBounds in SemaBounds.cpp around line 5983.

secure-sw-dev-bot commented 2 years ago

Comment from @mattmccutchen-cci:

A short-term solution is to not an issue an error message when a parameter with side-effects is substituted into a return bounds. We could instead mark the return bounds as bounds(unknown).

That sounds to me like the right short-term fix to unblock inclusion of checked headers by default.

This could make an error message confusing because it wouldn't be clear why the bounds became unknown.

Could you add a field to the relevant data structure to store the source location of the function call that generated the bounds(unknown), so that when a bounds proof fails, you can attach a note diagnostic with the location of the bounds(unknown) to the error diagnostic about the proof failure? I'd hope this approach would both provide a decent user experience and be reasonably straightforward to implement.

secure-sw-dev-bot commented 2 years ago

Comment from @sulekhark:

The short-term solution as explained above is implemented. Leaving this issue open in order to track the correct long-term solution.

secure-sw-dev-bot commented 2 years ago

Comment from @mattmccutchen-cci:

Thank you! We will test the interim fix in our environment ASAP.

secure-sw-dev-bot commented 2 years ago

Comment from @mattmccutchen-cci:

Side note: We may eventually need to add logic to 3C that actually adds similar temporaries to the user's source code in support of certain refactorings. So when you get around to adding the temporaries internally to the type checker, it might be worth discussing whether that logic could be written in a way that 3C could reuse at least some of it.

secure-sw-dev-bot commented 2 years ago

Comment from @sulekhark:

@mattmccutchen-cci Sure, we'll keep that in mind.