correctcomputation / checkedc-clang

This is the primary development repository for 3C, a tool for automatically converting legacy C code to the Checked C extension of C, which aims to enforce spatial memory safety. This repository is a fork of Checked C's.
14 stars 5 forks source link

3C should not exit when encountering conflicting declarations #283

Closed sroy4899 closed 3 years ago

sroy4899 commented 4 years ago

The following two files: file1.c:

typedef long unsigned int size_t;
extern char *strerror_r (int __errnum, char *__buf, size_t __buflen)
     __attribute__ ((__nothrow__ )) __attribute__ ((__nonnull__ (2))) ; 

and file2.c:

typedef long unsigned int size_t; 
extern int strerror_r (int __errnum, char *__buf, size_t __buflen) __asm__ ("" "__xpg_strerror_r") __attribute__ ((__nothrow__ )) __attribute__ ((__nonnull__ (2))); 

cause the following crash, only when file1 comes before file2:

0  cconv-standalone         0x0000000104266bca llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 58
1  cconv-standalone         0x0000000104267139 PrintStackTraceSignalHandler(void*) + 25
2  cconv-standalone         0x000000010426503b llvm::sys::RunSignalHandlers() + 123
3  cconv-standalone         0x000000010426b18f SignalHandler(int) + 207
4  libsystem_platform.dylib 0x00007fff6d4e65fd _sigtramp + 29
5  libsystem_platform.dylib 0x00007fc50cb09a58 _sigtramp + 18446743822980494456
6  cconv-standalone         0x00000001046fe4b8 FunctionVariableConstraint::mergeDeclaration(ConstraintVariable*, ProgramInfo&) + 216
....

Interactive debugging revealed that the issue is in PVC's mergeDeclaration's assumption that the number of atoms match:

 PVConstraint *From = dyn_cast<PVConstraint>(FromCV);
  std::vector<Atom *> NewVatoms;
  CAtoms CFrom = From->getCvars();
  CAtoms::iterator I = vars.begin();
  CAtoms::iterator J = CFrom.begin();
  while (I != vars.end()) {
    Atom *IAt = *I;
    Atom *JAt = *J;                                           //FAILS HERE

This is likely due to incompatible types (char * vs int)

mwhicks1 commented 4 years ago

This example reveals something important: Multi-file programs may not type check. That is, file1.c and file2.c have conflicting types for the same symbol, but the compiler will not notice this because they are in different files. Just compiling them as clang -c file1.c file2.c produces no error. But if you concatenate them (cat file1.c file2.c > foo.c && clang -c foo.c) you will get a type error.

This means that cconv-standalone needs to be more defensive when dealing with merging prototypes, and indeed it should flag an error when prototypes do not match up in the way you'd expect if they were declared in the same source file. We should look at other ways that mergeDecl and brainTransplant and perhaps other related routines could fail, with this in mind.

mwhicks1 commented 4 years ago

I thought about some possible solutions to this problem.

1) Have the ConstraintVariable::mergeDeclaration and ConstraintVariable::brainTransplant functions check that the two declaration/definitions being combined are compatible, and abort if not.

Incompatibility could be signaled via the return of these functions; right now, they return `void` but could return a `bool`. Then the ultimate caller of these functions, `ProgramInfo::insertIntoExternalFunctionMap`, can check the return code to signal an error in the source program and terminate the analysis. The error message can indicate where the incompatible declarations are. We might need to extend the `ConstraintVariable` class to include location information to make this useful. Right now, for example, the `FVConstraint` has a field with filename, but that's it. 

Alternatively, the program could just die within these functions when the discrepancy is found. But since it might be found recursively within (e.g., with a function parameter), it may lack the context to signal the error.

2) We could try to recover from incompatibility and carry on.

For example, if we discover that one prototype has parameter `int *x` and the other has `int x`, then these two cannot be merged. So we just leave the original alone and constrain it to WILD. The problem with this particular approach is that a failure is likely to come up later. E.g., the version that has `int *x` may have callers that pass in pointers, but if we retain the `int x` version, then the constraints generated between caller and callee will fail.

Another approach would be to have a file-local notion of prototype, e.g., treating the prototype as if it were for a `static` function. The reason this could work is that the C type checker will have confirmed that the prototype is type-consistent within the file. So if `f1.c` thinks the prototype is `int *foo(void)` and `f2.c` thinks its `int foo(void)` then callers local to those files will be consistent with the local view; i.e., no caller in `f1.c` will think it is getting just an `int` back.

To do this, we would need to detect the disagreement between prototypes and the split them on the fly, somehow. This seems hard to me. Moreover, it seems counterproductive. This is a latent bug in the program we are converting, so we should find it and flag it, not paper over it.

I think we should do (1).

[Edit by Matt: Add four leading spaces to a paragraph to put it in the same numbered list item, at least in GitHub's version of Markdown.]

mattmccutchen-cci commented 4 years ago

To do this, we would need to detect the disagreement between prototypes and the split them on the fly, somehow. This seems hard to me.

Yes, this seems to me to be fundamentally the wrong thing to try to do in a whole-program analysis.

sroy4899 commented 4 years ago

It seems Mike's suggestion is the way forward that I will likely pursue and is similar to my own idea. The usage of a boolean return to encode information from mergeDeclaration is something that I think will be particularly useful. I think this is a step-up from my naive solution to "dummy"-merge the two declarations and then failing at the first sight of conflict, but I think it would be better if we instead detected things that we would normally use an assertion for (e.g. lengths are the same, types are the same), but instead of an assertion error, return false and rely on a function with location information.

mattmccutchen-cci commented 4 years ago

Another peanut gallery comment: Would it be easier to pass through a bool & argument that you set to false when you detect a problem than to manually aggregate the return values through all the declaration merging code?

mwhicks1 commented 4 years ago

Ooh that's a much better idea. More monadic!

mattmccutchen-cci commented 3 years ago

FTR: Shilpa (ultimately in #330) got us from undefined behavior to an immediate process exit, but we want to go to just a diagnostic in the future for the benefit of clangd3c. I updated the issue title to reflect the remaining work.

mwhicks1 commented 3 years ago

Labeling as enhancement, since the change in error handling would be when enhancing to support clangd.

kyleheadley commented 3 years ago

Additional discussion is in #341. Since the original issue was handled, I'm closing this and putting a link back to it there.

mattmccutchen-cci commented 3 years ago

I believe Mike intended to leave this open for an enhancement to issue a diagnostic instead of exiting, for the benefit of clangd3c. #341 would make this enhancement moot for valid programs, but the enhancement may still be helpful to stop clangd3c from exiting on invalid programs.

mwhicks1 commented 3 years ago

Yes, let's leave it open. Sorry about the confusion.

mattmccutchen-cci commented 3 years ago

I believe this was fixed in #488.