Fraunhofer-AISEC / cpg

A library to extract Code Property Graphs from C/C++, Java, Go, Python, Ruby and every other language through LLVM-IR.
https://fraunhofer-aisec.github.io/cpg/
Apache License 2.0
248 stars 60 forks source link

Java Cast Error #474

Closed peckto closed 2 years ago

peckto commented 2 years ago

When loading the following C code with the cpg, I get an Exception:

int main() {
    size_t count = (size_t)(42);
    return 0;
}
java.lang.ClassCastException: class 
org.eclipse.cdt.internal.core.dom.parser.cpp.CPPASTUnaryExpression cannot be cast to class 
org.eclipse.cdt.internal.core.dom.parser.cpp.CPPASTIdExpression

Whereas the following code can be loaded successfully:

int main() {
    size_t count = (size_t)42;
    return 0;
}

I'm using the cpg and cpg-neo4j from master branch. Can you reproduce the error?

oxisto commented 2 years ago

Yes, I can. Thanks for catching this. It seems, it was introduced by #368 and looks like a simple class cast gone wrong. I will have a look at it.

oxisto commented 2 years ago

I fixed it in #475. However, the other issue is that in the first snippet, it seems the CPG thinks this is a CallExpression, rather than a cast. Or am I wrong in assuming that it should be a cast of 42 to size_t?

peckto commented 2 years ago

Thanks for the fix! Works for me. Yes, the intention was a cast. With includes enabled, the cpg is doing this correctly in both situations. But without the include, we have encountered a C language ambiguity ^^ Interpreting (A)(B) as a function call A(B) seems to me like an unusual coding style. Changing the heuristic to interpret this as a cast would be a better fit in my opinion.

Masrepus commented 2 years ago

If A is a function pointer, (A)(B) can very well be a call. It's not more or less unusual to write a call that way than writing a cast like that

oxisto commented 2 years ago

If A is a function pointer, (A)(B) can very well be a call. It's not more or less unusual to write a call that way than writing a cast like that

In general, I agree. However, in the above example, we clearly see, that size_t is a type and thus this should be inferred as a cast expression.

Masrepus commented 2 years ago

That's true. However I fear that finding this out programmatically won't be easily doable

maximiliankaul commented 2 years ago

If A is a function pointer, (A)(B) can very well be a call. It's not more or less unusual to write a call that way than writing a cast like that

In general, I agree. However, in the above example, we clearly see, that size_t is a type and thus this should be inferred as a cast expression.

How can you clearly see that? Only because of the combination of LHS and RHS of the assignment operator, correct?

void (*size_t)(int);
(size_t)(42);

As far as I know, this would be legal as size_tis not a reserved keyword.

peckto commented 2 years ago

In general, I agree. However, in the above example, we clearly see, that size_t is a type and thus this should be inferred as a cast expression.

Do you mean, that we should be aware of the type size_t independent of the includes file? Meaning to extend the list of known types (int, long, flott, ...) by common types of the std lib?

oxisto commented 2 years ago

If A is a function pointer, (A)(B) can very well be a call. It's not more or less unusual to write a call that way than writing a cast like that

In general, I agree. However, in the above example, we clearly see, that size_t is a type and thus this should be inferred as a cast expression.

How can you clearly see that? Only because of the combination of LHS and RHS of the assignment operator, correct?

void (*size_t)(int);
(size_t)(42);

As far as I know, this would be legal as size_tis not a reserved keyword.

Yes, because it is used as a type in the variable declaration.

oxisto commented 2 years ago

In general, I agree. However, in the above example, we clearly see, that size_t is a type and thus this should be inferred as a cast expression.

Do you mean, that we should be aware of the type size_t independent of the includes file? Meaning to extend the list of known types (int, long, flott, ...) by common types of the std lib?

This would be another possibility, to include "common" types of the stdlib, at least with an additional configuration switch.

oxisto commented 2 years ago

Out of curiosity, @peckto was that an actual line of code or a constructed example?

peckto commented 2 years ago

Yes, this is an example from the wild. I just simplified it to show the bug. It was something like:

#define MAX_DATA_SIZE 20
#define MAX_LEN ((size_t)(MAX_DATA_SIZE - 4))

Just to make this clear, in this concrete case the ambiguity is not a problem because the includes are correct. But this would be a nice feature in general.

oxisto commented 2 years ago

I put together a little piece of smart guessing in https://github.com/Fraunhofer-AISEC/cpg/pull/477. Feel free to discuss, whether this makes sense or not :)