Open Quuxplusone opened 3 years ago
Bugzilla Link | PR51036 |
Status | CONFIRMED |
Importance | P normal |
Reported by | Balazs Benics (balazs.benics@sigmatechnology.se) |
Reported on | 2021-07-09 01:30:50 -0700 |
Last modified on | 2021-07-23 07:52:56 -0700 |
Version | 12.0 |
Hardware | PC All |
CC | dcoughlin@apple.com, dpetrov@accesssoftek.com, llvm-bugs@lists.llvm.org, martongabesz@gmail.com, nok.raven@gmail.com, noqnoqneo@gmail.com, vsavchenko@apple.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
Constraint solver improvements *may* have improved the situation to the point
where enabling casts is actually going to do more good than harm. If so, it'd
be amazing and solve a lot of problems for us (including, but not limited to,
https://bugs.llvm.org/show_bug.cgi?id=44114). But I'll believe it when I see it.
It was pretty bad last time I tried and I don't think my specific concerns were
addressed yet. People are careless about integral types all the time and most
of the time it's harmless. Execution paths that involve overflows, truncations,
or sign extensions are very difficult to explain to a person unfamiliar with
all these rules and may seem like false positives unless a really good system
of notes is implemented; we may have to suppress such paths entirely. See also
my thoughts in https://reviews.llvm.org/D103096
P.S. Folks, I'm still shocked that you're trying to use the array bounds checker
in production. It's terribly broken due to improper loop modeling. I'm fairly
confident that it'll only ever work as a taint-based checker (warn about under-
constrained tainted-symbol indexes) where concrete indexes introduced by loop
unrolling don't intervene.
> Constraint solver improvements may have improved the situation to the point where enabling casts is actually going to do more good than harm. If so, it'd be amazing and solve a lot of problems for us (including, but not limited to, https://bugs.llvm.org/show_bug.cgi?id=44114). But I'll believe it when I see it.

I thought memory modeling and the strict-aliasing madness is an orthogonal topic to evaluating casts according to the semantics of C/C++.
> It was pretty bad last time I tried and I don't think my specific concerns were addressed yet. People are careless about integral types all the time and most of the time it's harmless.

IDK.
> Execution paths that involve overflows, truncations, or sign extensions are very difficult to explain to a person unfamiliar with all these rules and may seem like false positives unless a really good system of notes is implemented;

Yes, but that's an engineering problem. I agree it's indeed a difficult problem. This doesn't justify modeling a completely infeasible execution path, like the example in this ticket.
> we may have to suppress such paths entirely.

IDK.
> See also my thoughts in https://reviews.llvm.org/D103096

I cannot immediately see why we could not make a BRVisitor explaining the effect of the interesting casts.
> P.S. Folks, I'm still shocked that you're trying to use the array bounds checker in production. It's terribly broken due to improper loop modeling. I'm fairly confident that it'll only ever work as a taint-based checker (warn about under-constrained tainted-symbol indexes) where concrete indexes introduced by loop unrolling don't intervene.

Some clients are using the ArrayBounds[V2] checkers because they see value in them. They have proven useful, finding true positives. On the other hand, they also tend to produce reports that are hard to follow. It's a personal preference what FP/TP rate is still acceptable.
I think taint analysis is great and many other checkers should consider using it, but let's not digress.
What I really want to conclude is that on the path within the `if` block, we should not associate a range with `c` that doesn't even intersect with the expected range [0,127]. If it were a super-range of the expected range, let's say `c: [-128, 127]`, it would be a correct approximation, unlike the one we have right now.
> I thought memory modeling and the strict-aliasing madness is an orthogonal topic to evaluating casts according to the semantics of C/C++.

Modeling casts basically means achieving type-correctness. Type-correctness is helpful in a lot of places. In that example we can't know how many bytes the symbol binding overwrites if we don't know the type of the symbol. And this is one of the places where the type of the symbol is the only piece of type information available.
> This doesn't justify modeling a completely infeasible execution path, like the example in this ticket.

A change that introduces 5-10 impossible-to-understand or outright false reports for every false positive it eliminates is not justified either.
> In that example we can't know how many bytes the symbol binding overwrites if we don't know the type of the symbol. And this is one of the places where the type of the symbol is the only piece of type information available.
Could you be slightly more specific? I could not follow.
From my perspective, we simply bind an incorrect symbolic value to the cast
expression when we evaluate it.
We should not simply model a cast expression by binding the value of the castee
to the cast expression, at least not in the general case.
Right now, we basically only look at the bitwidth of the types besides some
truncation detection later:
```cpp
SVal SValBuilder::evalIntegralCast(ProgramStateRef state, SVal val,
                                   QualType castTy, QualType originalTy) {
  // No truncations if target type is big enough.
  if (getContext().getTypeSize(castTy) >= getContext().getTypeSize(originalTy))
    return evalCast(val, castTy, originalTy);
```
However, I think we should emit a symbolic cast if we are casting from signed
to unsigned.
In what specific way would emitting a symbolic cast there make the analysis
worse compared to the current situation?
> A change that introduces 5-10 impossible-to-understand or outright false reports for every false positive it eliminates is not justified either.

Could you please add some context?
Do you imply that the solver cannot see through symbolic casts, so we would
have more false-positive bug paths depending on values derived from such
symbolic casts?
Hey, guys. I've just updated the summary of my revision. You can see the description of the approach in detail at https://reviews.llvm.org/D103096
Talking about your snippet:
```cpp
void test_single(signed char c) {
  if ((unsigned int)c <= 200) {
    // 'c' supposed to be restricted to be non-negative,
    // but it should be within [0,127].
  }
}
```
Here we reason about the range of `c`'s particular representation as `uint`. The range of `c` should be [-57, 127]. That's how the C++ rules work, and that's what my solution proposes.
(In reply to Denys Petrov from comment #5)

> Here we reason about the range of `c`'s particular representation as `uint`. The range of `c` should be [-57, 127]. That's how the C++ rules work, and that's what my solution proposes.

Oh, I miscalculated the range of 'c'. Of course it should be [0,127]. I've just tested the example with my solution (https://reviews.llvm.org/D103096) and it produces [0,127] as expected. I'll add the test case to the tests.
Good to hear, thanks for working on this topic.
Proposed a fix https://reviews.llvm.org/D103096