Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

False positives related to __builtin_clzll() and std::vector #20902

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR20902
Status NEW
Importance P normal
Reported by Warren Schudy (wschudy@gmail.com)
Reported on 2014-09-10 23:52:08 -0700
Last modified on 2014-09-16 17:32:08 -0700
Version unspecified
Hardware PC Linux
CC jrose@belkadan.com, klimek@google.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments report-362487.html (5899 bytes, text/html)
test.cc (420 bytes, text/plain)
Blocks
Blocked by
See also
Created attachment 13017
scan-build error report demonstrating the bug

Running scan-build on this program yeilds a warning about the result of the
left-shift expression being undefined, which appears to be spurious, and a note
that the 'true' branch of an 'if' is taken, which is definitely invalid. I've
attached the scan-build error report. Running the program confirms the obvious:
the 'false' branch of the 'if' is taken. The expected output is no bug reports
and (optionally) a note that the 'false' branch of the 'if' is taken.

------ Details ------

I created a fresh Ubuntu 14.04.1 install. I then installed the packages clang-
3.5 and g++ with apt-get.

wschudy@ubuntu:~$ cat test.cc
#include <cstdio>
#include <vector>

inline int Log2Floor64(unsigned long long n) {
  return n == 0 ? -1 : 63 ^ __builtin_clzll(n);
}

int main(int argc, char** argv) {
  std::vector<int> xs = {0};
  if (xs[0] != 0) {
    unsigned long long count = 1ull << Log2Floor64(5.0);
    printf("\"Then\" branch, i.e. xs[0] != 0, with count == %llu\n", count);
  } else {
    printf("\"Else\" branch, i.e. xs[0] == 0\n");
  }
}

wschudy@ubuntu:~$ clang --version
Ubuntu clang version 3.5-1ubuntu1 (trunk) (based on LLVM 3.5)
Target: x86_64-pc-linux-gnu
Thread model: posix
wschudy@ubuntu:~$ clang test.cc -std=c++11 -lstdc++
wschudy@ubuntu:~$ ./a.out
"Else" branch, i.e. xs[0] == 0
wschudy@ubuntu:~$ scan-build clang test.cc -std=c++11 -lstdc++
scan-build: Using '/usr/bin/clang' for static analysis
test.cc:11:37: warning: The result of the '<<' expression is undefined
    unsigned long long count = 1ull << Log2Floor64(5.0);
                               ~~~~~^~~~~~~~~~~~~~~~~~~
1 warning generated.
scan-build: 1 bugs found.
scan-build: Run 'scan-view /tmp/scan-build-2014-09-10-214330-4211-1' to examine
bug reports.
Quuxplusone commented 10 years ago

Attached report-362487.html (5899 bytes, text/html): scan-build error report demonstrating the bug

Quuxplusone commented 10 years ago

Attached test.cc (420 bytes, text/plain): The C++ program (build with "clang test.cc -std=c++11 -lstdc++"). Identical to the "cat test.cc" output in the bug summary.

Quuxplusone commented 10 years ago

I'm not even sure how that's compiling, because 5.0 isn't an unsigned long long. But yes, the analyzer does not model std::vector right now, and it probably isn't modeling clzll either. The latter would be pretty easy to fix, though.

Quuxplusone commented 10 years ago

Jordan:

  1. AIUI C++ does narrowing conversions automatically in many contexts. (This is a very questionable design decision but we have to live with it.) I'm not a language lawyer but I guess that clang is correct to allow this code. Anyway replacing "5.0" with the explicit cast "static_cast(5.0)" does not change the diagnostics produced so if clang is accepting this code incorrectly that's a separate bug.

  2. It's OK for clang static analyzer to stay silent or give ambiguous warnings because it isn't modeling vector and clzll. It's not OK for it to give diagnostics that confidently state something, i.e. which branch of an 'if' is taken, that is incorrect.

Quuxplusone commented 10 years ago

The analyzer isn't actually running your program, so /everything/ it prints is based on its model. Each path note, including those for branches, is meant to explain the analyzer's assumptions at each point---the analyzer /is/ taking that branch, even though the real program can't.

Quuxplusone commented 10 years ago
I just looked at the FAQ and it looks like I misunderstood what the "Note:
Taking true branch" message means. I guessed that that message was warning me
that scan-build had proved that the false branch was never taken (i.e. dead
code). After seeing the example warnings in the FAQ it seems that message is
not an independent warning but rather an explanation of under what
circumstances it thinks the error about the "<<" being undefined may occur.
Therefore never mind what I said about the taking true branch note being wrong.
Instead make this bug about the error about the "<<" being undefined, which is
a false positive. I believe this false positive is caused by scan-build
(apparently) not modeling double to unsigned long long conversion (i.e. 5.0 to
5ull), so it can't rule out Log2Floor64 receiving n=0 and returning -1, which
in turn would make the left-shift undefined. My evidence for this is if you
replace the problematic line with the following three the error message stays
the same but uncommenting the assert makes it go away:
    unsigned long long five = 5.0;
//    assert(five>0);  //uncommenting this line causes error report to go away
    unsigned long long count = 1ull << Log2Floor64(five);

Now that I (mostly) understand the bug I have four suggestions. These
suggestions are mostly independent so you may want to make additional bugs for
one or more of them to keep them separate.

1. Model double to integral type conversion so this false positive goes away.

2. Please add a page to the "user manual" menu on the webpage describing how to
interpret the error reports, including the fact that the notes are statements
of the branches taken in the error report, not a statement that the value of
the tests can be determined statically. If a user receives multiple correct
error reports it's possible to deduce what the error reports mean but a single
error report, especially a false positive one, is ambiguous so documentation
would help.

3. The error message says that the result "is undefined", which can be
naturally interpreted as a statement that the analyzer has proven that
undefined behavior will definitely occur, which is wrong in this case. If the
analyzer is uncertain about whether or not undefined behavior is occurring it
should say the result "may be undefined". The same suggestion applies to all
error messages: make it clear whether the error message is for a certain error
or a possible error.

4. The error message would be a lot more useful if it were more specific about
the problem than just the left-shift being undefined. For example state that
scan-build deduced that the right-hand operand was bounded between -1 and 63
inclusive (or whatever scan-build actually deduced), which contains value(s)
below the allowed range 0 to 63 inclusive.