
[analyzer] EnumCastOutOfRange false-positive for bitmasks #48350

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link: PR49381
Status: NEW
Importance: P enhancement
Reported by: Balazs Benics (balazs.benics@sigmatechnology.se)
Reported on: 2021-03-01 08:55:31 -0800
Last modified on: 2021-03-04 02:12:13 -0800
Version: 11.0
Hardware: PC All
CC: dcoughlin@apple.com, endre.fulop@sigmatechnology.se, llvm-bugs@lists.llvm.org, whisperity@gmail.com
The EnumCastOutOfRange checker emits a warning for this code:

// -std=c++17 -Xclang -analyze -Xclang -analyzer-checker=core,alpha.cplusplus.EnumCastOutOfRange -Xclang -analyzer-output=text
https://godbolt.org/z/xosTxq

  enum TestEnum {
    A = 0,
    B = 1,
    C = 2,
    D = 4,
  };

  void bitmasks() {
    static_assert((B | C | D) == 7);
    TestEnum t = static_cast<TestEnum>(7); // should not warn
    (void)t;
  }

The checker should recognize the bitmask and check the bounds accordingly.
/CC Endre Fülöp, the co-author of the checker
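
For concreteness, a minimal standalone sketch of the kind of check being suggested (this is not actual checker code; the name fitsBitmaskEnum is made up): treat the union of all declared enumerator values as a bitmask and accept any cast target whose bits fall inside it.

    #include <cstdint>
    #include <initializer_list>

    // Accept a cast target iff its bits are covered by the union of all
    // declared enumerator values.
    constexpr bool fitsBitmaskEnum(std::initializer_list<std::int64_t> Enumerators,
                                   std::int64_t Value) {
      std::uint64_t Mask = 0;
      for (std::int64_t E : Enumerators)
        Mask |= static_cast<std::uint64_t>(E);
      return (static_cast<std::uint64_t>(Value) & ~Mask) == 0;
    }

    static_assert(fitsBitmaskEnum({0, 1, 2, 4}, 7));  // B|C|D would be accepted
    static_assert(!fitsBitmaskEnum({0, 1, 2, 4}, 8)); // a bit outside the mask would still warn
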
Quuxplusone commented 3 years ago
Thanks for reaching out!

IMHO in C++ this is definitely the expected warning.
I found this Stack Overflow question as a motivating example, and it also shows a valid workaround for combining bitmasks:
https://stackoverflow.com/questions/4920783/combining-enum-value-using-bitmask
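
For concreteness, one common shape of such a workaround (a sketch, not taken verbatim from the linked thread; it reuses the TestEnum from the report): name the combined value as an enumerator, so that the cast target is itself a declared value, which should also satisfy a checker that compares the cast value against the declared enumerators.

    enum TestEnum {
      A = 0,
      B = 1,
      C = 2,
      D = 4,
      // Naming the full mask makes 7 itself a declared enumerator value.
      ALL = B | C | D, // == 7
    };

    void bitmasks() {
      TestEnum t = static_cast<TestEnum>(B | C | D); // casts to 7 == ALL
      (void)t;
    }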

In C this could be debatable, and if there are indeed multiple real code
examples of this sloppy enum use, then at least the language mode could be
taken into account.
Quuxplusone commented 3 years ago
I am not sure about C, but in C++, TestEnum (whose enumerators span [0, 4]) is ***NOT*** required to be able to represent "7".

As per http://eel.is/c++draft/dcl.enum#7

> For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can represent all the enumerator values defined in the enumeration. [...]

7 is not defined in the enumeration, so the compiler is not required to give you an underlying integral type that can represent 7. The only requirement is that values up to 4 shall be representable.

----

As a side note, if you consider the bitmask enum helper functions in LLVM's own codebase (include/llvm/ADT/BitmaskEnum.h), the idiomatic usage of bitmasks in LLVM's code is:

    enum MyFlags {
      A = 0,
      B = 1,
      C = 2,
      D = 4,

      LLVM_MARK_AS_BITMASK_ENUM(/* LargestValue = */ D)
    };

LLVM_MARK_AS_BITMASK_ENUM expands to

    LLVM_BITMASK_LARGEST_ENUMERATOR = LargestValue

I am not sure what the benefit of this extra declaration is from the user's PoV, since reading LLVM_BITMASK_LARGEST_ENUMERATOR still gives you the same value, 4 in our case. So it does not become the next power of 2, although there are some hints inside the header about calculating powers of 2.
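
As far as I can tell from the documentation comments in BitmaskEnum.h itself, the practical benefit of the marker is that it opts the enum into the overloaded bitwise operators. A sketch of the intended usage outside namespace llvm, based on that header (it may need adjustment to compile in a given tree):

    #include "llvm/ADT/BitmaskEnum.h"

    namespace example {

    enum MyFlags {
      A = 0,
      B = 1,
      C = 2,
      D = 4,
      LLVM_MARK_AS_BITMASK_ENUM(/* LargestValue = */ D)
    };
    // Pulls the overloaded |, &, ^, ~ (and the compound assignments) for
    // marked enums into this namespace.
    LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();

    void use() {
      MyFlags F = B | C | D; // stays typed as MyFlags, no explicit cast needed
      (void)F;
    }

    } // namespace example
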
Quuxplusone commented 3 years ago
Then just imagine that the underlying type is fixed. The issue is the same.
This is a clear and easy bug in this checker.
Quuxplusone commented 3 years ago
I guess, if you are right, then this code is suspicious:
https://github.com/llvm/llvm-project/blob/53298b8c8d74d0064c673db18addfe973b544dbb/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h#L1529-L1545
Quuxplusone commented 3 years ago
(In reply to Balazs Benics from comment #4)
> I guess, if you are right, then this code is suspicious:
> https://github.com/llvm/llvm-project/blob/53298b8c8d74d0064c673db18addfe973b544dbb/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h#L1529-L1545

Indeed, I fixated on a single rule, my bad. The problem is that you have to look at the law as a whole, and not just at the individual case decisions.

Basically, it **is** true that if you say an enum's enumerators lie between 0 and 4, then an underlying type shall be selected that can represent [0, 4]. However, the "unfortunate" reality is that the requirement on the selected integral type (whether a standard one or an implementation-defined one) is that an N-bit signed type **must** be able to represent every value from -2^(N-1) to 2^(N-1) - 1 [http://eel.is/c++draft/basic.fundamental].

The other requirement is that **if** all the values would otherwise fit into an int, the implementation is prohibited from choosing a type larger than int (say, uint4096_t or something like that).
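
A quick illustration of that rule (the enum names here are made up for the example):

    enum SmallEnum { S = 4 };              // every value fits in int, so the
                                           // underlying type is at most int wide
    static_assert(sizeof(SmallEnum) <= sizeof(int));

    enum BigEnum { Huge = 0x1'0000'0000 }; // does not fit in a 32-bit int, so a
                                           // wider underlying type is permitted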

So in the case of a required range [0, 4], we need at least 3 bits. The smallest possible integral type is char, which is **at least** 8 bits wide. However, because you've written "0", it will be a **signed** type [http://eel.is/c++draft/dcl.enum#5.1]: signed integral types take precedence over their unsigned counterparts in the candidate list.

So what the compiler is required to **at least** select as the implicit underlying type is "signed char", giving us the valid range [-128, 127]. All of these values will be valid, even though the enum did not define them. (But not [0, 255]!)
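
The integral-type facts used in this reasoning can be checked directly (this only verifies the type guarantees, not which underlying type a particular compiler actually picks):

    #include <climits>
    #include <limits>

    static_assert(CHAR_BIT >= 8); // char is at least 8 bits wide
    static_assert(std::numeric_limits<signed char>::min() <= -127);
    static_assert(std::numeric_limits<signed char>::max() >= 127);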

I believe Clang should *really* be able to tell us what real range it selected for the enum's underlying type. Although I fear Clang is lazy: even for such a simple enum of elements [0, 4], it will just go with "int", which is wider than what's *truly* required.
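
In the meantime, the chosen underlying type can at least be inspected from user code; a small sketch (what it reports is implementation-defined):

    #include <cstdio>
    #include <type_traits>

    enum TestEnum { A = 0, B = 1, C = 2, D = 4 };

    int main() {
      // Prints the size of the implementation-chosen underlying type; typical
      // implementations report an int-sized type (4 bytes) here rather than
      // the minimal signed char discussed above.
      std::printf("sizeof(underlying type) = %zu\n",
                  sizeof(std::underlying_type_t<TestEnum>));
    }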

(In reply to Balazs Benics from comment #3)
> Then just imagine that the underlying type is fixed. The issue is the same.
> This is a clear and easy bug in this checker.

Do you mean that even in the case of

    enum POSIXACLBits : unsigned char

the checker still warns for the cast of 7?
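
For reference, a self-contained variant of the original reproducer with a fixed underlying type that could be used to answer this (the enumerator names are made up; the analyzer invocation is the same as in the report):

    enum POSIXACLBits : unsigned char {
      READ = 1,
      WRITE = 2,
      EXECUTE = 4,
    };

    void bitmasks_fixed() {
      // With a fixed underlying type, every value of unsigned char is a value
      // of the enumeration per the standard; the question above is whether the
      // checker still warns on this cast (which yields 7).
      POSIXACLBits p = static_cast<POSIXACLBits>(READ | WRITE | EXECUTE);
      (void)p;
    }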