Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[Umbrella] Teach RegionStore about binding extents. #43084

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR44114
Status CONFIRMED
Importance P enhancement
Reported by Oliver Schneider (o.schneider@precitec-optronik.de)
Reported on 2019-11-22 04:50:31 -0800
Last modified on 2020-11-02 17:38:59 -0800
Version 9.0
Hardware PC Linux
CC dcoughlin@apple.com, llvm-bugs@lists.llvm.org, noqnoqneo@gmail.com
Fixed by commit(s)
Attachments mwe.7z (13112 bytes, application/x-7z-compressed)
Blocks
Blocked by
See also
Created attachment 22855
The minimal working example as .cpp file plus a GNUmakefile and the output of a
run of CSA (9.0.0) on it.

Hi,

admittedly neither I nor my colleagues are sure whether this is a false
positive of the static analyzer (CSA) or a misconception of sorts on our part.

So the issue is in the attached (self-contained) example. I am pasting it here
for reference:

-----
#include <cstdio>
#include <cstdint>

struct device_registers
{
    uint32_t n1;
    uint32_t n2;
    uint32_t n3;
    uint32_t n4;
    uint32_t n5;
} __attribute__((packed));

static_assert(
    20 == sizeof(struct device_registers),
    "unexpected struct size"
);

static void CopyDeviceRegisters(
        struct device_registers volatile& dest,
        struct device_registers volatile* src
)
{
    dest.n1 = src->n1;
    dest.n2 = src->n2;
    dest.n3 = src->n3;
    dest.n4 = src->n4;
    dest.n5 = src->n5;
}

// this is a pretend process_bytes() from boost/crc.hpp
// (where it's a member function)
void process_bytes(void const* buffer, size_t byte_count)
{
    // "checksum creation"
    unsigned char const* const b = static_cast<unsigned char const*>(buffer);
    for(size_t i = 0; i < byte_count; i++)
    {
        printf("%02X ", b[i]);
    }
    printf("\n");
}

int main(int, char**)
{
    // in our code this is a uio mapping
    struct device_registers mock{};
    struct device_registers volatile* mapped_regs = &mock;

    // purpose is to get a copy of the volatile mapping to compute a
    // checksum
    struct device_registers shadow;

    CopyDeviceRegisters(shadow, mapped_regs);

    // now we merely need to read from the (shadow) struct to mimick
    // what the real process_bytes() would do ...
    process_bytes(&shadow, sizeof(shadow));
    return 0;
}
-----

This code is representative of the issue we're seeing and the attached file
includes a run of CSA over that MWE.

After some tinkering it appears clear that the volatile nature of mapped_regs
and the arguments to CopyDeviceRegisters() are an issue here.

My guess is that CSA "reasons" that - due to the volatility - the data in
mapped_regs isn't initialized and that fact isn't changed by
CopyDeviceRegisters().

How would we have to convey it to CSA/Clang that we deem local variable
(shadow) initialized after the call to CopyDeviceRegisters()? ... or is this a
false positive which we should not care about?

Thank you for reading and with best regards,

Oliver

PS: After unpacking the MWE you should be able to use 'make analyze' which
should 'make clean' (implicitly), followed by running 'make mwe' via scan-
build. So the assumption is that scan-build is in the PATH and works.

We're working with a Clang that I build myself from
0399d5a9682b3cef71c653373e38890c63c4c365 (which should be the 9.0.0 release tag
or at least _was_ at one point) which targets x86_64-unknown-linux-gnu on a
vanilla Ubuntu 14.04 (yes, I built it on a pristine LXD image of Ubuntu 14.04,
so there isn't any "contamination" beyond build dependencies).
Quuxplusone commented 4 years ago

Attached mwe.7z (13112 bytes, application/x-7z-compressed): The minimal working example as .cpp file plus a GNUmakefile and the output of a run of CSA (9.0.0) on it.

Quuxplusone commented 4 years ago
Aha, yes, that's a false positive on our part. The analyzer fully understands
what does CopyDeviceRegisters() do, but our memory model fails to confirm that
writing a 32-bit integer would initialize all 4 bytes of that integer, so we
become confused when you start reading bugs byte-by-byte.

This is one of the most commonly reported bugs these days, so i hope i'll get
to it soon, but it requires large changes to fix.

If you want to suppress the bug, you can try the following trick:

 static void CopyDeviceRegisters(
        struct device_registers volatile& dest,
        struct device_registers volatile* src
 )
 {
+#ifdef __clang_analyzer__
+   // https://bugs.llvm.org/show_bug.cgi?id=44114
+   extern void maybe_initialize(void *x);
+   maybe_initialize(&dest);
+#endif
    dest.n1 = src->n1;
    dest.n2 = src->n2;
    dest.n3 = src->n3;
    dest.n4 = src->n4;
    dest.n5 = src->n5;
 }
Quuxplusone commented 4 years ago

Thank you very much for the confirmation and the additional information.

Am I right to assume that the CSA only requires the prototype of maybe_initialize() because it does not (currently) analyze beyond module boundaries?

Quuxplusone commented 4 years ago

No, it's just to keep it a valid C++ program. In C++ you can't use a function without declaring it (unlike C) and the code under analysis still needs to be a valid program.

The function is completely fake, of course, but nobody will notice that it's fake until link time, and the analyzer will not do linking, so it's fine. And it won't get to the actual .o file because it's preprocessed out because __clang_analyze__ isn't defined during normal compilation.

So the point of the suppression is to tell the analyzer "well, something happens to &dest here, just trust me".

Quuxplusone commented 4 years ago

Artem, having watched Meike's talk [https://www.youtube.com/watch?v=F_q50Th1t3A], do you think this ticket would be a reasonable entry point for contributions for someone who has never before contributed to Clang/LLVM?

Quuxplusone commented 4 years ago

No, unfortunately, this is too difficult for a newcomer. I could have tried to do it as a GSoC project with a good student. The problem is easy to understand, but this piece of code (RegionStore) is one of the most difficult, convoluted and controversial parts of the Static Analyzer.

Quuxplusone commented 3 years ago

https://bugs.llvm.org/show_bug.cgi?id=47727, http://lists.llvm.org/pipermail/cfe-dev/2020-November/067145.html: another example.

In this case the users were confused about whether it's a true positive warning that highlights a strict aliasing violation. In fact we could go down this road and transform instances of this bug into strict aliasing violation warnings when applicable; that would potentially simplify the solution when it'll only require us to reason about extents when one of the involved types is char (which bypasses strict aliasing rules). I don't see much indication that this approach would actually be much easier but these problems are indeed somewhat related.