Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Miscompilation of External/SPEC/CFP2006/444.namd #31224

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link PR32251
Status NEW
Importance P enhancement
Reported by Michael Kruse (llvm@meinersbur.de)
Reported on 2017-03-13 06:43:07 -0700
Last modified on 2017-04-05 13:28:20 -0700
Version unspecified
Hardware PC Windows NT
CC llvm-bugs@lists.llvm.org, thanm@google.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
SPEC CPU 2016 1.1 444.namd currently fails with Polly (-O3 -mllvm -polly -mllvm
-polly-process-unprofitable). Trying to bisect the commit causing it, I could
go back to r288174 where it still fails. This is the first commit where Polly
compiles with current LLVM/Clang trunk (r297414)

This might be something uncovered by an LLVM change. My next step would be to
bisect with https://github.com/llvm-project/llvm-project .

Command line:
$ /tmp/lnt/bin/python /tmp/lnt/bin/lnt runtest test-suite --sandbox
/tmp/runtests-858grax9 --test-suite /root/src/llvm/projects/test-suite --cc
/tmp/runtests-858grax9/cc --cxx /tmp/runtests-858grax9/c++ --use-lit
/root/build/llvm/release/bin/llvm-lit --cmake-define
TEST_SUITE_LLVM_SIZE=/root/build/llvm/release/bin/llvm-size --use-cmake
/usr/bin/cmake --cmake-cache ReleaseNoLTO --no-timestamp --benchmarking-only --
threads 8 --build-threads 8 --only-test External/SPEC/CFP2006/444.namd --
cppflags -mllvm --cppflags -polly --cppflags -mllvm --cppflags -polly-process-
unprofitable

[...]

FAIL: test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test (1 of 1)
******************** TEST 'test-suite ::
External/SPEC/CFP2006/444.namd/444.namd.test' FAILED ********************

/tmp/runtests-858grax9/build/tools/timeit --limit-core 0 --limit-cpu 7200 --
timeout 7200 --limit-file-size 104857600 --limit-rss-size 838860800 --redirect-
stdout /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/namd.stdout -
-redirect-input /dev/null --summary /tmp/runtests-
858grax9/build/External/SPEC/CFP2006/444.namd/Output/444.namd.test.time
/tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/444.namd --input
/root/src/llvm/projects/test-suite/test-suite-
externals/speccpu2006/benchspec/CPU2006/444.namd/data/all/input/namd.input --
iterations 1 --output /tmp/runtests-
858grax9/build/External/SPEC/CFP2006/444.namd/namd.out
/tmp/runtests-858grax9/build/tools/fpcmp -a 0.00001
/root/src/llvm/projects/test-suite/test-suite-
externals/speccpu2006/benchspec/CPU2006/444.namd/data/train/output/namd.out
/tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/namd.out

/tmp/runtests-858grax9/build/tools/fpcmp: FP Comparison failed, not a numeric
difference between ' ' and 'R'
Quuxplusone commented 7 years ago
Bisecting revealed commit r270559

Author: Than McIntosh <thanm@google.com>
Date:   Tue May 24 13:23:44 2016 +0000

    Rework/enhance stack coloring data flow analysis.

    Replace bidirectional flow analysis to compute liveness with forward
    analysis pass. Treat lifetimes as starting when there is a first
    reference to the stack slot, as opposed to starting at the point of the
    lifetime.start intrinsic, so as to increase the number of stack
    variables we can overlap.

    Reviewers: gbiv, qcolumbet, wmi
    Differential Revision: http://reviews.llvm.org/D18827

    Bug: 25776

being the cause. I assume this just revealed a bug in Polly. I find this
somewhat surprising since I had SPEC2006 passing well after May last year. The
only explanation I have is that test-suite/lnt changed as well since then (both
were trunk and not bisected).
Quuxplusone commented 7 years ago
The issue are the lifetime markers left behind by Polly. They are removed from
the generated code, but the original code still contains them.

In particular, the code before Polly was:

    llvm.lifetime.start(&var)
    [...]
    llvm.lifetime.end(&var)

The code after Polly looks like:

    if (VersioningCond) {
    } else {
      llvm.lifetime.start(&var)
    }
    [...]
    llvm.lifetime.end(&var)

Such that there are execution paths (the one taken in the test-suite execution)
where llvm.lifetime.start is not executed before llvm.lifetime.end. According
the semantics in http://llvm.org/docs/LangRef.html#llvm-lifetime-start-
intrinsic this doesn't seem wrong. It just means that &var is never dead before
the llvm.lifetime.end marker.

This variant fixes it:

    if (VersioningCond) {
      llvm.lifetime.start(&var)
    } else {
      llvm.lifetime.start(&var)
    }
    [...]
    llvm.lifetime.end(&var)

I don't know whether the fault is at Polly to generate such code or with
http://reviews.llvm.org/D18827 which does not correctly interpret the
lifetimes. If it's Polly's fault, then I don't still know whether the second
version is correct or does not miscompile because of other reasons. Polly at
the moment cannot ensure the correct positioning of llvm.lifetime.start (that
is, before every use that originally comes after llvm.lifetime.start)
Quuxplusone commented 7 years ago
What's happening here is that stack coloring is deciding to overlap stack slot
0 (corresponding to the variable "iterations") with stack slot 1 (corresponding
to the variable "simParams"), which is definitely incorrect -- these variables
do not have disjoint lifetimes.

Looking at the IR in the base (-O3, no polly) case, the lifetime start marker
for "iterations" appears towards the end of the entry block, meaning that the
code for the first few lines of the functions looks roughly like

    define i32 @main(i32 %argc, i8** nocapture readonly %argv) ...
    entry:
      %iterations = alloca i32, align 4
      %simParams = alloca %class.SimParameters, align 8
      ...
      %comp = alloca %class.ResultSet, align 8
      %0 = bitcast i32* %iterations to i8*
      call void @llvm.lifetime.start(i64 4, i8* nonnull %0) #11
      store i32 -1, i32* %iterations, align 4, !tbaa !1
      %sub = and i32 %argc, 1
      %tobool = icmp eq i32 %sub, 0
      br i1 %tobool, label %if.then, label %for.cond.preheader

    ...

    if.then:                                          ; preds = %entry
      %2 = load i8*, i8** %argv, align 8, !tbaa !5
      tail call void @_z10exit_usagepkc(i8* %2)
      unreachable

This is the normal pattern for a function's top-level local variables (e.g.
lifetime start appears in the entry block). Looking at the coming into the
optimizer with Polly turned on, I see the following instead:

    define i32 @main(i32 %argc, i8** nocapture readonly %argv) local_unnamed_addr #3 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
    entry:
      %iterations = alloca i32, align 4
      %0 = bitcast i32* %iterations to i8*
      %simParams = alloca %class.SimParameters, align 8
      ...
      %comp = alloca %class.ResultSet, align 8
      %1 = and i32 %argc, 1
      %2 = icmp eq i32 %1, 0
      br i1 %2, label %if.then, label %for.cond.preheader

    if.then:                                          ; preds = %entry
      call void @llvm.lifetime.start(i64 4, i8* nonnull %0) #11
      %3 = load i8*, i8** %argv, align 8, !tbaa !1
      tail call void @_Z10exit_usagePKc(i8* %3)
      unreachable

    for.cond.preheader:                               ; preds = %entry
      store i32 -1, i32* %iterations, align 4, !alias.scope !5, !noalias !7
      %cmp528 = icmp sgt i32 %argc, 1
      br i1 %cmp528, label %for.body.lr.ph, label %if.then50

Note that the lifetime start marker for "iterations" has been relocated, pushed
down into the block containing the no-return call to exit (oddly, the bitcast
feeding the lifetime start still seems to be stuck up in the entry block, not
sure why). It would be nice to understand why the initial placement of the
markers is happening this way.

The current implementation of stack coloring uses forward data flow analysis,
meaning that it computes live intervals by propagating uses of a given slot in
block N to N's successors, until hitting a "lifetime end" marker for the slot.

In addition, a given slot is classified as "normal" or "conservative", where
"normal" implies that all uses of the slot are dominated by the lifetime start
marker and "conservative" means that we found some path from the entry block to
a use of a slot that didn't go through a start marker for the slot.

For "normal" slots, the live interval for the slot is treated as starting at
the first use for it after the start marker, whereas for "conservative" slots
we treat the marker itself as the start of the lifetime (for more detail on
this, see http://llvm.org/viewvc/llvm-
project/llvm/trunk/lib/CodeGen/StackColoring.cpp?view=markup&pathrev=297904#l161).

In this case the slots for both "iterations" and "simParams" are being treated
conservatively, however the conservative treatment doesn't help much due to the
migration of the start lifetime for slot 0 out of the entry block.

The reason this case works with the old version of stack coloring was that it
used bidirectional flow analysis -- propagating not just from first uses down
to lifetime ends, but also backwards (from a use block back to its
predecessors), upwards within the control flow graph. This produces a more
extensive lifetime for the "iterations" slot, resulting in an interference
between the two slots that inhibits overlap.

I don't see an easy fix for this in StackColoring, short of going back to the
bidirectional scheme. The old bidirectional analysis would "fix" the problem in
this specific instance, but it is easy to think of examples where the
bidirectional method would also fail if optimization passes are allowed to
arbitrarily relocate markers.

I think at some level or another stack coloring has to trust that the lifetime
start/end markers are a source of "truth". If optimization passes prior to
stack coloring can move markers wherever they want, then there is really no
point looking at the markers at all (which would greatly impair the
effectiveness of stack coloring).
Quuxplusone commented 7 years ago
Fixed in r299585.

However, I am leaving this bug open because while r299585 fixes the problem for
444.namd, it may not fix it in the general case.

Given that the lifetime start itself is a conditional that is outside of the
SCoP;

    if (c) {
      llvm.lifetime.start(&var)
        [...]
    } else {
      llvm.lifetime.start(&var)
        [...]
    }
    [...]
    llvm.lifetime.end(&var)

then after r299585 we would generate

    if (c) {
      if (VersioningCond) {
        [...]
      } else {
        [...]
      }
    } else {
      llvm.lifetime.start(&var)
        [...]
    }
    [...]
    llvm.lifetime.end(&var)

Where again some paths cross llvm.lifetime.start while others don't.

According the the required form suggested by Daniel Berlin in
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111669.html
we need to hoist the llvm.ifetime.start markers to before the loop versioning,
respectively sink the llvm.ifetime.end markers behind it:

    if (c) {
      llvm.lifetime.start(&var)
      if (VersioningCond) {
        [...]
      } else {
        [...]
      }
    } else {
      llvm.lifetime.start(&var)
        [...]
    }
    [...]
    llvm.lifetime.end(&var)