Quuxplusone / LLVMBugzillaTest


Incorrect sample profiles at O3 #28986

Open Quuxplusone opened 8 years ago

Quuxplusone commented 8 years ago
Bugzilla Link PR28991
Status NEW
Importance P normal
Reported by David Callahan (dcallahan@fb.com)
Reported on 2016-08-15 18:40:57 -0700
Last modified on 2016-08-18 09:01:21 -0700
Version trunk
Hardware PC Linux
CC danielcdh@gmail.com, davidxl@google.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments bug2.tar (10240 bytes, application/x-tar)
Blocks
Blocked by
See also
Created attachment 16963
Tar file of script and input files

bug2.sh generates the reproduction, in which we compile a training program at either
-O0 or -O3.

The -O0 training run yields this sample:

Function: _Z8clampSumPdidd: 105434939, 1, 14 sampled lines
Samples collected in the function's body {
  0: 1
  1: 1
  2.1: 949899
  2.2: 949797
  3: 949833
  4: 0
  5: 949926
  6: 949896
  7: 0
  8: 949929
  9: 949904
  11: 949889
  12: 949809
  14: 4
}

while the -O3 training run yields this sample:

Function: _Z8clampSumPdidd: 76108151, 2, 9 sampled lines
Samples collected in the function's body {
  0: 2
  2.1: 1435907
  3: 1436091
  6: 1436048
  7: 1436032
  9: 2
  11: 1435984
  12: 1435974
  14: 2
}

Note the differences in the counts for relative lines 7 and 9.
The corresponding branch_weights are
   !74 = !{!"branch_weights", i32 1, i32 949905}
and
   !74 = !{!"branch_weights", i32 1436033, i32 3}

The -O3 sample inverts the relative weights with respect to the (correct) -O0
sample.

The attached tar file has bug2.sh, bug.cc and main.cc and a snapshot of bug2.sh
output.
Quuxplusone commented 8 years ago

Attached bug2.tar (10240 bytes, application/x-tar): Tar file of script and input files

Quuxplusone commented 8 years ago

When SimplifyCFG speculates the execution of a block, the statements in that block have their execution frequency raised to that of the dominating block. For the "then" clause, this raises the frequency from 0 to the trip count of the loop.

The same thing would happen to the "else" block, but in this case its operations are loop invariant, so they get hoisted out of the loop; instead of getting the same frequency as the loop body, they get the much lower count of the loop preheader.

Quuxplusone commented 8 years ago

This is an interesting case: the branch weights are inverted in the optimized binary.

But in this case, neither LICM nor if-conversion relies on profile data. As a result, the same optimizations take place no matter how wrong the profile is. And after if-conversion, the correctness of the profile does not matter at all for later optimizations. So I think we should be able to live with the incorrect profile?

Quuxplusone commented 8 years ago

Code layout can be bad with this.

Quuxplusone commented 8 years ago

Re #3 -- to clarify: if ifcvt uses profile data to make its decision and does not convert highly biased branches, then a bad profile can have a bigger impact.

Quuxplusone commented 8 years ago

Also, in CodeGenPrepare, there is code that attempts to undo a select based on branch weights: isFormingBranchFromSelectProfitable(). Getting this wrong compromises hmmer in SPEC2006.

Quuxplusone commented 8 years ago
(In reply to comment #5)
> Also, in CodeGen Prepare, there is code that attempts to undo a select based
> on branch weights. isFormingBranchFromSelectProfitable().  Getting this
> wrong compromises hmmer in Spec2006

Can you file a separate bug for that (ideally with a reduced test case)? Thanks!