Reporting of predicted benefits of vectorisation #30665

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link: PR31691
Status: NEW
Importance: P normal
Reported by: drraph@gmail.com
Reported on: 2017-01-19 04:31:52 -0800
Last modified on: 2017-01-20 12:04:33 -0800
Version: 3.9
Hardware: PC Linux
CC: anemet@apple.com, drraph@gmail.com, hfinkel@anl.gov, joker.eph@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk
Fixed by commit(s):
Attachments:
Blocks:
Blocked by:
See also:

Running the Intel compiler with, say, -qopt-report=4 reports the expected
performance gain from vectorisation, among other useful information.
For example:

 LOOP BEGIN at permanent-in-c.c(47,7)
      remark #25045: Fused Loops: ( 47 50 )

      remark #15388: vectorization support: reference v has aligned access   [ permanent-in-c.c(48,2) ]
      remark #15388: vectorization support: reference v has aligned access   [ permanent-in-c.c(48,2) ]
      remark #15389: vectorization support: reference M has unaligned access   [ permanent-in-c.c(48,2) ]
      remark #15388: vectorization support: reference v has aligned access   [ permanent-in-c.c(51,8) ]
      remark #15381: vectorization support: unaligned access used inside loop body
      remark #15305: vectorization support: vector length 4
      remark #15399: vectorization support: unroll factor set to 2
      remark #15309: vectorization support: normalized vectorization overhead 0.600
      remark #15301: FUSED LOOP WAS VECTORIZED
      remark #15442: entire loop may be executed in remainder
      remark #15448: unmasked aligned unit stride loads: 1
      remark #15449: unmasked aligned unit stride stores: 1
      remark #15450: unmasked unaligned unit stride loads: 1
      remark #15475: --- begin vector loop cost summary ---
      remark #15476: scalar loop cost: 49
      remark #15477: vector loop cost: 10.000
      remark #15478: estimated potential speedup: 4.580
      remark #15487: type converts: 3
      remark #15488: --- end vector loop cost summary ---
      remark #25456: Number of Array Refs Scalar Replaced In Loop: 2
   LOOP END

GCC also gives similar, if perhaps less useful, information:

[...]
vect_model_reduction_cost: inside_cost = 6, prologue_cost = 2, epilogue_cost = 6 .
test2.c:50:9: note: ==> examining statement: j_73 = j_186 + 1;
test2.c:50:9: note: irrelevant.
test2.c:50:9: note: ==> examining statement: if (j_73 < n.2_201)
test2.c:50:9: note: irrelevant.
test2.c:50:9: note: === vect_update_slp_costs_according_to_vf ===
test2.c:50:9: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
test2.c:50:9: note: Cost model analysis:
  Vector inside of loop cost: 50
  Vector prologue cost: 8
  Vector epilogue cost: 52
  Scalar iteration cost: 20
  Scalar outside cost: 4
  Vector outside cost: 60
  prologue iterations: 0
  epilogue iterations: 2
  Calculated minimum iters for profitability: 5
test2.c:50:9: note:   Runtime profitability threshold = 4
test2.c:50:9: note:   Static estimate profitability threshold = 4
test2.c:50:9: note: epilog loop required

[...]
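
As a sanity check on the GCC numbers above, here is a rough sketch of the break-even arithmetic behind "Calculated minimum iters for profitability". It rests on two assumptions. The vectorisation factor of 4 is not in the excerpt; it is inferred from "epilogue peel iters set to vf/2" together with "epilogue iterations: 2". The formula follows what GCC's vectorizer cost model appears to compute and has not been checked against the GCC version that produced this dump. The struct and function names are made up for illustration.

```c
#include <assert.h>

/* Costs as printed in the "Cost model analysis" dump (hypothetical struct). */
struct vect_costs {
    int vec_inside;      /* "Vector inside of loop cost" */
    int vec_outside;     /* "Vector outside cost" */
    int scalar_iter;     /* "Scalar iteration cost" */
    int scalar_outside;  /* "Scalar outside cost" */
    int prologue_iters;  /* "prologue iterations" */
    int epilogue_iters;  /* "epilogue iterations" */
    int vf;              /* vectorisation factor: ASSUMED, not printed in the excerpt */
};

/* Values copied from the dump above; vf = 4 is the assumption noted in the text. */
struct vect_costs costs_from_dump(void)
{
    struct vect_costs c = {
        .vec_inside = 50, .vec_outside = 60,
        .scalar_iter = 20, .scalar_outside = 4,
        .prologue_iters = 0, .epilogue_iters = 2,
        .vf = 4,
    };
    return c;
}

/*
 * Estimated minimum iteration count for the vector loop to pay off.
 * Returns -1 when one vector iteration is no cheaper than vf scalar ones.
 */
int min_profitable_iters(const struct vect_costs *c)
{
    assert(c->vf > 0);

    /* Cost saved per vector iteration compared with vf scalar iterations. */
    int gain = c->scalar_iter * c->vf - c->vec_inside;
    if (gain <= 0)
        return -1;

    /* Extra fixed overhead of the vector version, scaled by vf. */
    int fixed = (c->vec_outside - c->scalar_outside) * c->vf;
    int peeled = c->vec_inside * (c->prologue_iters + c->epilogue_iters);

    int n = (fixed - peeled) / gain;  /* integer division rounds down */

    /* Bump by one if the scalar version is still no more expensive at n. */
    if (c->scalar_iter * c->vf * n <= c->vec_inside * n + fixed)
        n++;
    return n;
}
```

With the dump's values and vf = 4, `min_profitable_iters` returns 5, which matches the "Calculated minimum iters for profitability: 5" line. The reported runtime threshold of 4 is one less than that.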

It would be great if clang/llvm could provide similar information to the
user/coder.

Quuxplusone commented 7 years ago

Great idea. We should add predicted speedups to our vectorizer's optimization remarks.
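
For context, Clang can already emit vectorizer remarks through the -Rpass family of flags. A minimal sketch follows; the function name and file name are made up, and the exact remark text depends on the Clang version and target. Those remarks typically describe whether a loop was vectorized and with what width, not an estimated speedup of the kind shown in the Intel report.

```c
#include <stddef.h>

/*
 * Build with vectorizer remarks enabled (the file name saxpy.c is hypothetical):
 *
 *   clang -O2 -c saxpy.c \
 *       -Rpass=loop-vectorize \
 *       -Rpass-missed=loop-vectorize \
 *       -Rpass-analysis=loop-vectorize
 *
 * -Rpass reports loops that were vectorized, -Rpass-missed reports loops
 * that were not, and -Rpass-analysis points at the statements that
 * prevented vectorization.
 */
void saxpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    /* restrict tells the compiler x and y do not alias, which avoids
       the runtime overlap checks the vectorizer would otherwise need. */
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```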

Quuxplusone commented 7 years ago

Great suggestion! We already report something like this for the inliner but not for the vectorizer.

I think I mentioned this on IRC, but we have some other facilities around opt remarks that you might find useful (and could give feedback on ;). See my dev meeting talk (http://llvm.org/devmtg/2016-11/#talk15).