Open Quuxplusone opened 8 years ago
Bugzilla Link | PR28457 |
Status | NEW |
Importance | P normal |
Reported by | Adam Nowacki (nowak-llvm@tepeserwery.pl) |
Reported on | 2016-07-07 11:44:13 -0700 |
Last modified on | 2019-10-02 10:08:01 -0700 |
Version | trunk |
Hardware | PC All |
CC | a.bataev@hotmail.com, craig.topper@gmail.com, efriedma@quicinc.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com, v.porpodas@gmail.com, Vasileios.porpodas@intel.com |
Probably some sort of pass ordering problem: the loop vectorizer splits off the last four iterations into a separate loop, then the loop unroller runs, and the SLP vectorizer doesn't run again after that.
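To make the suspected sequence concrete, here is a minimal hand-written sketch of the code shape the comment above describes. It is not the reporter's test case and not actual compiler output: the function names `add12` and `add12_split`, the trip count of 12, and the 8-wide vector body are all assumptions chosen only for illustration.

```cpp
#include <cstddef>

// Hypothetical kernel (an assumption, not the code from the report):
// 12 independent float additions.
void add12(float *dst, const float *a, const float *b) {
    for (std::size_t i = 0; i < 12; ++i)
        dst[i] = a[i] + b[i];
}

// Hand-written picture of the shape described above, assuming the loop
// vectorizer picks an 8-wide body and leaves a 4-iteration remainder.
void add12_split(float *dst, const float *a, const float *b) {
    // Stands in for the single 8-wide vector iteration.
    for (std::size_t i = 0; i < 8; ++i)
        dst[i] = a[i] + b[i];

    // Remainder loop after unrolling: four scalar adds on adjacent
    // elements. In principle an SLP pass could pack these into one
    // 4-wide add, but only if it runs after the unroller.
    dst[8]  = a[8]  + b[8];
    dst[9]  = a[9]  + b[9];
    dst[10] = a[10] + b[10];
    dst[11] = a[11] + b[11];
}
```

Both functions compute the same result. The second only spells out where the scalar tail would come from if the passes run in the order suggested.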
Current codegen: https://godbolt.org/z/b-d1LX (still as bad as reported)
At least the first example is purely an SLP shortcoming. Test added here:
https://reviews.llvm.org/rL373483
This might improve with:
https://reviews.llvm.org/D57059
...but I haven't checked it.
And I'm not sure what the current status is, but that example likely should be
changed if we start preferring 128-bit vector ops by default. This is the kind
of one-off ymm op that can cause frequency throttling and reduce overall performance.
(In reply to Sanjay Patel from comment #3)
> At least the first example is purely an SLP shortcoming. Test added here:
Sorry - I didn't notice that in the original description, the 1st example was
vectorizing as expected with the additional flag:
-fslp-vectorize-aggressive
That changed somewhere between clang 4.0 and clang 5.0 (we dropped the
BBVectorizer pass at that point?).