Open Quuxplusone opened 6 years ago
Craig/Gadi - any thoughts?
Silvermont has the additional problem that the reciprocal throughput is also high. Haswell's reciprocal throughput is 2, so it's more reasonable.
This is only checked in reduceVMULWidth, right? I don't think it would change anything about PR28128, since I don't think we can reduce the mul width there.
I haven't looked closely at the alternative sequences reduceVMULWidth will generate.
PR28128 - we probably need to bump the v4i32/v8i32 mul costs in X86TargetTransformInfo.cpp, but I can't see it affecting the vectorization decision tbh.
I keep meaning to more aggressively use X86ISD::VPMADDWD like Peter suggested on D41484, but again that's a special case.
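For readers unfamiliar with the instruction, here is a rough scalar model of one 32-bit lane of PMADDWD, showing why it can only replace a 32-bit multiply in a special case. This is an illustrative sketch, not LLVM code: the helper name `pmaddwd_lane` is made up, and the stated condition (one operand has zero upper 16 bits, both fit in a signed 16-bit value) is my reading of when the substitution is safe.

```cpp
#include <cstdint>

// Hypothetical helper: models one 32-bit result lane of PMADDWD.
// Each 32-bit input is viewed as two signed 16-bit halves; the two
// half-by-half products are summed into a 32-bit result.
// Relies on two's-complement narrowing (guaranteed since C++20,
// and the behaviour of mainstream compilers before that).
inline int32_t pmaddwd_lane(uint32_t a, uint32_t b) {
    const int32_t a_lo = static_cast<int16_t>(a & 0xFFFFu);
    const int32_t a_hi = static_cast<int16_t>(a >> 16);
    const int32_t b_lo = static_cast<int16_t>(b & 0xFFFFu);
    const int32_t b_hi = static_cast<int16_t>(b >> 16);
    // 64-bit intermediate avoids signed overflow when all halves are -32768.
    const int64_t sum = static_cast<int64_t>(a_lo) * b_lo +
                        static_cast<int64_t>(a_hi) * b_hi;
    return static_cast<int32_t>(static_cast<uint32_t>(sum));
}
```

If `a` has zero upper bits and fits in 15 bits, the high product vanishes and the lane equals `a * b` for any sign-extended 16-bit `b`. If both operands are negative, the high halves are both -1 and contribute a spurious +1, which is why this only works when the value ranges are known.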
Another potential user of this feature flag came up in bug 34474 (mul by pow2
+/- 1) solved by this patch:
https://reviews.llvm.org/D52195
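As a scalar illustration of the "mul by pow2 +/- 1" idea (the function names below are made up for this sketch, and the actual patch operates on vector nodes in the backend rather than on C++ source), the multiply can be replaced by a shift plus an add or subtract:

```cpp
#include <cstdint>

// Hypothetical helpers, valid for 0 <= k < 32.
// Unsigned arithmetic wraps modulo 2^32, matching a 32-bit multiply.

// x * (2^k + 1) == (x << k) + x
inline uint32_t mul_pow2_plus1(uint32_t x, unsigned k) {
    return (x << k) + x;
}

// x * (2^k - 1) == (x << k) - x
inline uint32_t mul_pow2_minus1(uint32_t x, unsigned k) {
    return (x << k) - x;
}
```

On targets where the vector multiply is slow, a vector shift plus a vector add per lane can be cheaper than a single PMULLD, which is presumably why the feature flag is relevant here.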
(In reply to Sanjay Patel from comment #4)
> Another potential user of this feature flag came up in bug 34474 (mul by
> pow2 +/- 1) solved by this patch:
> https://reviews.llvm.org/D52195
https://godbolt.org/z/Vq9Pi2 says this is a definite win on Haswell, Broadwell,
and Skylake.
Hello, it seems that LLVM sets this flag for Silvermont processors, but not for others. On Haswell or Skylake processors (for example), PMULLD has a latency of 10, whereas other vector multiplication instructions have a latency of 5.
https://bugs.llvm.org/show_bug.cgi?id=28128 seems related.