Why FeatureSlowPMULLD is not set for Haswell+?

Quuxplusone / LLVMBugzillaTest


Why FeatureSlowPMULLD is not set for Haswell+? #34921

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago


Bugzilla Link	PR35948
Status	NEW
Importance	P enhancement
Reported by	Ivan G (nekotekina@gmail.com)
Reported on	2018-01-15 05:12:59 -0800
Last modified on	2021-10-05 09:36:32 -0700
Version	6.0
Hardware	All All
CC	craig.topper@gmail.com, gadi.haber@intel.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks	PR32325
Blocked by
See also	PR34474, PR52039

Hello, it seems that LLVM sets this flag for Silvermont processors but not for others. On Haswell or Skylake processors, for example, PMULLD has a latency of 10 cycles, whereas other vector multiplication instructions have a latency of 5.

https://bugs.llvm.org/show_bug.cgi?id=28128 seems related.

Quuxplusone commented 6 years ago

Craig/Gadi - any thoughts?

Quuxplusone commented 6 years ago

Silvermont has the additional problem that PMULLD's reciprocal throughput is also high. Haswell's reciprocal throughput is 2, so it's more reasonable.

This flag is only checked in reduceVMULWidth, right? I don't think it would change anything for PR28128, since I don't think we can reduce the multiply width there.

I haven't looked closely at the alternative sequences reduceVMULWidth will generate.

Quuxplusone commented 6 years ago

PR28128: we probably need to bump the v4i32/v8i32 mul costs in X86TargetTransformInfo.cpp, but to be honest I can't see it affecting the vectorization decision.

I keep meaning to use X86ISD::VPMADDWD more aggressively, as Peter suggested on D41484, but again that's a special case.

Quuxplusone commented 6 years ago

Another potential user of this feature flag came up in bug 34474 (mul by pow2 +/- 1), which was solved by this patch:
https://reviews.llvm.org/D52195

Quuxplusone commented 6 years ago

(In reply to Sanjay Patel from comment #4)
> Another potential user of this feature flag came up in bug 34474 (mul by
> pow2 +/- 1) solved by this patch:
> https://reviews.llvm.org/D52195

https://godbolt.org/z/Vq9Pi2 says this is a definite win on Haswell, Broadwell,
and the Skylake CPUs.