Open Quuxplusone opened 6 years ago
Attached same-cost-vf.ll
(2894 bytes, text/plain): IR to demonstrate
resulting in about half the throughput.
This means the cost modeling is way off. The "cost" the vectorizer prints is the cost per scalar iteration, so the estimated cost of each vectorized iteration at VF 4 is twice the estimated cost at VF 2.
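To make the arithmetic concrete, here is a minimal sketch of how a per-scalar-iteration cost relates to the cost of a whole vectorized iteration. The function name and the cost values are hypothetical, chosen only for illustration. They are not taken from the attached LV debug output, and this is not the vectorizer's actual code.

```python
def cost_per_scalar_iteration(vector_iteration_cost, vf):
    """Spread the cost of one vectorized iteration over the VF scalar
    iterations it covers (hypothetical helper, not an LLVM API)."""
    return vector_iteration_cost / vf

# Hypothetical costs, NOT from the attached debug traces.
cost_vf2 = cost_per_scalar_iteration(10, 2)  # one VF 2 iteration costs 10
cost_vf4 = cost_per_scalar_iteration(20, 4)  # one VF 4 iteration costs 20

# Equal per-scalar-iteration costs mean the VF 4 iteration is modeled as
# exactly twice as expensive as the VF 2 iteration.
print(cost_vf2, cost_vf4)  # prints "5.0 5.0"
```

With numbers like these the model sees no benefit from the wider VF, even though each VF 4 iteration covers twice as many scalar iterations.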
From the debug output, maybe the cost of the "sext" isn't getting computed correctly?
Attached with-rL317576.out
(29025 bytes, text/plain): LV debug & output with rL317576
Attached without-rL317576.out
(29543 bytes, text/plain): LV debug & output without rL317576
Attached debug-diff.out
(2458 bytes, text/plain): Diff of the LV debug trace with & without rL317576
ping. Anyone looking at this?