andreas-abel / nanoBench

A tool for running small microbenchmarks on recent Intel and AMD x86 CPUs.
http://www.uops.info
GNU Affero General Public License v3.0

Missing latency entry for gathers #5

Closed: travisdowns closed this issue 4 years ago

travisdowns commented 5 years ago

You measure many latency stats for gathers, which is awesome (and a very important formalization of the way we think about latency), but I think you are missing the most important one.

That is the 2 -> 1 (address) latency, but through the vector index register rather than the base register. It is probably the most common latency chain you'll have in practice, because it generalizes the notion of pointer chasing. For example, a loop like:

vpgatherdd ymm0,DWORD PTR [r14+ymm14*1],ymm1
vpor ymm14,ymm0,ymm0

On my SKL machine I measure the same latency (22) for this loop as for the 3->1 latency.
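
To make the dependency chain concrete, here is a minimal scalar sketch in C of what the two-instruction loop above does. It is not nanoBench code and it measures nothing. It only illustrates why the index register sits on the critical path. The names `gather8` and `chase` and the constant `LANES` are hypothetical, and the plain loop in `gather8` is a stand-in for `vpgatherdd`, assuming 8 dword lanes and no masking.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumed: 8 x 32-bit lanes, as in a ymm register */

/* Scalar stand-in for vpgatherdd (all lanes enabled): out[i] = base[idx[i]]. */
static void gather8(const int32_t *base, const int32_t idx[LANES],
                    int32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        out[i] = base[idx[i]];
    }
}

/*
 * Vectorized pointer chasing: the values loaded in one step become the
 * indices of the next step, so every gather depends on the previous one
 * through the index operand, not through the base pointer.
 */
static void chase(const int32_t *table, int32_t idx[LANES], size_t steps)
{
    int32_t loaded[LANES];
    for (size_t s = 0; s < steps; s++) {
        gather8(table, idx, loaded);      /* role of vpgatherdd ymm0, [r14+ymm14*1], ymm1 */
        for (int i = 0; i < LANES; i++) {
            idx[i] = loaded[i];           /* role of vpor ymm14, ymm0, ymm0 */
        }
    }
}
```

With a table in which every entry holds the index of the next element, each lane walks its own linked chain, which is the sense in which the index-register path generalizes scalar pointer chasing.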

andreas-abel commented 4 years ago

Fixed.

travisdowns commented 4 years ago

Thanks @andreas-abel!