On Zen 4, the summary of the vpternlogd latency experiments is given as
Latency operand 1 → 1: 1
Latency operand 2 → 1: 2
Latency operand 3 → 1: 1
https://uops.info/html-lat/ZEN4/VPTERNLOGD_ZMM_ZMM_ZMM_I8-Measurements.html
but I don't see a substantial difference between the 3 → 1 and 2 → 1 experiments, nor any difference w.r.t. its vpternlogq sibling, where all latencies are listed as 1. Shouldn't both the dword and qword variants be listed with latency 2 for operands 2 and 3? What am I missing?
If I'm reading Agner's testing harness right, his latency experiment times
repeated 50 times. He lists the latency of ternlog on Zen 4 as 1 cycle in all cases (but if the latency from the second operand is indeed 2, his experiment wouldn't uncover that).
(Unfortunately, I do not have access to a Zen 4 machine to run more experiments.)
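
To make the concern about the harness concrete, here is a toy model in Python. It is not a measurement and does not reproduce Agner's actual instruction sequence. It assumes the per-operand latencies listed by uops.info above, and the names `chain_cycles_per_instr`, `measured_total`, and `carried_operands` are hypothetical helpers I made up for illustration. The idea is that a dependency chain only exposes the latency of the operands the previous result is actually fed through.

```python
# Toy model, NOT a measurement. Per-operand latencies (source operand -> cycles
# until the result in operand 1 is ready) as listed by uops.info for
# vpternlogd zmm on Zen 4. Treat these numbers as an assumption.
LAT_UOPS_INFO = {1: 1, 2: 2, 3: 1}


def chain_cycles_per_instr(lat, carried_operands):
    """Steady-state cycles per instruction for a chain of identical
    instructions in which the previous result is read through every
    operand in carried_operands. The slowest carried input bounds the chain."""
    return max(lat[op] for op in carried_operands)


def measured_total(lat, carried_operands, n=50):
    """Idealized cycle count for a chain of n dependent instructions."""
    return n * chain_cycles_per_instr(lat, carried_operands)


if __name__ == "__main__":
    # Hypothetical chains: which source operands carry the dependency.
    for ops in ([1], [2], [3], [1, 3], [1, 2]):
        per_instr = chain_cycles_per_instr(LAT_UOPS_INFO, ops)
        total = measured_total(LAT_UOPS_INFO, ops)
        print(f"carried through {ops}: {per_instr} cycle(s)/instr, {total} total")
```

Under this model, a chain that carries the dependency only through operands 1 and/or 3 reports 1 cycle per instruction even if the 2 → 1 latency really is 2, which is why such an experiment could not distinguish the two cases.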