andreas-abel / uiCA

uops.info Code Analyzer
GNU Affero General Public License v3.0

simulation inaccuracy: missed dep-breaking of pcmpeq #28

Open amonakov opened 1 year ago

amonakov commented 1 year ago

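Integer pcmpeq* with both source operands being the same register (source=dest in the two-operand SSE form) sets the destination to all-ones without a dependency on the previous value of that register, although it still occupies an execution unit. For example, the following loop runs at one cycle per iteration on Skylake, while uiCA predicts two: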

loop:
    vpcmpeqd xmm0, xmm0, xmm0
    vpor xmm0, xmm0, xmm0
    dec ecx
    jnz loop
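To make the one-versus-two-cycle gap concrete, here is a rough latency-only sketch of the loop-carried dependency chain. It is a toy model and not how uiCA works internally: it ignores ports, the front end and macro-fusion, assumes a latency of 1 cycle for every instruction, and leaves out `jnz` because it writes no register. The names `Instr`, `is_dep_breaking` and `cycles_per_iteration` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dst: str
    srcs: tuple
    latency: int = 1  # assumption: 1-cycle latency for everything in this loop


# The loop from this report; jnz is omitted since it produces no register value.
LOOP = [
    Instr("vpcmpeqd", "xmm0", ("xmm0", "xmm0")),
    Instr("vpor", "xmm0", ("xmm0", "xmm0")),
    Instr("dec", "ecx", ("ecx",)),
]


def is_dep_breaking(instr):
    # Hypothetical rule: integer pcmpeq* whose sources are all the same register.
    return "pcmpeq" in instr.name and len(set(instr.srcs)) == 1


def cycles_per_iteration(loop, honor_dep_breaking, iterations=100):
    ready = {}   # register -> cycle at which its latest value is available
    finish = []  # latest result time seen at the end of each iteration
    for _ in range(iterations):
        for ins in loop:
            # A recognized dep-breaking idiom does not wait for its sources.
            if honor_dep_breaking and is_dep_breaking(ins):
                srcs = ()
            else:
                srcs = ins.srcs
            start = max((ready.get(r, 0) for r in srcs), default=0)
            ready[ins.dst] = start + ins.latency
        finish.append(max(ready.values()))
    # Steady-state growth of the critical path per iteration.
    return finish[-1] - finish[-2]


print(cycles_per_iteration(LOOP, honor_dep_breaking=False))
print(cycles_per_iteration(LOOP, honor_dep_breaking=True))
```

When the idiom is not recognized, the xmm0 chain runs vpcmpeqd -> vpor -> vpcmpeqd across iterations and costs two cycles each time around. When it is recognized, only the `dec ecx` chain is loop-carried, which gives one cycle per iteration in this model.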