andreas-abel / uiCA

uops.info Code Analyzer
GNU Affero General Public License v3.0

Simulation inaccuracy for 256-bit stores on SNB (remaining part of #15) #16

Closed (amonakov closed this issue 2 years ago)

amonakov commented 2 years ago

The fix for issue #15 did not cover 256-bit stores. Like 256-bit loads, they have half the throughput of their 128-bit SSE counterparts on SNB, so the following loop runs at two cycles per iteration:

loop:
vmovaps [rdi], ymm0
dec ecx
jnz loop
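
To make the arithmetic behind that claim explicit, here is a minimal back-of-the-envelope sketch in Python. It is not uiCA code and not a measurement: the `STORE_CYCLES` table and the `store_bound_cycles` helper are hypothetical names, and the per-store cycle counts are assumptions read directly off the statement above (a 256-bit store costs twice as much as a 128-bit one, which is taken to cost one cycle).

```python
# Assumed store-pipeline occupancy per store on SNB, in cycles.
# These numbers are taken from the issue text, not from measurements.
STORE_CYCLES = {128: 1, 256: 2}


def store_bound_cycles(store_widths):
    """Lower bound on cycles per iteration imposed by the stores alone.

    store_widths: widths in bits of the stores in one loop iteration.
    """
    return sum(STORE_CYCLES[width] for width in store_widths)


# The loop from the issue has a single 256-bit store per iteration.
print(store_bound_cycles([256]))
# The same loop with a 128-bit store (xmm0 instead of ymm0).
print(store_bound_cycles([128]))
```

Under these assumptions the store alone limits the loop to two cycles per iteration, which matches the figure reported above; a simulator that charges a 256-bit store only one cycle would predict a lower number.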