Several instructions involving loop semantics are missing `check_Reg` operations when 32-bit registers are modified in long mode. This PR adds the appropriate `check_Reg` to every place where a 32-bit register is modified. (Several commits have been merged into a single PR because the fix is essentially the same for each issue.)

For string operations, the upper 32 bits of `RDI` and `RSI` need to be zeroed when the address-size prefix (67) is set, because the instruction then writes `EDI` and `ESI`:

`67a5` `MOVSD ES:EDI,ESI` with `RDI=0xaaaaaaaa00000000`, `RSI=0xaaaaaaaa00001000`, `mem[0x1000]=44332211`

{ `RDI=0x4`, `RSI=0x1004`, `mem[0x0]=44332211` }

`x86:LE:64:default` (Existing): `"MOVSD ES:EDI,ESI"` { `RDI=0xaaaaaaaa00000004`, `RSI=0xaaaaaaaa00001004`, `mem[0x0]=44332211` }

`x86:LE:64:default` (This patch): `"MOVSD ES:EDI,ESI"` { `RDI=0x4`, `RSI=0x1004`, `mem[0x0]=44332211` }

For `REP` instructions, `ECX` is used as the counter when the address-size prefix (67) is set, so the upper 32 bits of `RCX` need to be zeroed:

`67f3af` `SCASD.REPE ES:EDI` with `RDI=0xaaaaaaaa_00000000`, `RCX=0xaaaaaaaa_00000001`

{ `CF=0x1`, `RCX=0x0`, `RDI=0x4` }

`x86:LE:64:default` (Existing): `"SCASD.REPE ES:EDI"` { `CF=0x1`, `RCX=0xaaaaaaaa00000000`, `RDI=0xaaaaaaaa00000004` }

`x86:LE:64:default` (This patch): `"SCASD.REPE ES:EDI"` { `CF=0x1`, `RCX=0x0`, `RDI=0x4` }

Essentially identical differences apply to all string instructions with repeat prefixes, e.g.:
`67f3ac` `"LODSB.REP ESI"`

`67f3ad` `"LODSD.REP ESI"`

`67f3a6` `"CMPSB.REPE ES:EDI,ESI"`
A similar change is needed when the 67 prefix is present on `LOOP` instructions:

`67e2xx` `"LOOP [addr]"`

This also includes the conditional `LOOP` instructions:

`67e1xx` `"LOOPZ [addr]"`

`67e0xx` `"LOOPNZ [addr]"`
The `LODSD` instruction (no prefix required) writes to `EAX` and also requires a fix:

`ad` `LODSD RSI` with `RSI=0x1000`, `mem[0x1000]=44332211`, `RAX=0xaaaaaaaa00000000`

{ `RAX=0x11223344`, `RSI=0x1004` }

`x86:LE:64:default` (Existing): `"LODSD RSI"` { `RAX=0xaaaaaaaa11223344`, `RSI=0x1004` }

`x86:LE:64:default` (This patch): `"LODSD RSI"` { `RAX=0x11223344`, `RSI=0x1004` }

After the PR, there is still a minor discrepancy with Intel CPUs. Even when `ECX=0` (no loop iterations), the test Intel CPU zeroes the upper bits of `RCX`. AMD's behavior makes more logical sense: for other instructions in long mode, the upper bits are zeroed only when the register is the destination of an operation, and according to the pseudocode listed for REP instructions†, `RCX` is not written if it is zero. However, this is what is observed when running the instruction on real hardware:

`67f3af` `SCASD.REPE ES:EDI` with `RDI=0xaaaaaaaa_00000000`, `RCX=0xaaaaaaaa_00000000`

AMD CPU: { `RDI=0xaaaaaaaa_00000000`, `RCX=0xaaaaaaaa_00000000` } (AMD CPUs behave the same as the PR)

Intel CPU: { `RDI=0xaaaaaaaa_00000000`, `RCX=0x00000000_00000000` }

`x86:LE:64:default` (unchanged): `"SCASD.REPE ES:EDI"` { `RDI=0xaaaaaaaa_00000000`, `RCX=0xaaaaaaaa_00000000` }

† Intel SDM Vol. 2B, "REP/REPE/REPZ/REPNE/REPNZ—Repeat String Operation Prefix"
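The remaining Intel/AMD difference for a zero count can be summarised with a small Python model. It is a sketch under stated assumptions: the function name `rcx_after_rep_addr32` and its boolean parameter are hypothetical, the instruction is assumed to run to completion without early termination, and the two settings simply encode the results reported above rather than any documented vendor rule.

```python
MASK32 = 0xFFFF_FFFF


def rcx_after_rep_addr32(rcx, zero_upper_when_count_is_zero):
    """Final RCX after a 67-prefixed REP string instruction that runs to completion.

    zero_upper_when_count_is_zero=False models the AMD result (and this PR);
    True models the result reported for the tested Intel CPU.
    """
    ecx = rcx & MASK32
    if ecx == 0:
        # No iterations: the only question is whether RCX is rewritten at all.
        return ecx if zero_upper_when_count_is_zero else rcx
    # At least one iteration: ECX counts down to zero and is written back
    # as a 32-bit register, clearing bits 63..32.
    return 0


start = 0xAAAAAAAA00000000
print(hex(rcx_after_rep_addr32(start, False)))  # AMD-style result
print(hex(rcx_after_rep_addr32(start, True)))   # Intel-style result
```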