0xd4d closed this issue 4 years ago.
Wow, impressive and valuable work! Do you have a script or tool that runs these tests automatically? The issues you identified should be fixed in the latest version. Here are my comments on each one:
"imm8/32 should be sign extended": Indeed, the immediate was logged with its raw size. However, when the immediate operand is accessed through the instrux, it is reported with the correct size. I modified the NdToText function to log sign-extended immediates with their logical size instead of their raw size.
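A minimal Python sketch of logging an imm8 or imm32 at its logical operand size. The helper `sign_extend` is hypothetical and is not bddisasm's NdToText code. For example, `83 C0 FF` is `add eax, imm8`, and with a REX.W prefix (`48 83 C0 FF`) the same imm8 becomes a 64-bit operand:

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a `from_bits`-wide immediate and return it as an
    unsigned integer of `to_bits` width (the logical operand size)."""
    sign_bit = 1 << (from_bits - 1)
    value &= (1 << from_bits) - 1           # keep only the raw immediate bits
    signed = (value ^ sign_bit) - sign_bit   # interpret as two's complement
    return signed & ((1 << to_bits) - 1)     # re-encode at the operand size


print(hex(sign_extend(0xFF, 8, 32)))         # add eax, imm8 -> 0xffffffff
print(hex(sign_extend(0xFF, 8, 64)))         # add rax, imm8 -> 0xffffffffffffffff
print(hex(sign_extend(0x80000000, 32, 64)))  # imm32 in a 64-bit op -> 0xffffffff80000000
```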
"wrong target offset (66 doesn't truncate target address, it just affects the size of the relative offset)": Nice catch. Fixed.
"wrong order of operands": Nice catch. Fixed.
"can't decode vmmcall": Nice catch. Fixed.
"can't decode wbinvd": This bug was introduced along with WBNOINVD, which expects an 0xF3 prefix. WBINVD does seem to execute with the 0x66 and 0xF2 prefixes as well.
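A small Python sketch of the mnemonic selection for opcode `0F 09` described in the reply. The helper is hypothetical, and it assumes the effective mandatory prefix (none, 66, F2 or F3) has already been resolved by the prefix parser. Treating 66 and F2 as WBINVD follows the hardware observation above, not an SDM guarantee:

```python
from typing import Optional


def mnemonic_0f09(mandatory_prefix: Optional[int]) -> str:
    """Return the mnemonic for opcode 0F 09 given the effective mandatory prefix."""
    if mandatory_prefix == 0xF3:
        return "wbnoinvd"  # F3 0F 09
    # No prefix, 66 or F2: still WBINVD, as observed on hardware in the thread
    return "wbinvd"
```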
"wrong mnemonic": Nice catch. Fixed.
"wrong addr": FaD. When the address-size override is used in 64-bit mode, it only demotes the register size from 64 to 32 bits. RIP-relative addressing works normally even if the 0x67 prefix is used. This is what XED decodes, and this is what the actual hardware does.
"mpx instructions ignore 67 in 64-bit mode and in 16/32-bit mode must use 32-bit addressing": Excellent catch. Fixed.
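A minimal Python sketch of the MPX address-size rule as stated in the report. The helper name is hypothetical. Returning `None` for 16-bit addressing is an assumption based on our reading of the #UD conditions listed for the MPX instructions in the SDM ("67H prefix is not used and CS.D=0" / "67H prefix is used and CS.D=1"):

```python
from typing import Optional


def mpx_address_size(mode: int, has_67: bool) -> Optional[int]:
    """Effective address size in bits for an MPX memory operand, or None for #UD.

    mode is the code size of the current segment: 16, 32 or 64.
    """
    if mode == 64:
        return 64  # the 67 prefix is ignored by MPX instructions in 64-bit mode
    # In 16/32-bit mode, 67 toggles between 16- and 32-bit addressing
    addr = (32 if mode == 16 else 16) if has_67 else mode
    # Only 32-bit addressing is accepted; 16-bit addressing is assumed to #UD
    return 32 if addr == 32 else None
```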
"missing fxsave64/fxrstor64": Fixed the mnemonics.
"wrong mem size": Fixed.
"wrong instr": Fixed.
"can't decode": All fixed. The decoder required VEX.L to be 0 for them, although the SDM states that it is ignored.
"decoded with an extra byte. Also wrong mnemonic.": Fixed.
"Wrong reg in one of the last operands": Fixed. The L operand encoding did not ignore the MSB outside 64-bit mode.
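This sketch assumes the "L" operand encoding refers to the register selected by imm8[7:4] (the VEX /is4 operand). Under that assumption, the fix amounts to masking out bit 7 outside 64-bit mode, where only XMM0-XMM7 are addressable. The helper name is hypothetical:

```python
def is4_register(imm8: int, mode: int) -> int:
    """Vector register number encoded in imm8[7:4]."""
    reg = (imm8 >> 4) & 0xF
    if mode != 64:
        reg &= 0x7  # outside 64-bit mode bit 7 of the immediate is ignored
    return reg
```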
"wrong mem size": Fixed. The textual disassembly showed the total accessed memory size instead of the element size when VSIB was used.
"can't decode": Fixed.
"wrong mnemonic": Fixed.
"can't decode": Note that according to the SDM, embedded rounding is ignored for these instructions. Although bddisasm did initially fail to decode them, the output you provided also seems to be wrong, as it erroneously promotes the vector length to 512 bits. I don't have the hardware to run these and see what actually happens; do you have a supporting CPU?
"wrong disp8": The instructions had the wrong tuple type. Fixed.
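Under EVEX, an 8-bit displacement is compressed: the encoded disp8 is scaled by an N that depends on the tuple type, and also on broadcast for full-vector tuples. A wrong tuple type therefore yields a wrong displacement. A minimal Python sketch covering only the Full Vector (FV) and Tuple1 Scalar (T1S) cases; the function names and the tuple-type strings are hypothetical:

```python
def evex_disp8_scale(tuple_type: str, vector_bits: int, element_bytes: int,
                     broadcast: bool) -> int:
    """Return N in `displacement = disp8 * N` for two common EVEX tuple types."""
    if tuple_type == "FV":   # full vector: whole vector, or one element if broadcast
        return element_bytes if broadcast else vector_bits // 8
    if tuple_type == "T1S":  # tuple1 scalar: a single element
        return element_bytes
    raise ValueError("tuple type not covered by this sketch: " + tuple_type)


def evex_displacement(disp8: int, n: int) -> int:
    """Sign-extend the raw disp8 byte and scale it by N."""
    signed = disp8 - 0x100 if disp8 & 0x80 else disp8
    return signed * n
```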
"can't decode": Broadcast support was missing for these instructions. Fixed.
"can't decode. It's not #UD because it's a scatter (not a gather) instruction.": Indeed, scatter instructions can use the same register both as the VSIB index and as the data source. Fixed.
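A minimal Python sketch of the register-aliasing check discussed above. It assumes our reading of the SDM: an EVEX gather raises #UD when its destination is the same register as the VSIB index, while a scatter has no such restriction. The helper name is hypothetical:

```python
def evex_vsib_ud(is_gather: bool, data_reg: int, index_reg: int) -> bool:
    """Return True if an EVEX gather/scatter register combination should #UD.

    data_reg is the gather destination or the scatter data source (0-31);
    index_reg is the VSIB index register (0-31).
    """
    if is_gather:
        # EVEX gather: destination may not be the same register as the index
        return data_reg == index_reg
    # Scatter: both registers are only read, so they may be the same register
    return False
```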
"wrong mem size": Same as the other VSIB instructions: I preferred showing the full memory size in the textual disassembly. Now only the element size is used.
"wrong displ": Fixed.
"these should have a 'd' suffix": Fixed.
"can't decode": Fixed.
"xbegin + 66 doesn't truncate the address, it just controls the size of the rel value": Correct, fixed.
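XBEGIN is encoded as `C7 F8` followed by a rel16 or rel32. With a 66 prefix, only a 2-byte rel is read, sign-extended and added to the address of the next instruction; nothing is masked to 16 bits. A minimal Python sketch under that reading. The helper `xbegin_target` is hypothetical, and `addr_bits` is assumed to be the instruction-pointer width of the current mode:

```python
def xbegin_target(next_ip: int, rel: int, rel_bits: int, addr_bits: int) -> int:
    """Fallback address of XBEGIN.

    rel_bits is 16 with a 66 prefix and 32 otherwise.
    """
    sign = 1 << (rel_bits - 1)
    rel = ((rel & ((1 << rel_bits) - 1)) ^ sign) - sign  # sign-extend rel16/rel32
    # The 66 prefix only shrinks the rel field; the target is NOT truncated to 16 bits
    return (next_ip + rel) & ((1 << addr_bits) - 1)


# 66 C7 F8 34 12 at 0x7FFFFFFF0000 (5 bytes long) in 64-bit mode
print(hex(xbegin_target(0x7FFFFFFF0005, 0x1234, 16, 64)))  # 0x7fffffff1239
```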
"this is syscall/sysret": You are right. SYSCALL can be encoded and executed in 32-bit mode, even though the SDM states that it is invalid outside 64-bit mode.
"wrong mnemonic": Fixed.
"rdpid op is 32-bit in 16/32-bit mode, and 64-bit in 64-bit mode": Fixed.
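A small Python sketch of the RDPID operand rule from the report, for the register form `F3 0F C7 /7`. The helper and register tables are hypothetical illustrations, not bddisasm code. For example, `F3 0F C7 F8` names RAX in 64-bit mode and EAX in 16/32-bit mode:

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def rdpid_operand(modrm: int, rex_b: bool, mode: int) -> str:
    """Destination register of RDPID (F3 0F C7 /7, register form only)."""
    rm = modrm & 0x7
    if mode == 64:
        # 64-bit destination in 64-bit mode; REX.B extends ModRM.rm
        return REGS64[rm | (0x8 if rex_b else 0)]
    return REGS32[rm]  # 16- and 32-bit modes: 32-bit destination
```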
"can't decode": Fixed.
"can't decode": The SDM states that encodings with VEX.vvvv >= 8 are "invalid". Encoding and running such an instruction proves otherwise, though: the CPU simply ignores bit 3 of VEX.vvvv. Fixed.
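VEX.vvvv is stored inverted (one's complement). In 16/32-bit mode only XMM0-XMM7 exist, and per the hardware test described above, the CPU ignores bit 3 there instead of raising #UD. A minimal Python sketch of that behaviour. The helper name is hypothetical, and the masking reflects the observation in the reply, not the SDM text:

```python
def vex_vvvv_register(vvvv_field: int, mode: int) -> int:
    """Register number from the raw (inverted) 4-bit VEX.vvvv field."""
    reg = (~vvvv_field) & 0xF  # vvvv is encoded in one's complement
    if mode != 64:
        reg &= 0x7             # bit 3 ignored outside 64-bit mode (observed behaviour)
    return reg
```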
"FaD. When the address-size override is used in 64-bit mode, it only demotes the register size from 64 to 32 bits. RIP-relative addressing works normally even if the 0x67 prefix is used. This is what XED decodes, and this is what the actual hardware does."
A 67 prefix truncates the address to 32 bits in 64-bit mode, so in effect it uses EIP instead of RIP. It's no different from 67 selecting EBX instead of RBX: the result should be an address with the upper 32 bits cleared. See 2.2.1.6 in SDM vol. 2, last section.
"Note that according to the SDM, embedded rounding is ignored for these instructions. Although bddisasm did initially fail to decode them, the output you provided also seems to be wrong, as it erroneously promotes the vector length to 512 bits. I don't have the hardware to run these and see what actually happens; do you have a supporting CPU?"
No, I haven't tested on real HW; this CPU doesn't have AVX-512 instructions.
The SDM says {er} is ignored (these bits aren't used at all), and nothing else seems to change. There are no other bits left to differentiate which instruction to decode (128, 256 or 512 bits), so it must use only one of them: the 512-bit version, which is also the only one that can use {er}. See also table 2-36 in SDM vol. 2.
"A 67 prefix truncates the address to 32 bits in 64-bit mode, so in effect it uses EIP instead of RIP. It's no different from 67 selecting EBX instead of RBX: the result should be an address with the upper 32 bits cleared. See 2.2.1.6 in SDM vol. 2, last section."
Ah, yes, indeed. I see now that I don't truncate the rel value to 32 bits in the textual output. This is only an output/text bug, though. Thanks for insisting on it.
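Following the SDM reference above, a RIP-relative operand with a 67 prefix in 64-bit mode yields an address with the upper 32 bits cleared. A minimal Python sketch of that computation. The helper name is hypothetical. As an example, `67 8B 05 10 00 00 00` (7 bytes) loads EAX from `[eip+0x10]`:

```python
def rip_relative_ea(next_rip: int, disp32: int, has_67: bool) -> int:
    """Effective address of a RIP-relative operand in 64-bit mode."""
    sign = 1 << 31
    disp = ((disp32 & 0xFFFFFFFF) ^ sign) - sign  # sign-extend disp32
    ea = (next_rip + disp) & 0xFFFFFFFFFFFFFFFF
    if has_67:
        ea &= 0xFFFFFFFF  # 32-bit address size: upper 32 bits cleared (EIP-relative)
    return ea


print(hex(rip_relative_ea(0x123456789000, 0x10, False)))  # 0x123456789010
print(hex(rip_relative_ea(0x123456789000, 0x10, True)))   # 0x56789010
```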
"The SDM says {er} is ignored (these bits aren't used at all), and nothing else seems to change. There are no other bits left to differentiate which instruction to decode (128, 256 or 512 bits), so it must use only one of them: the 512-bit version, which is also the only one that can use {er}. See also table 2-36 in SDM vol. 2."
The SDM also describes these instructions without er/sae support, so there's no point in interpreting the EVEX.b bit at all in this case, since it says that it's ignored. (In fact, XED doesn't seem to decode these instructions at all, and right now I'm not sure that's the correct approach.) My interpretation is that the instruction will be decoded as if EVEX.b were 0, but the ambiguity should be cleared up by running it on supporting hardware. As we see it, there are three possibilities:
XED doesn't decode those two examples yet because I only just reported them. It does support the other two (of four) instructions that ignore {er}. bddisasm fails to decode when LL=3.
LL=0 62E10F182AD3 vcvtsi2sd xmm18, xmm14, ebx
LL=1 62714F302AD3 vcvtsi2sd xmm10, xmm22, ebx
LL=2 62D14F582AD3 vcvtsi2sd xmm2, xmm6, r11d
LL=3 62F14F782AD3 vcvtsi2sd xmm2, xmm6, ebx
LL=0 62E10F187BD3 vcvtusi2sd xmm18, xmm14, ebx
LL=1 62714F307BD3 vcvtusi2sd xmm10, xmm22, ebx
LL=2 62D14F587BD3 vcvtusi2sd xmm2, xmm6, r11d
LL=3 62F14F787BD3 vcvtusi2sd xmm2, xmm6, ebx
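A small Python sketch that extracts L'L and EVEX.b from the third EVEX payload byte (P2), so the LL labels in the listing above can be checked against the raw bytes. The helper `evex_p2_fields` is hypothetical. Per the SDM's EVEX encoding tables, on register-register forms with EVEX.b=1, L'L is reused as the rounding control. That is why the vector length has no bits left to encode it in these examples:

```python
from typing import Dict


def evex_p2_fields(insn: bytes) -> Dict[str, int]:
    """Split the third EVEX payload byte (P2) into its fields.

    Assumed layout: 62 P0 P1 P2 opcode ModRM ...
    """
    if len(insn) < 6 or insn[0] != 0x62:
        raise ValueError("expected a 62-prefixed EVEX instruction with a ModRM byte")
    p2 = insn[3]
    modrm = insn[5]
    return {
        "z": p2 >> 7,
        "LL": (p2 >> 5) & 0x3,  # vector length, or rounding control when b=1 on reg-reg
        "b": (p2 >> 4) & 0x1,
        "V'": (p2 >> 3) & 0x1,
        "aaa": p2 & 0x7,
        "reg_form": int(modrm >> 6 == 0x3),  # ModRM.mod == 11b: register operand
    }


for hexstr in ("62E10F182AD3", "62714F302AD3", "62D14F582AD3", "62F14F782AD3"):
    f = evex_p2_fields(bytes.fromhex(hexstr))
    print(hexstr, "LL=%d b=%d reg_form=%d" % (f["LL"], f["b"], f["reg_form"]))
```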
Handling LL=3 for the ER-ignored instructions has the interesting (but obvious) side effect of decoding the instructions you previously mentioned, since the LL field becomes fixed: 128 or 512 bits, depending on vector/tuple. I'll leave it like this for now, at least until I get to run it on an actual CPU and see what it actually does. Thanks again for the reports! I'm looking forward to any other feedback you might have on the matter.
Some bugs I found when testing valid instructions
+ (green) is bddisasm
64-bit code
32-bit code
There could be more bugs, but there are too many diffs due to the above bug.