asmjit / asmdb

Instructions database and utilities for X86/X64 and ARM (THUMB/A32/A64) architectures.
The Unlicense

confusion around x86 "and" instructions #16

Open robertmuth opened 2 years ago

robertmuth commented 2 years ago

These two seem to conflict:

  ["and" , "X:r32/m32, id/ud" , "MI" , "81 /4 id" , "ANY _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],
  ["and" , "X:r64, ud"        , "MI" , "81 /4 id" , "X64 _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],

kobalicek commented 2 years ago

This is on purpose: you can encode a 64-bit AND with an unsigned 32-bit immediate by not promoting the instruction to 64-bit. Then it's basically the same encoding as the former. It's only possible when the operand is a register, though.

robertmuth commented 2 years ago

Suppose I am looking at this from the perspective of a decoder and I encounter a byte sequence that matches

"81 /4 id"

How do I know which "rule" applies? In other words: is this a 32-bit or a 64-bit instruction?

Maybe this is dependent on the processor mode?

I would also expect that the "or" instruction has similar/symmetric rules but I did not see any.

kobalicek commented 2 years ago

When decoding, you should always decode to the original instruction and treat all other forms as just aliases. The encoder would support the alias (or not, depending on how you see it), but the decoder would always decode to a canonical representation.
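As a rough illustration of decoding to the canonical form, the sketch below classifies an `81 /4` byte sequence by looking at the REX.W bit. It assumes 64-bit mode, ignores legacy prefixes such as `66`, and uses a hypothetical helper name (`classify_and_81`) that is not part of asmdb. The `r64, ud` alias never appears as a decode result, because its bytes are identical to the 32-bit form.

```python
def classify_and_81(code: bytes) -> str:
    """Return the canonical operand form for an `81 /4` (AND) byte sequence.

    Hypothetical helper: assumes 64-bit mode and no legacy prefixes.
    """
    pos = 0
    rex_w = False
    # In 64-bit mode, a byte in 0x40..0x4F is a REX prefix; bit 3 is REX.W.
    if 0x40 <= code[pos] <= 0x4F:
        rex_w = bool(code[pos] & 0x08)
        pos += 1
    if code[pos] != 0x81:
        raise ValueError("opcode is not 81")
    # The ModRM.reg field (bits 5..3) selects the operation; /4 is AND.
    modrm = code[pos + 1]
    if (modrm >> 3) & 0b111 != 4:
        raise ValueError("ModRM.reg is not /4")
    # Without REX.W the canonical form is the 32-bit one.
    return "r64/m64, id" if rex_w else "r32/m32, id"


# 81 E0 imm32      -> and eax, imm32 (32-bit form, upper half of rax is zeroed)
print(classify_and_81(bytes([0x81, 0xE0, 0x00, 0x00, 0xFF, 0xFF])))
# 48 81 E0 imm32   -> and rax, imm32 (REX.W set, immediate is sign-extended)
print(classify_and_81(bytes([0x48, 0x81, 0xE0, 0x00, 0x00, 0xFF, 0xFF])))
```

With this approach the operand size follows from the prefix bytes rather than from the table row, so a decoder only needs one rule per opcode pattern.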

OR doesn't have that capability. A 32-bit operation zero-extends its result into the high part of the 64-bit register, which is exactly what AND r64, ud does, but OR r64, ud encoded as a 32-bit instruction would essentially compute (r64 | ud) & 0xFFFFFFFF.
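The arithmetic behind this can be checked with a small model. The sketch below is only an illustration: the function names are made up, registers are modelled as Python integers, and a 32-bit operation is modelled as "operate on the low 32 bits, then zero-extend into the 64-bit register".

```python
import operator

MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def op_64bit(op, reg64: int, ud: int) -> int:
    """Intended semantics: 64-bit operation with a zero-extended 32-bit immediate."""
    return op(reg64, ud & MASK32) & MASK64


def op_encoded_as_32bit(op, reg64: int, ud: int) -> int:
    """Same bytes without 64-bit promotion: the result is zero-extended to 64 bits."""
    return op(reg64 & MASK32, ud & MASK32) & MASK32


reg = 0x1234_5678_9ABC_DEF0
imm = 0xFFFF_0000

# AND: the upper 32 bits are cleared either way, so both encodings agree.
print(op_64bit(operator.and_, reg, imm) == op_encoded_as_32bit(operator.and_, reg, imm))

# OR: the 64-bit form keeps the upper half, the 32-bit encoding loses it.
print(hex(op_64bit(operator.or_, reg, imm)))             # 0x12345678ffffdef0
print(hex(op_encoded_as_32bit(operator.or_, reg, imm)))  # 0xffffdef0
```

The two forms agree for AND because a zero-extended immediate already clears bits 63:32, which is why only AND gets the `r64, ud` alias.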

robertmuth commented 2 years ago

Ah, I see. Is there a programmatic way to determine which instructions are "original"? I noticed some instructions have an AltForm tag, but that seems to be something slightly different.

robertmuth commented 2 years ago

I found another conflict:

  ["and"              , "X:eax, id/ud" , "I"       , "25 id"                        , "ANY AltForm      OF=0 SF=W ZF=W AF=U PF=W CF=0"],
  ["and"              , "X:rax, ud"  , "I"       , "25 id"                        , "X64 AltForm      OF=0 SF=W ZF=W AF=U PF=W CF=0"],

These are the only two such cases I found in the fairly large part of the tables that I process.

This seems like an odd exception, given that the pattern is not repeated for any other ALU-type instruction.
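One way to look for such conflicts from the decoder's side is to group rows by everything a decoder can see (name, encoding format, opcode string) and report groups with more than one operand signature. The sketch below is a minimal illustration using only the four rows quoted in this thread. The `find_conflicts` helper is hypothetical and is not an asmdb API.

```python
from collections import defaultdict

# Rows copied from this thread: [name, operands, encoding, opcode, metadata].
ROWS = [
    ["and", "X:r32/m32, id/ud", "MI", "81 /4 id", "ANY _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],
    ["and", "X:r64, ud",        "MI", "81 /4 id", "X64 _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],
    ["and", "X:eax, id/ud",     "I",  "25 id",    "ANY AltForm OF=0 SF=W ZF=W AF=U PF=W CF=0"],
    ["and", "X:rax, ud",        "I",  "25 id",    "X64 AltForm OF=0 SF=W ZF=W AF=U PF=W CF=0"],
]


def find_conflicts(rows):
    """Map (name, encoding, opcode) to its operand signatures when there are several."""
    groups = defaultdict(list)
    for name, operands, encoding, opcode, _metadata in rows:
        groups[(name, encoding, opcode)].append(operands)
    return {key: sigs for key, sigs in groups.items() if len(sigs) > 1}


for key, signatures in find_conflicts(ROWS).items():
    print(key, "->", signatures)
```

Each reported group is a byte pattern that a decoder cannot resolve from the table alone, so it needs either a canonical row or an extra discriminator.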

robertmuth commented 2 years ago

I spoke too soon. Here is another ambiguity of a slightly different flavor:

    ["movss"            , "w:xmm[31:0], xmm[31:0]"                          , "RM"      , "F3 0F 10 /r"                  , "SSE"],
    ["movss"            , "W:xmm[31:0], m32"                                , "RM"      , "F3 0F 10 /r"                  , "SSE"],

    ["movsd"            , "w:xmm[63:0], xmm[63:0]"                          , "RM"      , "F2 0F 10 /r"                  , "SSE2"],
    ["movsd"            , "W:xmm[63:0], m64"                                , "RM"      , "F2 0F 10 /r"                  , "SSE2"],

kobalicek commented 2 years ago

Can you describe what is ambiguous in movss / movsd case?

The instructions really do what is described. movss|movsd from memory clears the rest of the register; movss|movsd between registers doesn't (that's the W vs w). X86 is full of such little differences. You can also see this in the AVX case with vmovss and vmovsd: there are basically two versions of the instruction, depending on whether it has a memory operand or not.
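The W vs w difference can be modelled with plain integers. In the sketch below a 128-bit XMM register is a Python int, and the two function names are invented for illustration only.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


def movss_reg_reg(dst: int, src: int) -> int:
    """'w' form: only bits [31:0] are written, bits [127:32] of dst are kept."""
    return (dst & MASK128 & ~MASK32) | (src & MASK32)


def movss_reg_mem(dst: int, mem32: int) -> int:
    """'W' form: bits [31:0] are loaded and the rest of the register is zeroed."""
    return mem32 & MASK32


xmm0 = (0xDEADBEEF << 96) | 0x11111111
xmm1 = 0x22222222

print(hex(movss_reg_reg(xmm0, xmm1)))        # 0xdeadbeef000000000000000022222222
print(hex(movss_reg_mem(xmm0, 0x33333333)))  # 0x33333333
```

Both forms share the bytes `F3 0F 10 /r`, so the only thing separating them is whether the second operand is a register or memory.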

robertmuth commented 2 years ago

I see. I think the problem is that I am currently mostly focused on the decoding part, while asmdb is more focused on encoding.

If I encounter F3 0F 10 xx xx ... I do not know which rule to choose based only on the bytes and the format ("RM"). This is similar to the ambiguity I reported with the "and" instructions further up.

What I have done on my side to deal with this is:

1) ignore the rules for

    and        "X:rax, ud"
    and        "X:r64, ud"

2) change the movss/movsd rules slightly:

    movss      "w:xmm[31:0], xmm[31:0]"       "RM"   =>   ......  "Rr"
    movss      "W:xmm[31:0], m32"             "RM"   =>   ......  "Rm"

where r = M format restricted to a register operand; m = M format restricted to a memory operand.

This gets rid of the ambiguity for me. Not sure if this makes sense for asmdb, though.
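A decoder can pick between the two restricted formats by inspecting the ModRM.mod field: a value of 3 (binary 11) means the r/m operand is a register, anything else means memory. The sketch below illustrates this for movss only. The function name is hypothetical, the "Rr"/"Rm" labels are the ones proposed above, and it assumes no REX or other prefix sits between `F3` and `0F`.

```python
MOVSS_LOAD_OPCODE = bytes([0xF3, 0x0F, 0x10])


def select_movss_format(code: bytes) -> str:
    """Return 'Rr' for the register form and 'Rm' for the memory form."""
    if code[:3] != MOVSS_LOAD_OPCODE:
        raise ValueError("not an F3 0F 10 /r sequence")
    # ModRM layout: mod (bits 7..6), reg (bits 5..3), r/m (bits 2..0).
    mod = code[3] >> 6
    return "Rr" if mod == 0b11 else "Rm"


# F3 0F 10 C1 -> movss xmm0, xmm1   (mod = 3, register operand)
print(select_movss_format(bytes([0xF3, 0x0F, 0x10, 0xC1])))
# F3 0F 10 00 -> movss xmm0, [rax]  (mod = 0, memory operand)
print(select_movss_format(bytes([0xF3, 0x0F, 0x10, 0x00])))
```

The same mod check applies to the movsd pair, with `F2` in place of `F3`.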