GaloisInc / flexdis86

A library for disassembling x86-64 binaries.
BSD 3-Clause "New" or "Revised" License
37 stars 10 forks source link

`push`'s operand should sign-extend immediate and segment-register operands #35

Closed RyanGlScott closed 2 years ago

RyanGlScott commented 2 years ago

If you run DumpInstr on a push instruction with an immediate argument, you'll get something like this:

$ cabal run DumpInstr -- 6a 00
II {iiLockPrefix = NoLockPrefix, iiAddrSize = Size64, iiOp = "push", iiArgs = [(ByteImm 0,OpType ImmediateSource BSize)], iiPrefixes = Prefixes {_prLockPrefix = NoLockPrefix, _prSP = SegmentPrefix {unwrapSegmentPrefix = 0}, _prREX = 0b00000000, _prVEX = Nothing, _prASO = False, _prOSO = False}, iiRequiredPrefix = Nothing, iiOpcode = [106], iiRequiredMod = Nothing, iiRequiredReg = Nothing, iiRequiredRM = Nothing}
push   0x0

Notice that while the address size (and in this case, the operand size as well) iiAddrSize is Size64 (8 bytes), while the operand is ByteImm 0 (only 1 byte). This is a bit dubious. Per the ISA manual:

If the source operand is an immediate of size less than the operand size, a sign-extended value is pushed on the stack. If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits, either a zero-extended value is pushed on the stack or the segment selector is written on the stack using a 16-bit move. For the last case, all recent Core and Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified.

Since push sign-extends the operand in some cases, ideally we would indicate this in the InstructionInstance itself. In the example above, for instance, we should be using ByteSignedImm instead of ByteImm. Fixing this may be as simple as editing the parts of optable.xml that pertain to push.

RyanGlScott commented 2 years ago

I thought that this would suffice to handle the cases for immediates:

diff --git a/data/optable.xml b/data/optable.xml
index 7d02fba..c5facfc 100644
--- a/data/optable.xml
+++ b/data/optable.xml
@@ -6982,7 +6982,7 @@
         <def>
             <pfx>oso</pfx>
             <opc>68</opc>
-            <opr>Iz</opr>
+            <opr>sIz</opr>
             <mode>def64</mode>
         </def>
         <def>
@@ -6994,7 +6994,7 @@
         <def>
             <pfx>oso</pfx>
             <opc>6a</opc>
-            <opr>Ib</opr>
+            <opr>sIb</opr>
             <mode>def64</mode>
         </def>
     </instruction>
diff --git a/tests/Assemble.hs b/tests/Assemble.hs
index ace04f5..e5a2179 100644
--- a/tests/Assemble.hs
+++ b/tests/Assemble.hs
@@ -31,9 +31,9 @@ j20 = [D.JumpOffset D.JSize8 (D.FixedOffset (20 - 2))]
 testCases :: [(AsmFlavor, String, Maybe D.InstructionInstance)]
 testCases = [ (Att, "ret", mkI "ret" [])
             , (Att, "int $0x3", mkI "int3" [])
-            , (Att, "push $0x8", mkI "push" [D.ByteImm 8])
-            , (Att, "pushw $0xfff", fmap setOSO $ mkI "push" [D.WordImm 0xfff])
-            , (Att, "push $0x2000000", mkI "push" [D.DWordImm (D.Imm32Concrete 0x2000000)])
+            , (Att, "push $0x8", mkI "push" [D.ByteSignedImm 8])
+            , (Att, "pushw $0xfff", fmap setOSO $ mkI "push" [D.WordSignedImm 0xfff])
+            , (Att, "push $0x2000000", mkI "push" [D.DWordSignedImm 0x2000000])
               -- The subtraction here is gross, but required because
               -- the jump is relative to the IP, which is incremented
               -- past the jump by the time it executes.

However, that causes the (Att, "pushw $0xfff", fmap setOSO $ mkI "push" [D.WordSignedImm 0xfff]) test case to fail. Some investigation reveals that this is because matchOperandType returns False. Ideally, we would hit this case of matchOperandType:

https://github.com/GaloisInc/flexdis86/blob/6d6e8c39e754dc4c9910b1c0dc494e2eef8e309f/src/Flexdis86/Assembler.hs#L235

However, oso is False here because of this line:

https://github.com/GaloisInc/flexdis86/blob/6d6e8c39e754dc4c9910b1c0dc494e2eef8e309f/src/Flexdis86/Assembler.hs#L86-L88

Ugh. I guess this means we'll need to infer the presence of oso like we do for REX?

RyanGlScott commented 2 years ago

Fixed in #37.