Note that this calls the s.fn function pointer using the call *0x10(%rsp) instruction (at address 0x113d). This will end up being important later.
Now let's load the callit function into a macaw CFG using the following driver program:
```hs
-- Main.hs
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where
import Control.Monad (when)
import qualified Data.ByteString as BS
import qualified Data.ElfEdit as EE
import qualified Data.Map as M
import Data.Parameterized.Some (Some(..))
import qualified Prettyprinter as PP
import qualified Data.Macaw.BinaryLoader.ELF as MBE
import qualified Data.Macaw.CFG as MC
import qualified Data.Macaw.Discovery as MD
import qualified Data.Macaw.Memory as MM
import qualified Data.Macaw.Memory.ElfLoader as MME
import qualified Data.Macaw.Utils.IncComp as MUI
import qualified Data.Macaw.X86 as MX
main :: IO ()
main = do
bytes <- BS.readFile "test.exe"
case EE.decodeElfHeaderInfo bytes of
Left (_off, msg) -> fail msg
Right (EE.SomeElf e) -> do
EE.ELFCLASS64 <- pure $ EE.headerClass $ EE.header e
(warn, mem, _mentry, syms) <-
case MME.resolveElfContents options e of
Left err -> error err
Right r -> pure r
when (not (null warn)) $ do
error $ "Warnings while loading Elf " ++ show warn
let addrSymMap :: M.Map (MM.MemSegmentOff 64) BS.ByteString
addrSymMap = M.fromList [ (MME.memSymbolStart sym, MME.memSymbolName sym)
| sym <- syms
]
-- `callit`'s address
Just addrOff <- pure $ MBE.resolveAbsoluteAddress mem
0x1130
Some dfi <- discoverFunction MX.x86_64_linux_info mem addrSymMap addrOff
print $ PP.pretty dfi
where
options = MME.LoadOptions { MME.loadOffset = Just 0 }
discoverFunction ::
MX.ArchitectureInfo arch ->
MM.Memory (MC.ArchAddrWidth arch) ->
MD.AddrSymMap (MC.ArchAddrWidth arch) ->
MC.ArchSegmentOff arch ->
IO (Some (MD.DiscoveryFunInfo arch))
discoverFunction archInf mem symMap addr =
let s = MD.emptyDiscoveryState mem symMap archInf in
MX.withArchConstraints archInf $
MUI.processIncCompLogs (const $ pure ()) $ MUI.runIncCompM $ do
let discoveryOpts = MD.defaultDiscoveryOptions
(_, funInfo) <- MUI.liftIncComp id $
MD.discoverFunction discoveryOpts addr MD.UserRequest s []
pure funInfo
```
The resulting macaw CFG is deeply suspicious, however:
In particular, pay attention to the parts of the CFG leading up to address 0x113d. We would expect that just before the call and return to ... bit, the value of rip should be an address derived from %rsi (which we move to 0x10(%rsp) just before calling it). That's not what happens in the macaw CFG, however: instead, the value of rip just before the call is r29, which corresponds to a value derived from %rdi instead! This is quite wrong, and in a downstream application of mine that depends on macaw-x86, this results in macaw-symbolic jumping to a non-sensical address.
The root cause is a bug in macaw-x86's semantics. Currently, we have:
def_call :: InstructionDef
def_call = defUnary "call" $ \_ v -> do
-- Push value of next instruction
old_pc <- getReg R.X86_IP
push addrRepr old_pc
-- Set IP
tgt <- getCallTarget v
rip .= tgt
Note that we push the next instruction to the top of the stack before we get the call target. In this case, however, the call target is 0x10(%rsp), which means that pushing the next instruction will change the call target! That seems like a problem, and some searching reveals that radare2used to suffer from the same bug.
I believe the following patch would fix this:
diff --git a/x86/src/Data/Macaw/X86/Semantics.hs b/x86/src/Data/Macaw/X86/Semantics.hs
index 370567a3..e923cd38 100644
--- a/x86/src/Data/Macaw/X86/Semantics.hs
+++ b/x86/src/Data/Macaw/X86/Semantics.hs
@@ -1376,11 +1376,12 @@ def_set_list =
def_call :: InstructionDef
def_call = defUnary "call" $ \_ v -> do
+ -- Get the address to branch to
+ tgt <- getCallTarget v
-- Push value of next instruction
old_pc <- getReg R.X86_IP
push addrRepr old_pc
-- Set IP
- tgt <- getCallTarget v
rip .= tgt
-- | Conditional jumps
This ensures that we retrieve the call target before pushing the next instruction, which may influence the data that %rsp points to.
This C code:
When compiled like so:
Yields an x86-64 binary with the following machine code for the
callit
function:Note that this calls the
s.fn
function pointer using thecall *0x10(%rsp)
instruction (at address0x113d
). This will end up being important later.Now let's load the
callit
function into amacaw
CFG using the following driver program:The resulting
macaw
CFG is deeply suspicious, however:In particular, pay attention to the parts of the CFG leading up to address
0x113d
. We would expect that just before thecall and return to ...
bit, the value ofrip
should be an address derived from%rsi
(which we move to0x10(%rsp)
just beforecall
ing it). That's not what happens in themacaw
CFG, however: instead, the value ofrip
just before thecall
isr29
, which corresponds to a value derived from%rdi
instead! This is quite wrong, and in a downstream application of mine that depends onmacaw-x86
, this results inmacaw-symbolic
jumping to a non-sensical address.The root cause is a bug in
macaw-x86
's semantics. Currently, we have:Note that we push the next instruction to the top of the stack before we get the call target. In this case, however, the call target is
0x10(%rsp)
, which means that pushing the next instruction will change the call target! That seems like a problem, and some searching reveals thatradare2
used to suffer from the same bug.I believe the following patch would fix this:
This ensures that we retrieve the call target before pushing the next instruction, which may influence the data that
%rsp
points to.