The default implementation of mkGlobalPointerValidityPred (here) is designed to check if you are attempting a memory write to a read-only section of the global address space, and if so, it will throw an assertion failure. I've discovered that this only works some of the time, however. To see what I mean, let's cook up a small test case that demonstrates writing to a read-only part of memory:
// test.c
int x = 0;
__attribute__((noinline)) int test_and_verify_set_x(void) {
x = 1;
return x;
}
void _start(void) {
test_and_verify_set_x();
}
Writing to x is fine on its own, but I will use objcopy to mark the ELF section that x resides in as read-only (as though x were a constant):
(I am using a musl-based, PPC32 version of gcc in the example above, but the same principle should apply to any architecture.)
Let's test this out. The easier way I know how to test this is to make the following change to macaw-symbolic:
diff --git a/symbolic/src/Data/Macaw/Symbolic/Testing.hs b/symbolic/src/Data/Macaw/Symbolic/Testing.hs
index 6203e531..85d547aa 100644
--- a/symbolic/src/Data/Macaw/Symbolic/Testing.hs
+++ b/symbolic/src/Data/Macaw/Symbolic/Testing.hs
@@ -514,7 +514,7 @@ simulateFunction binfo bak execFeatures archInfo archVals halloc mmPreset g = do
let mmConf = (MSM.memModelConfig bak memPtrTbl)
{ MS.lookupFunctionHandle = lookupFunction archVals binfo
, MS.lookupSyscallHandle = lookupSyscall
- , MS.mkGlobalPointerValidityAssertion = validityCheck
+ -- , MS.mkGlobalPointerValidityAssertion = validityCheck
}
pure (initMem, mmConf)
LazyMemModel -> do
@@ -524,7 +524,7 @@ simulateFunction binfo bak execFeatures archInfo archVals halloc mmPreset g = do
let mmConf = (MSMLazy.memModelConfig bak memPtrTbl)
{ MS.lookupFunctionHandle = lookupFunction archVals binfo
, MS.lookupSyscallHandle = lookupSyscall
- , MS.mkGlobalPointerValidityAssertion = validityCheck
+ -- , MS.mkGlobalPointerValidityAssertion = validityCheck
}
pure (initMem, mmConf)
(stackBasePtr, mem1) <- CLM.doMalloc bak CLM.StackAlloc CLM.Mutable "stack_alloc" initMem stackSize CLD.noAlignment
Then compile macaw-{ppc,x86}-symbolic, and then run this test harness program (based on the macaw-{ppc,x86}-symbolic test suites):
```hs
-- Bug.hs
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ImplicitParams #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}
module Main (main) where
import Control.Lens ((^.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.ElfEdit as Elf
import qualified Data.Foldable as F
import qualified Data.Map as Map
import Data.Maybe ( mapMaybe )
import qualified Data.Parameterized.Classes as PC
import qualified Data.Parameterized.Nonce as PN
import Data.Parameterized.Some ( Some(..) )
import Data.Proxy ( Proxy(..) )
import GHC.TypeLits ( KnownNat )
import qualified What4.Config as WC
import qualified What4.Expr as WE
import qualified What4.Expr.Builder as WEB
import qualified What4.FloatMode as WF
import qualified What4.Interface as WI
import qualified What4.ProblemFeatures as WPF
import qualified What4.Solver as WS
import qualified Lang.Crucible.Backend as CB
import qualified Lang.Crucible.Backend.Online as CBO
import qualified Lang.Crucible.Simulator as CS
import qualified Lang.Crucible.LLVM.MemModel as LLVM
import qualified Dismantle.PPC as DP
import qualified SemMC.Architecture.PPC as SAP
import qualified Data.Macaw.CFG as MC
import qualified Data.Macaw.Discovery as M
import qualified Data.Macaw.Memory as MM
import qualified Data.Macaw.Memory.ElfLoader.PLTStubs as MMEP
import qualified Data.Macaw.Symbolic as MS
import qualified Data.Macaw.Symbolic.Testing as MST
import qualified Data.Macaw.PPC as MP
import Data.Macaw.PPC.Symbolic ()
import qualified Data.Macaw.X86 as MX
import Data.Macaw.X86.Symbolic ()
main :: IO ()
main = do
-- These are pass/fail in that the assertions in the "pass" set are true (and
-- the solver returns Unsat), while the assertions in the "fail" set are false
-- (and the solver returns Sat).
let passRes = MST.SimulationResult MST.Unsat
mkSymExTest passRes MST.LazyMemModel "test.exe"
hasTestPrefix :: Some (M.DiscoveryFunInfo arch) -> Maybe (BS8.ByteString, Some (M.DiscoveryFunInfo arch))
hasTestPrefix (Some dfi) = do
bsname <- M.discoveredFunSymbol dfi
if BS8.pack "test_" `BS8.isPrefixOf` bsname
then return (bsname, Some dfi)
else Nothing
mkSymExTest :: MST.SimulationResult -> MST.MemModelPreset -> FilePath -> IO ()
mkSymExTest expected mmPreset exePath = do
bytes <- BS.readFile exePath
case Elf.decodeElfHeaderInfo bytes of
Left (_, msg) -> fail ("Error parsing ELF header from file '" ++ show exePath ++ "': " ++ msg)
Right (Elf.SomeElf ehi) -> do
case (Elf.headerClass (Elf.header ehi), Elf.headerMachine (Elf.header ehi)) of
(Elf.ELFCLASS64, Elf.EM_X86_64) -> do
binariesInfo <- MST.runDiscovery ehi exePath MST.toAddrSymMap MX.x86_64_linux_info MX.x86_64PLTStubInfo
let funInfos = Map.elems (MST.binaryDiscState (MST.mainBinaryInfo binariesInfo) ^. M.funInfo)
let testEntryPoints = mapMaybe hasTestPrefix funInfos
F.forM_ testEntryPoints $ \(_name, Some dfi) -> do
Some (gen :: PN.NonceGenerator IO t) <- PN.newIONonceGenerator
sym <- WEB.newExprBuilder WF.FloatRealRepr WE.EmptyExprBuilderState gen
CBO.withYicesOnlineBackend sym CBO.NoUnsatFeatures WPF.noFeatures $ \bak -> do
-- We are using the z3 backend to discharge proof obligations, so
-- we need to add its options to the backend configuration
let solver = WS.z3Adapter
let backendConf = WI.getConfiguration sym
WC.extendConfig (WS.solver_adapter_config_options solver) backendConf
execFeatures <- MST.defaultExecFeatures (MST.SomeOnlineBackend bak)
archVals <- case MS.archVals (Proxy @MX.X86_64) Nothing of
Just archVals -> pure archVals
Nothing -> error "mkSymExTest: impossible"
let extract = x86ResultExtractor archVals
let ?memOpts = LLVM.defaultMemOptions
simRes <- MST.simulateAndVerify solver WS.defaultLogData bak execFeatures MX.x86_64_linux_info archVals binariesInfo mmPreset extract dfi
if expected == simRes
then putStrLn "Test passed"
else fail $ "Expected " ++ show expected ++ ", but got " ++ show simRes
(Elf.ELFCLASS32, Elf.EM_PPC) -> do
binariesInfo <- MST.runDiscovery ehi exePath MST.toAddrSymMap MP.ppc32_linux_info (MMEP.noPLTStubInfo "PPC")
let funInfos = Map.elems (MST.binaryDiscState (MST.mainBinaryInfo binariesInfo) ^. M.funInfo)
let testEntryPoints = mapMaybe hasTestPrefix funInfos
F.forM_ testEntryPoints $ \(_name, Some dfi) -> do
Some (gen :: PN.NonceGenerator IO t) <- PN.newIONonceGenerator
sym <- WEB.newExprBuilder WF.FloatRealRepr WE.EmptyExprBuilderState gen
CBO.withYicesOnlineBackend sym CBO.NoUnsatFeatures WPF.noFeatures $ \bak -> do
-- We are using the z3 backend to discharge proof obligations, so
-- we need to add its options to the backend configuration
let solver = WS.z3Adapter
let backendConf = WI.getConfiguration sym
WC.extendConfig (WS.solver_adapter_config_options solver) backendConf
execFeatures <- MST.defaultExecFeatures (MST.SomeOnlineBackend bak)
archVals <- case MS.archVals (Proxy @MP.PPC32) Nothing of
Just archVals -> pure archVals
Nothing -> error "mkSymExTest: impossible"
let extract = ppcResultExtractor archVals
let ?memOpts = LLVM.defaultMemOptions
simRes <- MST.simulateAndVerify solver WS.defaultLogData bak execFeatures MP.ppc32_linux_info archVals binariesInfo mmPreset extract dfi
if expected == simRes
then putStrLn "Test passed"
else fail $ "Expected " ++ show expected ++ ", but got " ++ show simRes
_ -> error "Unsupported architecture"
x86ResultExtractor :: (CB.IsSymInterface sym) => MS.ArchVals MX.X86_64 -> MST.ResultExtractor sym MX.X86_64
x86ResultExtractor archVals = MST.ResultExtractor $ \regs _sp _mem k -> do
let re = MS.lookupReg archVals regs MX.RAX
k PC.knownRepr (CS.regValue re)
ppcResultExtractor :: ( arch ~ MP.AnyPPC v
, CB.IsSymInterface sym
, MP.KnownVariant v
, MM.MemWidth (SAP.AddrWidth v)
, MC.ArchConstraints arch
, MS.ArchInfo arch
, KnownNat (SAP.AddrWidth v)
)
=> MS.ArchVals arch
-> MST.ResultExtractor sym arch
ppcResultExtractor archVals = MST.ResultExtractor $ \regs _sp _mem k -> do
let re = MS.lookupReg archVals regs (MP.PPC_GP (DP.GPR 3))
k PC.knownRepr (CS.regValue re)
```
If you do all that, you should get the following error:
$ ~/Software/powerpc-linux-musl-cross/bin/powerpc-linux-musl-gcc -nostdlib -fno-stack-protector -fcf-protection=none test.c -o test.exe && ~/Software/powerpc-linux-musl-cross/bin/powerpc-linux-musl-objcopy --writable-text --set-section-flags .sbss=readonly test.exe
$ runghc Bug.hs
Bug.hs: Failed to prove goal: test.exe: segment1+0x1e0: error: in test_and_verify_set_x
PointerWrite outside of static memory range (known BlockID 0): let -- test.exe: segment1+0x1d8
v442 = select cglobalMemoryBytes@1:a 0x1ff6b:[32]
-- test.exe: segment1+0x1d8
v447 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff6a:[32]) v442
-- test.exe: segment1+0x1d8
v452 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff69:[32]) v447
in bvConcat (select cglobalMemoryBytes@1:a 0x1ff68:[32]) v452
CallStack (from HasCallStack):
error, called at src/Data/Macaw/Symbolic/Testing.hs:338:33 in macaw-symbolic-0.0.1-inplace:Data.Macaw.Symbolic.Testing
So far, so good.
The example above used PPC32, however. What happens if we use an x86-64 version of gcc instead?
Note that this will only attempt to emit an outside of static memory range assertion failure if the LLVMPtr representing the global address has a block number of zero. If it has a non-zero block number, then there is no chance of an assertion failure. And sure enough, if we add some additional debug-printing to this function:
diff --git a/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs b/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs
index d72ef96d..ac3094d6 100644
--- a/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs
+++ b/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs
@@ -167,6 +167,10 @@ mkGlobalPointerValidityPredCommon tbl = \sym puse mcond ptr -> do
let ptrVal = CS.regValue ptr
let (ptrBase, ptrOff) = CL.llvmPointerView ptrVal
+ putStrLn $ unlines
+ [ "RGS mkGlobalPointerValidityPredCommon"
+ , show $ CL.ppPtr ptrVal
+ ]
case WI.asNat ptrBase of
Just 0 -> do
p <- mkPred ptrOff
And then if we re-run the x86-64 version of the program, we see:
In the x86-64 version of the binary, 0x4000 is x's address. Here, we see that the block number is 1, so macaw-symbolic won't check if the offset is out of range.
On the other hand, if we run the PPC32 version of the program, we see:
$ runghc Bug.hs
<snip>
RGS mkGlobalPointerValidityPredCommon
let -- test.exe: segment1+0x1d8
v442 = select cglobalMemoryBytes@1:a 0x1ff6b:[32]
-- test.exe: segment1+0x1d8
v447 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff6a:[32]) v442
-- test.exe: segment1+0x1d8
v452 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff69:[32]) v447
in bvConcat (select cglobalMemoryBytes@1:a 0x1ff68:[32]) v452
<snip>
Bug.hs: Failed to prove goal: test.exe: segment1+0x1e0: error: in test_and_verify_set_x
PointerWrite outside of static memory range (known BlockID 0): let -- test.exe: segment1+0x1d8
v442 = select cglobalMemoryBytes@1:a 0x1ff6b:[32]
-- test.exe: segment1+0x1d8
v447 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff6a:[32]) v442
-- test.exe: segment1+0x1d8
v452 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff69:[32]) v447
in bvConcat (select cglobalMemoryBytes@1:a 0x1ff68:[32]) v452
CallStack (from HasCallStack):
error, called at src/Data/Macaw/Symbolic/Testing.hs:338:33 in macaw-symbolic-0.0.1-inplace:Data.Macaw.Symbolic.Testing
This time, we see that the block number is 0 (by virtue of the fact that it's not printing a pair).
What is causing this discrepancy? Ultimately, it's because when simulating the x86-64 version of the program, macaw-symbolic attempts to write to an address that was resolved using MacawGlobalPtr. This will translate an absolute address (e.g., 0x4000 with a block number of 0) into an offset from the LLVMPtr that backs global memory (e.g., (1, 0x4000)). When simulating the PPC32 version of the program, however, macaw-symbolic attempts to write to an absolute address (0x2000) that was not resolved using MacawGlobalPtr. (I'm very unclear on why the two architectures differ in this way.)
What should we do about this? We could investigate why the two architectures differ here, although I'm unsure how deep that rabbit hole goes. In principle, mkGlobalPointerValidityPredCommon ought to be able to handle addresses that were resolved using MacawGlobalPtr, so perhaps we should teach it to also make assertions about LLVMPtrs whose block number matches that of the LLVMPtr backing global memory.
The default implementation of
mkGlobalPointerValidityPred
(here) is designed to check if you are attempting a memory write to a read-only section of the global address space, and if so, it will throw an assertion failure. I've discovered that this only works some of the time, however. To see what I mean, let's cook up a small test case that demonstrates writing to a read-only part of memory:Writing to
x
is fine on its own, but I will useobjcopy
to mark the ELF section thatx
resides in as read-only (as thoughx
were a constant):(I am using a
musl
-based, PPC32 version ofgcc
in the example above, but the same principle should apply to any architecture.)Let's test this out. The easier way I know how to test this is to make the following change to
macaw-symbolic
:Then compile
macaw-{ppc,x86}-symbolic
, and then run this test harness program (based on themacaw-{ppc,x86}-symbolic
test suites):If you do all that, you should get the following error:
So far, so good.
The example above used PPC32, however. What happens if we use an x86-64 version of
gcc
instead?Huh? How come the x86-64 version passes the test, but the PPC32 version fails? There is a hint in this part of
mkGlobalPointerValidityPredCommon
:https://github.com/GaloisInc/macaw/blob/3e83b3eff3067c151f65f44c1e64021e36ad935a/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs#L168-L177
Note that this will only attempt to emit an
outside of static memory range
assertion failure if theLLVMPtr
representing the global address has a block number of zero. If it has a non-zero block number, then there is no chance of an assertion failure. And sure enough, if we add some additional debug-printing to this function:And then if we re-run the x86-64 version of the program, we see:
In the
x86-64
version of the binary, 0x4000 isx
's address. Here, we see that the block number is 1, somacaw-symbolic
won't check if the offset is out of range.On the other hand, if we run the PPC32 version of the program, we see:
This time, we see that the block number is 0 (by virtue of the fact that it's not printing a pair).
What is causing this discrepancy? Ultimately, it's because when simulating the x86-64 version of the program,
macaw-symbolic
attempts to write to an address that was resolved usingMacawGlobalPtr
. This will translate an absolute address (e.g.,0x4000
with a block number of 0) into an offset from theLLVMPtr
that backs global memory (e.g.,(1, 0x4000)
). When simulating the PPC32 version of the program, however,macaw-symbolic
attempts to write to an absolute address (0x2000
) that was not resolved usingMacawGlobalPtr
. (I'm very unclear on why the two architectures differ in this way.)What should we do about this? We could investigate why the two architectures differ here, although I'm unsure how deep that rabbit hole goes. In principle,
mkGlobalPointerValidityPredCommon
ought to be able to handle addresses that were resolved usingMacawGlobalPtr
, so perhaps we should teach it to also make assertions aboutLLVMPtr
s whose block number matches that of theLLVMPtr
backing global memory.