possible vex bug in handling segment offsets

hammerpig commented 2 years ago

Consider an instruction that uses gs as the 64-bit base address of some memory and then reads at an ebp offset from that base:

In [40]: angr.load_shellcode(b'eg\x0f\xb6m\x00','amd64').factory.block(0).pp()
0x0:    movzx   ebp, byte ptr gs:[ebp]

vex adds the gs base to rbp first and then truncates the whole sum to 32 bits (statement 04), so the upper half of the segment base is lost:

In [41]: angr.load_shellcode(b'eg\x0f\xb6m\x00','amd64').factory.block(0).vex.pp()
IRSB {
   t0:Ity_I64 t1:Ity_I64 t2:Ity_I32 t3:Ity_I64 t4:Ity_I64 t5:Ity_I64 t6:Ity_I64 t7:Ity_I64 t8:Ity_I32 t9:Ity_I8 t10:Ity_I64

   00 | ------ IMark(0x0, 6, 0) ------
   01 | t4 = GET:I64(gs)
   02 | t6 = GET:I64(rbp)
   03 | t3 = Add64(t6,t4)
   04 | t2 = 64to32(t3)
   05 | t1 = 32Uto64(t2)
   06 | t9 = LDle:I8(t1)
   07 | t8 = 8Uto32(t9)
   08 | t7 = 32Uto64(t8)
   09 | PUT(rbp) = t7
   NEXT: PUT(rip) = 0x0000000000000006; Ijk_Boring
}

but Pcode doesn't:

In [42]: angr.load_shellcode(b'eg\x0f\xb6m\x00','amd64',engine=angr.engines.UberEnginePcode).factory.block(0).vex.pp()
IRSB {
   00 | ------ 00000000, 6 ------
  +00 | unique[0x600:8] = sext(EBP)
  +01 | unique[0x9c0:8] = GS_OFFSET + unique[0x600:8]
  +02 | unique[0x1760:1] = *[ram]unique[0x9c0:8]
  +03 | EBP = zext(unique[0x1760:1])
   NEXT: 6; Ijk_Boring
}

and I believe my processor doesn't either.

edmcman commented 2 years ago

See section 3.3.7 of the Intel SDM, volume 1: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf

In the flat address space of 64-bit mode, linear addresses are equal to effective addresses because the base address is zero. In the event that FS or GS segments are used with a non-zero base, this rule does not hold. In 64-bit mode, the effective address components are added and the effective address is truncated (See for example the instruction LEA) before adding the full 64-bit segment base. The base is never truncated, regardless of addressing mode in 64-bit mode.
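
As a rough illustration of the quoted rule, here is a minimal Python sketch. It is not taken from the manual: the function name `linear_address` and the register values are made up for illustration. It truncates the effective address to the address size first and only then adds the untruncated 64-bit segment base.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def linear_address(segment_base, effective_address, address_size_bits=64):
    """Truncate the effective address to the address size, then add the full base.

    Hypothetical helper modelling the rule quoted from SDM section 3.3.7.
    """
    ea = effective_address & ((1 << address_size_bits) - 1)
    # The segment base itself is never truncated in 64-bit mode.
    return (segment_base + ea) & MASK64


# Made-up values: a GS base above 4 GiB and an rbp with junk in the upper half.
gs_base = 0x0000_7FFF_0000_1000
rbp = 0xFFFF_FFFF_0000_0010

# With the 0x67 address-size prefix the address size is 32 bits.
addr = linear_address(gs_base, rbp, address_size_bits=32)
print(hex(addr))
```

With a 32-bit address size only the low half of rbp contributes to the offset, while the high bits of the GS base survive in the result.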

rhelmot commented 2 years ago

Yeah this is obviously a bug. Unfortunately, I'm without a computer for the next few weeks, so I'd appreciate it if one of you could look into a fix. Maybe first check if it's been fixed on upstream valgrind?

edmcman commented 2 years ago

I think it is coming from here: https://sourceware.org/git/?p=valgrind.git;a=blame;f=VEX/priv/guest_amd64_toIR.c;h=c6296f3987d0a0f6e49cd3245461d3565e5c9d30;hb=HEAD#l2471

angr/vex applies the 32-bit address-size adjustment after the fs and gs segment-base logic. Upstream applies it before.

edmcman commented 2 years ago

This is what I get from https://github.com/edmcman/vex/tree/32bitaddrsegmentfix

In [1]: import angr

In [2]: angr.load_shellcode(b'eg\x0f\xb6m\x00','amd64').factory.block(0).vex.pp()
IRSB {
   t0:Ity_I64 t1:Ity_I64 t2:Ity_I64 t3:Ity_I64 t4:Ity_I32 t5:Ity_I64 t6:Ity_I64 t7:Ity_I64 t8:Ity_I32 t9:Ity_I8 t10:Ity_I64

   00 | ------ IMark(0x0, 6, 0) ------
   01 | t2 = GET:I64(gs)
   02 | t6 = GET:I64(rbp)
   03 | t4 = 64to32(t6)
   04 | t3 = 32Uto64(t4)
   05 | t1 = Add64(t3,t2)
   06 | t9 = LDle:I8(t1)
   07 | t8 = 8Uto32(t9)
   08 | t7 = 32Uto64(t8)
   09 | PUT(rbp) = t7
   NEXT: PUT(rip) = 0x0000000000000006; Ijk_Boring
}

Looks right to me, but I haven't had any coffee yet, so who knows.
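
For a sanity check that does not depend on coffee, the address computation of the two IRSBs can be transcribed by hand into plain Python and compared on a concrete input. This is only a sketch: the function names and the gs/rbp values are made up, and the VEX ops are modelled with integer masks rather than run through angr.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def address_before_fix(gs, rbp):
    """Mirror of the original IRSB: add first, then truncate the sum."""
    t3 = (rbp + gs) & MASK64   # t3 = Add64(t6,t4)
    t2 = t3 & MASK32           # t2 = 64to32(t3)
    t1 = t2                    # t1 = 32Uto64(t2)
    return t1


def address_after_fix(gs, rbp):
    """Mirror of the patched IRSB: truncate the offset, then add the base."""
    t4 = rbp & MASK32          # t4 = 64to32(t6)
    t3 = t4                    # t3 = 32Uto64(t4)
    t1 = (t3 + gs) & MASK64    # t1 = Add64(t3,t2)
    return t1


# Made-up values: any gs base with bits set above bit 31 shows the difference.
gs = 0x0000_7FFF_0000_1000
rbp = 0x10

before = address_before_fix(gs, rbp)
after = address_after_fix(gs, rbp)
print(hex(before), hex(after))
```

The two orderings agree only when the gs base and the sum both fit in 32 bits, which is presumably why the original ordering went unnoticed.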
