gdabah / distorm

Powerful Disassembler Library For x86/AMD64

disassembly error when decoding an LES (c4) or LDS (c5) instruction that fails #56

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

Using the latest Distorm 3.3.

When decoding of an LES (c4) or LDS (c5) instruction fails,
the following one or two bytes are not decoded properly either.

Disassembly error examples:

Using the Python interface:
 import distorm3 as di
 from binascii import unhexlify
 # Replace "bytesequence" with one of the hex strings listed below.
 di.Decode(0, unhexlify("bytesequence"), di.Decode32Bits)

For all of these, the first byte results in the
rest of the byte sequence being decoded improperly.

==================
c5ff837b0c000f

should be
  c5           DB 0xc5
  ff837b0c000f INC DWORD [EBX+0xf000c7b]

disassembles as
  c5           DB 0xc5
  ff           DB 0xff
  83           DB 0x83
  7b0c         JNP 0x11
  000f         ADD [EDI], CL

==================
c5d0010100

should be
  c5           DB 0xc5
  d001         ROL BYTE [ECX], 0x1
  0100         ADD [EAX], EAX

disassembles as
  c5           DB 0xc5
  d0           DB 0xd0
  01           DB 0x1
  0100         ADD [EAX], EAX

==================
c4caffff8d5644

should be
  c4           DB 0xc4
  caffff       RETF 0xffff
  8d5644       LEA EDX, [ESI+0x44]

disassembles as
  c4           DB 0xc4
  ca           DB 0xca
  ff           DB 0xff
  ff           DB 0xff
  8d5644       LEA EDX, [ESI+0x44]

==================
c4ffff83c43c0b44

should be
  c4           DB 0xc4
  ff           DB 0xff
  ff83c43c0b44 INC DWORD [EBX+0x440b3cc4]

disassembles as
  c4           DB 0xc4
  ff           DB 0xff
  ff           DB 0xff
  83           DB 0x83
  c43c0b       LES EDI, [EBX+ECX]
  44           INC ESP

==================
c4effeff83c4048975

should be
  c4           DB 0xc4
  ef           OUT DX, EAX
  fe           DB 0xfe
  ff83c4048975 INC DWORD [EBX+0x758904c4]

disassembles as
  c4           DB 0xc4
  ef           DB 0xef
  fe           DB 0xfe
  ff           DB 0xff
  83c404       ADD ESP, 0x4
  89           DB 0x89
  75           DB 0x75

==================
c5f0fd

should be
  c5           DB 0xc5
  f0fd         STD

disassembles as
  c5           DB 0xc5
  f0           DB 0xf0
  fd           DB 0xfd

==================
c4fcffff8946048d44

should be
  c4           DB 0xc4
  fc           CLD
  ff           DB 0xff
  ff8946048d44 DEC DWORD [ECX+0x448d0446]

disassembles as
  c4           DB 0xc4
  fc           DB 0xfc
  ff           DB 0xff
  ff           DB 0xff
  894604       MOV [ESI+0x4], EAX
  8d           DB 0x8d
  44           INC ESP

==================
c4d0ffff83c40c508d

should be
  c4           DB 0xc4
  d0ff         SAR BH, 0x1
  ff83c40c508d INC DWORD [EBX-0x72aff33c]

disassembles as
  c4           DB 0xc4
  d0           DB 0xd0
  ff           DB 0xff
  ff           DB 0xff
  83c40c       ADD ESP, 0xc
  50           PUSH EAX
  8d           DB 0x8d

==================
c4cc08000000

should be
  c4           DB 0xc4
  cc           INT 3
  0800         OR [EAX], AL
  0000         ADD [EAX], AL

disassembles as
  c4           DB 0xc4
  cc           DB 0xcc      <-- should have been an INT3 
  08           DB 0x8       <-- should have been OR [EAX], AL
  00           DB 0x0
  0000         ADD [EAX], AL

==================
c4d417788b44

should be
  c4           DB 0xc4
  d417         AAM 0x17
  788b         JS 0xffffff90
  44           INC ESP

disassembles as
  c4           DB 0xc4
  d4           DB 0xd4
  17           DB 0x17
  78           DB 0x78
  8b           DB 0x8b
  44           INC ESP

==================
c5c7010f54

should be
  c5           DB 0xc5
  c7           DB 0xc7
  010f         ADD [EDI], ECX
  54           PUSH ESP

disassembles as
  c5           DB 0xc5
  c7           DB 0xc7
  01           DB 0x1
  0f           DB 0xf
  54           PUSH ESP

==================
c5c303b940400000

should be
  c5           DB 0xc5
  c3           RET
  03b940400000 ADD EDI, [ECX+0x4040]

disassembles as
  c5           DB 0xc5
  c3           DB 0xc3          <-- should be RET
  03           DB 0x3           <-- should be ADD EDI
  b940400000   MOV ECX, 0x4040

Original issue reported on code.google.com by mnor...@cerodias.com on 30 Oct 2012 at 6:24

GoogleCodeExporter commented 9 years ago
Did you debug and single-step it?
Maybe the AVX prefix causes this.

Original comment by distorm@gmail.com on 30 Oct 2012 at 6:58

GoogleCodeExporter commented 9 years ago
I've done no debugging beyond finding the error and reporting multiple
examples. I suppose the AVX prefix could be the issue, as it uses the same
opcodes as the LES and LDS instructions. But shouldn't this only apply to
instructions using the SIMD XMM registers?
And in 32-bit mode, the prefix should only be valid when the following byte is
11xxxxxx.
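
The 32-bit rule described here can be sketched as a small check. This is an illustration only and is not taken from diStorm's source; the helper name `is_vex_prefix_32bit` is hypothetical. It relies on LES/LDS requiring a memory operand, so a following byte of the form 11xxxxxx cannot be a valid LES/LDS ModRM byte.

```python
def is_vex_prefix_32bit(code: bytes, offset: int = 0) -> bool:
    """Return True if the byte at `offset` would start a VEX prefix in 32-bit mode.

    Hypothetical helper for illustration; not part of distorm3.
    """
    if offset + 1 >= len(code):
        return False  # the byte after C4/C5 is needed to decide
    if code[offset] not in (0xC4, 0xC5):
        return False
    # Top two bits set (11xxxxxx): not a valid LES/LDS ModRM byte, so VEX.
    return (code[offset + 1] & 0xC0) == 0xC0


# c5 ff ...: 0xff is 11xxxxxx, so the C5 byte is taken as a VEX prefix.
print(is_vex_prefix_32bit(bytes.fromhex("c5ff837b0c000f")))
# c4 3c 0b: 0x3c is 00xxxxxx, so this is a plain LES with a memory operand.
print(is_vex_prefix_32bit(bytes.fromhex("c43c0b")))
```

By this check, every failing sequence in the report has a second byte of the form 11xxxxxx, while the `c43c0b` bytes that decode as LES do not.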

Original comment by mnor...@cerodias.com on 4 Nov 2012 at 3:01

GoogleCodeExporter commented 9 years ago
I found the problem. I will fix it when I get the time. Thanks for the info!

Original comment by distorm@gmail.com on 5 Nov 2012 at 10:48

GoogleCodeExporter commented 9 years ago
I am now figuring out how to deal with it. The first question that comes to
mind is why you decided that this behavior is an error.
As far as I'm concerned, the VEX prefix is skipped just as if it were any other
prefix, and since it might be 2 or 3 bytes, they are all skipped.

I could add some "hack" to skip only the first byte of it, and continue from 
there, but to be honest, I'm not sure which way is the right way.
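
To make the prefix skipping described above concrete, here is a minimal sketch of how many bytes are taken as prefix for each escape byte. It only encodes the fact that a VEX prefix starting with C5 is 2 bytes long and one starting with C4 is 3 bytes long. The function names are hypothetical, this is not diStorm's implementation, and the sketch deliberately ignores the 32-bit `11xxxxxx` check raised earlier in the thread.

```python
def vex_prefix_length(first_byte: int) -> int:
    """Length in bytes of a VEX prefix starting with `first_byte`, or 0 if none."""
    if first_byte == 0xC5:
        return 2  # two-byte VEX form
    if first_byte == 0xC4:
        return 3  # three-byte VEX form
    return 0


def bytes_skipped_as_prefix(code: bytes) -> bytes:
    """Return the leading bytes that would be skipped as a VEX prefix (assumed model)."""
    if not code:
        return b""
    return code[:vex_prefix_length(code[0])]


print(bytes_skipped_as_prefix(bytes.fromhex("c4caffff8d5644")).hex())  # c4caff
print(bytes_skipped_as_prefix(bytes.fromhex("c5d0010100")).hex())      # c5d0
```

Under this model, skipping only the first byte would instead let decoding resume at `ca` or `d0` in the two sequences above.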

Original comment by distorm@gmail.com on 19 Nov 2012 at 10:19

GoogleCodeExporter commented 9 years ago
Part of why I decided this was an error is that distorm is not handling the 
issue the same way as ndisasm or IDA. They both mark the first byte with 'db' 
and start a new instruction with the next byte. I believe this is how distorm
handles other instructions that fail disassembly, and I expected the same
behavior here.
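
The behavior described above (emit a single `db` for the failing byte, then start again at the next byte) can be sketched as a generic linear sweep. This is an assumption-laden illustration, not the code of ndisasm, IDA, or distorm: `linear_sweep` and `toy_decode_one` are hypothetical names, and the toy decoder only knows two opcodes so that the example stays self-contained.

```python
from typing import Callable, List, Optional, Tuple

# Decoder callback: returns (length, text) for the instruction at `offset`,
# or None when no valid instruction starts there.
DecodeOne = Callable[[bytes, int], Optional[Tuple[int, str]]]


def linear_sweep(code: bytes, decode_one: DecodeOne) -> List[Tuple[int, str, str]]:
    """Decode `code` front to back, resynchronizing one byte after each failure."""
    out = []
    offset = 0
    while offset < len(code):
        result = decode_one(code, offset)
        if result is None or result[0] < 1:
            # Failure: emit exactly one byte as data and retry at the next byte.
            out.append((offset, code[offset:offset + 1].hex(), f"DB 0x{code[offset]:x}"))
            offset += 1
        else:
            length, text = result
            out.append((offset, code[offset:offset + length].hex(), text))
            offset += length
    return out


def toy_decode_one(code: bytes, offset: int) -> Optional[Tuple[int, str]]:
    """Toy decoder that only knows INT 3 (CC) and RETF imm16 (CA iw)."""
    op = code[offset]
    if op == 0xCC:
        return 1, "INT 3"
    if op == 0xCA and offset + 3 <= len(code):
        imm = int.from_bytes(code[offset + 1:offset + 3], "little")
        return 3, f"RETF 0x{imm:x}"
    return None


for off, raw, text in linear_sweep(bytes.fromhex("c4caffff"), toy_decode_one):
    print(f"{off:04x}  {raw:<12} {text}")
```

With this loop, a failed decode costs exactly one byte, so the `ca ff ff` that follows `c4` is still seen as the start of its own instruction.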

Also, if you look closely at the examples above, sometimes this is affecting 
the 4th byte in the sequence. As an example, consider

c4d0ffff83c40c508d

should be
  c4           DB 0xc4
  d0ff         SAR BH, 0x1
  ff83c40c508d INC DWORD [EBX-0x72aff33c]

disassembles as
  c4           DB 0xc4
  d0           DB 0xd0
  ff           DB 0xff
  ff           DB 0xff
  83c40c       ADD ESP, 0xc
  50           PUSH EAX
  8d           DB 0x8d

The 4th byte is supposed to be the start of the INC instruction, but was 
somehow messed up as well.

Original comment by mnor...@cerodias.com on 24 Nov 2012 at 3:01

GoogleCodeExporter commented 9 years ago
Problem fixed. Will be released in next version.

Original comment by distorm@gmail.com on 26 Nov 2012 at 8:12

GoogleCodeExporter commented 9 years ago
Thanks!

Original comment by mnor...@cerodias.com on 26 Nov 2012 at 8:09