Incorrect decoding of UD2

dvyukov commented 1 year ago

This byte sequence is a single UD1 instruction:

echo -en "\x67\x0f\xb9\x40\x16" > /tmp/bin && objdump -mi386 -Mx86-64 -b binary -D /tmp/bin
   0:   67 0f b9 40 16          ud1    0x16(%eax),%eax

echo -en "0x67 0x0f 0xb9 0x40 0x16" | llvm-mc --disassemble -
    ud1l    22(%eax), %eax

but drdecode decodes it as a 3-byte instruction, which breaks decoding of the rest of the instruction stream. I see lots of these instructions in real binaries compiled with clang; it seems to use them as padding at the end of functions, or something similar.
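To see why 5 bytes is the correct length, here is a minimal C sketch (not DynamoRIO code; the function names are hypothetical) that computes the length of a UD2 (0F 0B) or UD1 (0F B9 /r) instruction in 64-bit mode. It assumes that only the 66/67/F2/F3 legacy prefixes and an optional REX prefix precede the opcode, and it ignores UD0. The key point is that UD1 takes a ModRM operand. For 67 0f b9 40 16, the ModRM byte 0x40 has mod=01, so a one-byte displacement (0x16) follows: 1 prefix + 2 opcode + 1 ModRM + 1 disp8 = 5 bytes. A decoder that treats 0F B9 as operand-less stops after 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes taken by a ModRM byte plus any SIB byte and displacement, using
 * 32/64-bit addressing (the only forms available in 64-bit mode, with or
 * without a 0x67 prefix). Returns 0 if the buffer is too short. */
static size_t modrm_operand_length(const uint8_t *p, size_t n)
{
    if (n < 1)
        return 0;
    uint8_t mod = p[0] >> 6;
    uint8_t rm = p[0] & 7;
    size_t len = 1; /* the ModRM byte itself */

    if (mod == 3)
        return len; /* register operand: nothing else follows */
    if (rm == 4) { /* a SIB byte follows */
        if (n < 2)
            return 0;
        len++;
        if (mod == 0 && (p[1] & 7) == 5)
            len += 4; /* SIB with no base register: disp32 */
    } else if (mod == 0 && rm == 5) {
        len += 4; /* RIP/EIP-relative: disp32 */
    }
    if (mod == 1)
        len += 1; /* disp8 */
    else if (mod == 2)
        len += 4; /* disp32 */
    return len <= n ? len : 0;
}

/* Hypothetical helper: length of a UD2/UD1 instruction at p, or 0 if the
 * bytes are not one of those two (or are truncated). */
static size_t ud_insn_length(const uint8_t *p, size_t n)
{
    assert(p != NULL);
    size_t i = 0;
    /* Skip the legacy prefixes assumed to be possible here. */
    while (i < n && (p[i] == 0x66 || p[i] == 0x67 || p[i] == 0xf2 || p[i] == 0xf3))
        i++;
    /* A REX prefix (0x40-0x4f) must immediately precede the opcode. */
    if (i < n && (p[i] & 0xf0) == 0x40)
        i++;
    if (i + 2 > n || p[i] != 0x0f)
        return 0;
    if (p[i + 1] == 0x0b)
        return i + 2; /* UD2: no ModRM operand */
    if (p[i + 1] == 0xb9) { /* UD1 r, r/m: ModRM operand follows */
        size_t m = modrm_operand_length(p + i + 2, n - i - 2);
        return m ? i + 2 + m : 0;
    }
    return 0;
}
```

For example, `ud_insn_length` returns 5 for the bytes 67 0f b9 40 16 and 2 for a plain 0f 0b.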

I am using commit 45c9973e363254d534b819d3cc508601f534bb71.

derekbruening commented 1 year ago

@khuey would it be possible for you to take this on?

khuey commented 1 year ago

Looks like we just don't handle ud1 (or ud0 for that matter). How do we feel about renumbering the opcodes vs adding missing stuff at the end?

khuey commented 1 year ago

Ah, no, actually we do decode ud1 (just as OP_ud2b); there are just no operands listed in the decode table.

There's still the question of what we want to do about ud0, but perhaps we just punt on that for now.

derekbruening commented 1 year ago

> Looks like we just don't handle ud1 (or ud0 for that matter). How do we feel about renumbering the opcodes vs adding missing stuff at the end?

We add at the end.
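The trade-off behind adding at the end can be illustrated with a toy C example. The enums below are hypothetical and far shorter than DynamoRIO's real opcode list, and the motivation is an assumption not stated in the thread: if opcode numbers are compiled into client code, existing values should not move. Appending a new opcode leaves every earlier value unchanged, while inserting it next to related opcodes renumbers everything after it.

```c
#include <assert.h>

/* Hypothetical opcode list before the change (not DynamoRIO's real enum). */
enum opcode_old { OLD_INVALID, OLD_add, OLD_ud2a, OLD_ud2b, OLD_LAST };

/* Option 1: append the new opcode at the end; old values are unchanged. */
enum opcode_appended { APP_INVALID, APP_add, APP_ud2a, APP_ud2b, APP_ud0, APP_LAST };

/* Option 2: insert it next to its relatives; later values shift by one. */
enum opcode_inserted { INS_INVALID, INS_add, INS_ud0, INS_ud2a, INS_ud2b, INS_LAST };

/* Checked at compile time via the C11 static_assert macro from <assert.h>. */
static_assert((int)APP_ud2b == (int)OLD_ud2b, "appending keeps existing numbers");
static_assert((int)INS_ud2b != (int)OLD_ud2b, "inserting renumbers later opcodes");
```

The cost of appending is purely cosmetic: the new opcode ends up far from its relatives in the list.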

dvyukov commented 1 year ago

Thanks!

DynamoRIO / dynamorio #5979