travitch opened 2 years ago
For a while now, I've been working on a fix for (1), that is, de-duplicating all of the different combinations of prefixes in the opcode table. I've pushed my current WIP branch here in case you'd like to follow along. That branch is a bit messy at the moment, since I'm adding a separate disassembler, named `defaultX64Disassembler'`, that de-duplicates prefixes alongside the existing `defaultX64Disassembler`. I've also copied a fair bit of related functionality, giving the copies apostrophe suffixes as well to distinguish them from the existing functionality. Once I'm confident that the new disassembler is bug-free, I'll consolidate all of these functions.
First, here is the general approach that I'm taking in this branch:
1. The structure of the opcode table has changed from this:

```haskell
data OpcodeTable
  = OpcodeTable !NextOpcodeTable
  | SkipModRM !Prefixes !Def
  | ReadModRMTable !(V.Vector ModTable)
  | ReadModRMUnchecked !ModTable
```
To this:

```haskell
data OpcodeTable'
  = OpcodeTable' !NextOpcodeTable'
  | OpcodeTableEntry ![Def] -- Defs expecting a ModR/M byte
                     ![Def] -- Defs not expecting a ModR/M byte
```
Previously, the `OpcodeTable` data constructor was used any time there was an instruction opcode or a prefix byte. In the new design, the `OpcodeTable'` data constructor is only used for instruction opcodes; prefixes are not encoded into the structure of the table at all.
One consequence of this choice is that there can sometimes be multiple instruction definitions that use a particular set of opcodes. For instance, both the `add` and the `vpshufb` instructions use `00` as their opcode. For this reason, the `OpcodeTableEntry` data constructor, which represents a leaf entry in the opcode table, must contain a list of `Def`s rather than just a single one. In practice, these lists are quite small, and we may want to consider using a `SmallArray` to represent them.
Once we have an `OpcodeTableEntry`, we then use the list of parsed prefixes to disambiguate among all of the potential `Def`s in the list. (Actually, there are two lists, since `Def`s with a ModR/M byte have to be disassembled in a slightly different way. In principle, we could combine these into a single list, however.) Implementing the function that performs this disambiguation is one of the main challenges of this patch; more on this in a bit.
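A toy model may help make the leaf structure concrete. The sketch below is hypothetical and is not the branch's code: `Def` is reduced to a mnemonic, `NextOpcodeTable'` is replaced by an association list, and `lookupOpcodes` and `exampleTable` are made-up names. It only shows that once prefixes are out of the table, walking the opcode bytes ends at a leaf that can hold several candidate `Def`s.

```haskell
import Data.Word (Word8)

-- Hypothetical stand-in for flexdis86's Def: only a mnemonic.
newtype Def = Def String
  deriving (Eq, Show)

-- Toy model of the proposed table (not the branch's actual code).
-- Inner nodes dispatch on one opcode byte; the association list is a
-- simplification of NextOpcodeTable'.
data OpcodeTable'
  = OpcodeTable' [(Word8, OpcodeTable')]
  | OpcodeTableEntry [Def] -- Defs expecting a ModR/M byte
                     [Def] -- Defs not expecting a ModR/M byte

-- Follow opcode bytes down to a leaf. Returns every candidate Def at that
-- leaf together with the bytes that remain (ModR/M, operands, ...).
-- Prefixes were already consumed, so they play no part in this walk.
lookupOpcodes :: OpcodeTable' -> [Word8] -> Maybe (([Def], [Def]), [Word8])
lookupOpcodes (OpcodeTableEntry withModRM withoutModRM) rest =
  Just ((withModRM, withoutModRM), rest)
lookupOpcodes (OpcodeTable' next) (b : rest) =
  lookup b next >>= \t -> lookupOpcodes t rest
lookupOpcodes (OpcodeTable' _) [] = Nothing

-- Both add and vpshufb end at opcode 00, so they share one leaf.
exampleTable :: OpcodeTable'
exampleTable =
  OpcodeTable' [(0x00, OpcodeTableEntry [Def "add", Def "vpshufb"] [])]
```

Looking up `00 c0` yields both `add` and `vpshufb` plus the leftover `c0` byte; choosing between them is the job of the prefix-based disambiguation.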
2. Instead of encoding all possible combinations of prefixes in the branching structure of the opcode table, we perform a parsing pass upfront that parses as many prefixes as possible before parsing the instruction opcodes. Here is what this parsing pass looks like at the moment:
```haskell
-- Excerpt from the WIP branch. Uses the MultiWayIf extension;
-- readByte comes from the ByteReader class.
loopPrefixBytes :: Seq.Seq Word8 -> m InstructionInstance
loopPrefixBytes prefixBytes = do
  b <- readByte
  if | b `elem` simplePrefixBytes
       -> loopPrefixBytes (prefixBytes Seq.|> b)
     | b `elem` segPrefixBytes
       -> loopPrefixBytes (prefixBytes Seq.|> b)
     | b `elem` rexPrefixBytes
       -> loopPrefixBytes (prefixBytes Seq.|> b)
     -- Two-byte VEX prefix
     | b == 0xc5
       -> do b2 <- readByte
             loopPrefixBytes (prefixBytes Seq.>< Seq.fromList [b, b2])
     -- Three-byte VEX prefix
     | b == 0xc4
       -> do b2 <- readByte
             b3 <- readByte
             loopPrefixBytes (prefixBytes Seq.>< Seq.fromList [b, b2, b3])
     | otherwise
       -> loopOpcodes prefixBytes tr0 b -- Disassemble instruction based on opcodes
```
If this looks too simplistic to work, that's because it is. More on this later.
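For experimentation, a pure, list-based analogue of `loopPrefixBytes` is easy to write. This is a sketch, not the branch's code: `splitPrefixes` is a hypothetical name, and the contents of the three prefix sets are assumed rather than taken from flexdis86.

```haskell
import Data.Word (Word8)

-- Assumed prefix byte sets (legacy, segment-override, and REX prefixes).
simplePrefixBytes, segPrefixBytes, rexPrefixBytes :: [Word8]
simplePrefixBytes = [0x66, 0x67, 0xf0, 0xf2, 0xf3]
segPrefixBytes    = [0x26, 0x2e, 0x36, 0x3e, 0x64, 0x65]
rexPrefixBytes    = [0x40 .. 0x4f]

-- Pure analogue of loopPrefixBytes: greedily consume prefix bytes (a VEX
-- prefix also consumes its one or two payload bytes) and return
-- (prefix bytes, remaining bytes). The first non-prefix byte is assumed
-- to start the instruction's opcode.
splitPrefixes :: [Word8] -> ([Word8], [Word8])
splitPrefixes = go []
  where
    go acc (b : rest)
      | b `elem` (simplePrefixBytes ++ segPrefixBytes ++ rexPrefixBytes)
      = go (acc ++ [b]) rest
    go acc (0xc5 : b2 : rest)      = go (acc ++ [0xc5, b2]) rest     -- two-byte VEX
    go acc (0xc4 : b2 : b3 : rest) = go (acc ++ [0xc4, b2, b3]) rest -- three-byte VEX
    go acc bytes                   = (acc, bytes)
```

`splitPrefixes [0xf3, 0x90]` evaluates to `([0xf3], [0x90])`: the greedy loop claims the `f3` byte, leaving only `90` for the opcode lookup.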
3. Once we have all of the prefixes and instruction opcodes, we have to use the prefixes to disambiguate among all of the possible `Def`s. My branch contains a `validatePrefixBytes` function that performs this disambiguation. The implementation is a bit too long to go into here, but it checks a number of conditions on the parsed prefixes (see the `validPrefix` function). There are an enormous number of possible checks that we could put into this function; see this page for some ideas. It might be best to implement only the checks we need to disambiguate instructions and add more later if they become necessary.
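One way to picture the disambiguation is as a filter over the candidate `Def`s at a leaf. This is a minimal sketch under loud assumptions: the `Prefix` and `Def` types, the two rules it checks (VEX presence and an allow-list of legacy prefixes), and the names `validDefs` and `disambiguate` are all hypothetical; the branch's `validatePrefixBytes` is not reproduced here.

```haskell
import Data.Word (Word8)

-- Hypothetical parsed-prefix representation: a VEX prefix is kept as one
-- unit, so its payload bytes are never mistaken for legacy prefixes.
data Prefix = Legacy Word8 | Vex [Word8]
  deriving (Eq, Show)

-- Hypothetical, heavily simplified Def.
data Def = Def
  { defMnemonic :: String
  , defNeedsVex :: Bool    -- must (or must not) follow a VEX prefix
  , defAllowed  :: [Word8] -- legacy prefixes this Def accepts
  }

isVex :: Prefix -> Bool
isVex (Vex _) = True
isVex _       = False

-- Keep only the Defs whose requirements the parsed prefixes satisfy.
validDefs :: [Prefix] -> [Def] -> [Def]
validDefs prefixes = filter ok
  where
    hasVex = any isVex prefixes
    legacy = [b | Legacy b <- prefixes]
    ok d   = defNeedsVex d == hasVex && all (`elem` defAllowed d) legacy

-- Exactly one survivor means the parse is unambiguous.
disambiguate :: [Prefix] -> [Def] -> Either String Def
disambiguate prefixes defs =
  case validDefs prefixes defs of
    [d] -> Right d
    []  -> Left "no matching definition"
    _   -> Left "ambiguous definitions"

-- The two candidates that share opcode 00 (toy allow-lists).
addDef, vpshufbDef :: Def
addDef     = Def "add" False [0xf0] -- accepts lock
vpshufbDef = Def "vpshufb" True []
```

With no prefixes, `add` is chosen; with a VEX prefix, only `vpshufb` survives; and an `f3` prefix matches neither candidate, so the parse is rejected.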
4. Remove some ugly hacks in the XML file representing all x86_64 instructions (see `data/optable.xml`). For instance, there are quite a few `nop` definitions that have dummy `66` prefixes to accommodate the way opcode table parsing currently works, accompanied by a comment explaining this. Since we now parse prefixes upfront, I believe this hack should no longer be necessary, so we can remove these silly `66`-prefixed definitions.
Another similar `66` prefix hack can be found in one of the definitions of `xchg`. Similarly, I think we can just remove the `66` there now that we handle prefixes differently.
This approach was enough to get nearly the entire `flexdis86` test suite to pass, save for one exception:

```
pause: FAIL
Exception: TODO RGS: No parse
CallStack (from HasCallStack):
error, called at src/Flexdis86/Disassembler.hs:1115:10 in flexdis86-0.1.5-inplace:Flexdis86.Disassembler
Use -p '/pause/' to rerun this test only.
```
Let's talk about why this happens. The prefix parsing approach that I took in step (2) makes a key assumption: bytes that can be used as prefixes will never be used as the first byte in an instruction's opcodes. After all, the only way we know to stop parsing prefixes is to encounter a byte that isn't in the set of known prefix bytes, which we interpret as an instruction's first opcode byte. Unfortunately, this assumption turns out not to be true. In the case of the `pause` instruction, the opcodes are `f3 90`, but the `0xf3` byte is also used for the `repz` prefix.
As a result, we mistakenly parse `pause`'s first opcode byte as a prefix, which prevents us from finding `pause` in the opcode table later. One idea for working around this is to check whether an instruction's opcodes start with bytes that could be interpreted as prefixes, and if so, "backtrack" through the list of parsed prefixes to remove the bytes that were mistakenly classified as prefixes. This is essentially how the Haskell `disassembler` library handles parsing `pause`.
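The backtracking idea can be sketched as follows: start from the greedy prefix split, and whenever the opcode lookup fails, hand the trailing prefix bytes back to the opcode stream one at a time and retry. This is a hypothetical illustration with a toy lookup table (`decodeWithBacktrack`, `Lookup`, and `toyLookup` are made-up names). It is not the branch's code or the `disassembler` library's code, and it ignores prefix validation and VEX payload bytes.

```haskell
import Data.List (isPrefixOf)
import Data.Word (Word8)

-- Hypothetical opcode lookup: the mnemonic plus the bytes after the opcode.
type Lookup = [Word8] -> Maybe (String, [Word8])

-- Try the greedy prefix split first. On failure, move the last kept prefix
-- byte back in front of the opcode bytes and retry, until no prefixes remain.
decodeWithBacktrack :: Lookup -> [Word8] -> [Word8]
                    -> Maybe ([Word8], String, [Word8])
decodeWithBacktrack lookupOp prefixes rest = go (length prefixes)
  where
    go n
      | n < 0     = Nothing
      | otherwise =
          let (kept, returned) = splitAt n prefixes
          in case lookupOp (returned ++ rest) of
               Just (mnemonic, operands) -> Just (kept, mnemonic, operands)
               Nothing                   -> go (n - 1)

-- Toy table containing pause (f3 90) and a one-byte add (01).
toyLookup :: Lookup
toyLookup bytes =
  case [ (m, drop (length op) bytes) | (op, m) <- table, op `isPrefixOf` bytes ] of
    (hit : _) -> Just hit
    []        -> Nothing
  where
    table = [([0xf3, 0x90], "pause"), ([0x01], "add")]
```

`decodeWithBacktrack toyLookup [0xf3] [0x90]` first fails to find `90`, then retries with `f3 90` and returns `Just ([], "pause", [])`. In a real table `90` alone presumably decodes as `nop`, so the trigger for backtracking would have to be prefix validation rejecting `f3` rather than a failed lookup.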
VEX prefixes (which the `disassembler` library does not handle, AFAICT) complicate matters further. The `lds` and `les` instructions have opcodes which clash with `0xc5` and `0xc4`, the two-byte and three-byte VEX prefixes, respectively:

https://github.com/GaloisInc/flexdis86/blob/7109bdc9990a3e756eb7fb07419737d15ad41da0/data/optable.xml#L4256-L4263
https://github.com/GaloisInc/flexdis86/blob/7109bdc9990a3e756eb7fb07419737d15ad41da0/data/optable.xml#L4274-L4281
What's more, these prefixes are expected to be followed by some number of additional bytes, so not only would we need to "backtrack" over parsing the VEX prefix, we would also need to backtrack over parsing the additional bytes that follow it. Moreover, since `lds` and `les`' opcodes are each one byte, the additional bytes that we mistakenly parse upfront would correspond to the instruction's operands! Hoo boy. It's a bit scary that the `flexdis86` test suite doesn't catch this.
In short, I think we need some kind of way to perform this backtracking. I have some ideas for how to do this, but before I set out on implementing them, I wanted to do a quick sanity check with @travitch to make sure I'm on the right path. Can you see a simpler way to solve these problems?
I wonder if, instead of "backtracking", we could just remember the last byte in the prefixes we parsed, and incorporate those "last bytes" into the validity/opcode checks.
I also think we might need to be a bit liberal in accepting prefixes that are invalid if the parse is otherwise unambiguous (e.g., see https://repzret.org/p/repzret/)
> I wonder if, instead of "backtracking", we could just remember the last byte in the prefixes we parsed, and incorporate those "last bytes" into the validity/opcode checks.
I considered this, but one complication with this idea is that the opcode table encodes the invariant that every instruction is reachable by a path containing at least one `OpcodeTable` data constructor. If this invariant is violated, you will reach one of the disassembler's existing error cases. (In my branch, there is a corresponding error case for `OpcodeTableEntry`.)
For instructions like `lds` and `les`, the opcode consists of only a single byte. If we parse this byte as a prefix, then there are no bytes remaining to use for the `OpcodeTable` path, so we would not be able to store these instructions in the table. The backtracking idea is the only thing I can think of for making this work. That is, encode `lds` and `les`' opcodes into the table as normal, parse their opcodes as prefix bytes, and then, when we try to check whether we have the `lds` or `les` instruction, backtrack over the prefix bytes so that we can look them up in the opcode table as intended.
> I also think we might need to be a bit liberal in accepting prefixes that are invalid if the parse is otherwise unambiguous (e.g., see https://repzret.org/p/repzret/)
Huh, that's an interesting read. I don't think this particular example poses an issue for my branch, as it can successfully disassemble `f3 c3` using the set of validity checks that I have currently. But this is a worthwhile cautionary tale not to go overboard with adding too many additional validity checks, lest we reject examples like this one.
Regarding `repz ret`, I bet the data file specifies `repz` as an allowed prefix when it really isn't (to actually parse this correctly). I could be wrong; that is just a guess.
> Regarding `repz ret`, I bet the data file specifies `repz` as an allowed prefix when it really isn't (to actually parse this correctly).
This seems likely, as this prefix was added for... reasons in b625205347ea5e9a459565c2c3e3f7c38b3ae60d. Nevertheless, I'm not too bothered by this, as this is a hack that would be needed in both the current and new designs. (If it were a hack that was only needed in one particular design, that would be a bit more eyebrow-raising.)
This might be effectively backtracking, but there could be a `PseudoOpcodeTable` constructor that acts mostly like `OpcodeTable`, except it means "Dispatch on the last byte decoded from a prefix". I'm browsing through to see how to implement that.
> This might be effectively backtracking, but there could be a `PseudoOpcodeTable` constructor that acts mostly like `OpcodeTable`, except it means "Dispatch on the last byte decoded from a prefix".
That might work, although things would get complicated for instructions whose operands are parsed eagerly as prefixes due to VEX. I was thinking of instead having a newtype like this:

```haskell
newtype BacktrackingByteReader m a = BacktrackingByteReader (StateT [Word8] m a)
```
And giving it a `ByteReader` instance such that when `readByte` is called, it will read the byte from the `[Word8]` if it is non-empty (and pop off the byte afterward), and read from the underlying byte source otherwise. This would avoid needing to change any of the operand-disassembling code to be aware of backtracking, as we could continue to program against the polymorphic `ByteReader` interface.
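Here is a self-contained sketch of that newtype. Since flexdis86's real `ByteReader` class has more methods than `readByte`, it is replaced by a cut-down hypothetical class of the same name; `pushBack`, `runBacktracking`, `pauseExample`, and the toy `ListReader` byte source are also assumptions for illustration. It needs the `transformers` package, which ships with GHC.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, put)
import Data.Word (Word8)

-- Hypothetical, cut-down stand-in for flexdis86's ByteReader class.
class Monad m => ByteReader m where
  readByte :: m Word8

-- The state holds bytes that were handed back and must be re-read first.
newtype BacktrackingByteReader m a =
  BacktrackingByteReader (StateT [Word8] m a)
  deriving (Functor, Applicative, Monad)

instance ByteReader m => ByteReader (BacktrackingByteReader m) where
  readByte = BacktrackingByteReader $ do
    pending <- get
    case pending of
      (b : bs) -> put bs >> pure b -- replay a pushed-back byte
      []       -> lift readByte    -- otherwise read from the underlying source

-- Hypothetical helper: return bytes (e.g. misclassified prefixes) to the
-- front of the stream.
pushBack :: Monad m => [Word8] -> BacktrackingByteReader m ()
pushBack bs = BacktrackingByteReader $ do
  pending <- get
  put (bs ++ pending)

runBacktracking :: Monad m => BacktrackingByteReader m a -> m a
runBacktracking (BacktrackingByteReader s) = evalStateT s []

-- Toy underlying reader over an in-memory list; Nothing means end of input.
newtype ListReader a = ListReader (StateT [Word8] Maybe a)
  deriving (Functor, Applicative, Monad)

instance ByteReader ListReader where
  readByte = ListReader $ do
    bytes <- get
    case bytes of
      (b : bs) -> put bs >> pure b
      []       -> lift Nothing

runListReader :: ListReader a -> [Word8] -> Maybe a
runListReader (ListReader s) = evalStateT s

-- Read f3 as if it were a prefix, hand it back, then read f3 90 again.
pauseExample :: BacktrackingByteReader ListReader [Word8]
pauseExample = do
  b <- readByte
  pushBack [b]
  b1 <- readByte
  b2 <- readByte
  pure [b1, b2]
```

`runListReader (runBacktracking pauseExample) [0xf3, 0x90]` yields `Just [0xf3, 0x90]`: the `f3` byte is read, handed back, and re-read before `90`. Because the replay logic lives in the monad, code written against `ByteReader` does not need to know about it.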
Just to record it, we talked about having a separate set of parse tables for VEX instructions that would be consulted iff the VEX prefix is in the set of parsed prefixes. That would avoid a need for backtracking in those cases.
It would be nice to know whether there are cases besides `pause` elsewhere in the instruction space. `pause` is truly unfortunate, because it really isn't an instruction: it is just `repz nop`, which is treated as a special hint named `pause`. If there is a small set of such instructions (where they are just aliases), we may want to consider instead separating those out from the data table and just using them to guide pretty printing (rather than decoding).
Operationally, that would mean parsing `f3 90` as a NOP, but noticing that it can be rendered as `pause` and fixing it up after the fact.
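A sketch of that fix-up, assuming a hypothetical `Decoded` record (legacy prefix bytes plus a mnemonic) and hypothetical `render` and `prefixName` helpers: decoding produces an ordinary `nop` carrying a `repz` prefix, and only the renderer knows to print `pause`.

```haskell
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical decoded form: legacy prefix bytes plus a mnemonic.
data Decoded = Decoded
  { decodedPrefixes :: [Word8]
  , decodedMnemonic :: String
  }

prefixName :: Word8 -> String
prefixName 0xf0 = "lock"
prefixName 0xf2 = "repnz"
prefixName 0xf3 = "repz"
prefixName b    = "0x" ++ showHex b ""

-- Decoding treats f3 90 as an ordinary nop with a repz prefix; only
-- rendering knows that this combination is spelled "pause".
render :: Decoded -> String
render (Decoded [0xf3] "nop") = "pause"
render (Decoded ps m)         = unwords (map prefixName ps ++ [m])
```

`render (Decoded [0xf3] "nop")` gives `"pause"`, while `render (Decoded [0xf3] "ret")` still gives `"repz ret"`.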
> Just to record it, we talked about having a separate set of parse tables for VEX instructions that would be consulted iff the VEX prefix is in the set of parsed prefixes. That would avoid a need for backtracking in those cases.
As an experiment, I pushed a branch here that encodes VEX prefix bytes into the opcode table alongside the instruction opcodes to avoid any conflicts with instructions like `lds` and `les`. Here is how large the opcode table is on that branch:

```
λ> nextOpcodeSize' defaultX64Disassembler'
13143
```
For comparison, here is how large it is on the `main` branch:

```
λ> nextOpcodeSize defaultX64Disassembler
5869432
```
And here is how large it is after removing VEX prefixes:

```
λ> nextOpcodeSize' defaultX64Disassembler'
1409
```
That being said, one thing about `lds` and `les` that I did not realize until recently is that they only work in 32-bit mode, and since `defaultX64Disassembler` filters out instructions that don't work in 64-bit mode, they aren't even included in the table to begin with. As a result, I'm forced to partially retract my "hoo boy" comment from https://github.com/GaloisInc/flexdis86/issues/40#issuecomment-1161782312.
Speaking of which, are there other instructions whose opcodes conflict with bytes used for prefixes? Thankfully, `les` and `lds` are the only ones that conflict with VEX prefixes. What about the remaining 16 bytes used for prefixes? I audited `data/optable.xml`, and here are the results:
No conflicts in any of `0x26`, `0x2e`, `0x36`, `0x3e`, `0x64`, or `0x65`, thankfully enough.
`0x66`: As noted in https://github.com/GaloisInc/flexdis86/issues/40#issuecomment-1161782312, there were several hacky occurrences of `nop` and `xchg` that had `0x66` placed in front of their opcodes to work around limitations of the current approach to disassembling, but these hacks can be removed with the new approach.
Another conflict is `vpcmpgtd`:

```xml
<instruction>
  <mnemonic>vpcmpgtd</mnemonic>
  <class>avx</class>
  <def>
    <opc>/vex=NDS.128.66.0F.WIG 66</opc>
    <opr>Vx Hx Ux</opr>
  </def>
</instruction>
```
`0x67`: No conflicts.

`0xf0`: No conflicts.
`0xf2`: The `vpslld` instruction has some conflicts:

```xml
<instruction>
  <mnemonic>vpslld</mnemonic>
  <class>avx</class>
  <def>
    <opc>/vex=NDS.128.66.0F.WIG F2</opc>
    <opr>Vx Hx Wx</opr>
  </def>
  ...
  <def>
    <opc>/vex=NDS.256.66.0F.WIG F2</opc>
    <opr>Vx Hx Wx</opr>
  </def>
  ...
</instruction>
```
`0xf3`: `endbr32` and `endbr64` conflict:

```xml
<instruction>
  <mnemonic>endbr32</mnemonic>
  <def>
    <opc>F3 0F 1E FB</opc>
  </def>
</instruction>
<instruction>
  <mnemonic>endbr64</mnemonic>
  <def>
    <opc>F3 0F 1E FA</opc>
  </def>
</instruction>
```
As does `vpsllq`:

```xml
<instruction>
  <mnemonic>vpsllq</mnemonic>
  <class>avx</class>
  <def>
    <opc>/vex=NDS.128.66.0F.WIG F3</opc>
    <opr>Vx Ux Wx</opr>
  </def>
  ...
</instruction>
```
And our old friend `pause`:

```xml
<instruction>
  <mnemonic>pause</mnemonic>
  <def>
    <opc>f3 90</opc>
  </def>
</instruction>
```
The `inc` instruction can have opcodes `0x40` through `0x47` and the `dec` instruction can have opcodes `0x48` through `0x4f`, all of which are REX prefixes. Each instruction has different opcodes in 64-bit mode, however, so this issue shouldn't show up in the `flexdis86` test suite for the reasons described above.
The `vpclmulqdq` instruction also conflicts with the `0x44` REX prefix:

```xml
<instruction>
  <mnemonic>vpclmulqdq</mnemonic>
  <class>avx</class>
  <def>
    <opc>/vex=NDS.128.66.0F3A.WIG 44</opc>
    <opr>Vdq Hdq Wdq Ib</opr>
  </def>
</instruction>
```
Several of these issues would be avoided by encoding VEX prefixes into the opcode table. Moreover, the `inc` and `dec` instructions wouldn't pose issues in 64-bit mode. By my count, that only leaves `endbr32`, `endbr64`, and `pause` as potential sources of conflict.
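The audit above boils down to asking which definitions begin with a byte that the prefix loop would consume. A sketch under these assumptions: each `optable.xml` entry is summarized by a hypothetical `OpcodeDef` record (XML parsing is left out), the table is already restricted to 64-bit mode (so the `inc`/`dec` forms in `0x40` through `0x4f` are gone), and VEX prefixes are encoded into the opcode table, so VEX-encoded definitions can be skipped.

```haskell
import Data.Word (Word8)

-- Prefix bytes considered in the audit: segment overrides, operand and
-- address size, lock/rep, REX, and the two VEX lead bytes.
prefixBytes :: [Word8]
prefixBytes =
  [0x26, 0x2e, 0x36, 0x3e, 0x64, 0x65, 0x66, 0x67, 0xf0, 0xf2, 0xf3]
    ++ [0x40 .. 0x4f] ++ [0xc4, 0xc5]

-- Hypothetical summary of one optable.xml definition.
data OpcodeDef = OpcodeDef
  { defName   :: String
  , defViaVex :: Bool    -- opcode written as /vex=... in optable.xml
  , defOpcode :: [Word8] -- opcode bytes (after the VEX prefix, if any)
  }

-- Definitions whose first opcode byte could be mistaken for a prefix.
-- VEX-encoded definitions are skipped, on the assumption that their VEX
-- prefix is encoded into the opcode table alongside the opcode bytes.
firstByteConflicts :: [OpcodeDef] -> [(String, Word8)]
firstByteConflicts defs =
  [ (defName d, b)
  | d <- defs
  , not (defViaVex d)
  , (b : _) <- [defOpcode d]
  , b `elem` prefixBytes
  ]
```

On a sample containing `endbr64`, `pause`, `vpclmulqdq`, `vpslld`, and `add`, only the non-VEX, `f3`-led `endbr64` and `pause` are reported.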
`endbr32` and `endbr64` are also special no-ops.
At long last, #43 takes care of part (1). Part (2) may also be worth doing, but I imagine (1) alone will be enough to knock out most of the egregious memory usage, especially in light of https://github.com/GaloisInc/flexdis86/pull/43#issuecomment-1172601965.
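The parse tables occupy about 400 MB in memory after they are constructed, as can be seen in this profile collected by @RyanGlScott: verify-RSA.saw.pdf. There are two factors to this memory consumption:

1. Every valid combination of prefixes is fully elaborated in the tables.
2. The tables themselves use a space-inefficient representation.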
Addressing the former is tricky. One could use a simple DFA to parse prefix bytes separately to save an enormous amount of space. However, not all prefixes are valid for all instructions; those restrictions are currently properly encoded in the fully elaborated tables. To separate out prefix parsing, it would be necessary to add a post-parsing check to see if the parse was valid or not.
Addressing the latter might be less tricky, as we could change the representation of the tables. Another disassembler uses a mostly unboxed structure: https://github.com/travitch/dismantle/blob/48433e7ccb02924b2f4695c8c9f09fb9cfccdfc4/dismantle-tablegen/src/Dismantle/Tablegen/LinearizedTrie.hs#L34. The x86 case is a bit trickier, as the parser has more states than the parsers generated by dismantle. However, we might be able to take inspiration from the more compact parse table representation and adapt it for flexdis86.
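As a rough illustration of a more compact representation, here is a trie flattened into a single unboxed array, in the spirit of (but not copied from) dismantle's `LinearizedTrie`. The slot encoding, `buildTrie`, `build`, and `lookupTrie` are hypothetical. Leaves store an `Int` index into a separate table of definitions, and none of the additional x86 parser states are modeled. It uses the `array` package, which ships with GHC.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.List (nub, sort)
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- Every node is a block of 256 Int slots in one unboxed array; slot
-- (node * 256 + byte) says what happens after reading that byte:
--   0      no opcode continues with this byte
--   n > 0  continue at node n (the root is node 0, so 0 is free as "none")
--   n < 0  leaf holding payload index (-n - 1)
type Trie = UArray Int Int

lookupTrie :: Trie -> [Word8] -> Maybe (Int, [Word8])
lookupTrie trie = go 0
  where
    go node (b : rest) =
      case trie ! (node * 256 + fromIntegral b) of
        0 -> Nothing
        v | v < 0     -> Just (negate v - 1, rest)
          | otherwise -> go v rest
    go _ [] = Nothing

-- Build from (opcode bytes, payload index) pairs. Keys must be non-empty
-- and prefix-free (no key may be a proper prefix of another).
buildTrie :: [([Word8], Int)] -> Trie
buildTrie entries = listArray (0, 256 * length nodes - 1) (concat nodes)
  where
    (_, nodes) = build 1 entries

-- build next es: lay out the node for es, numbering its descendants from
-- `next`. Returns the next unused node number and the node followed by
-- all of its descendants, in node-number order.
build :: Int -> [([Word8], Int)] -> (Int, [[Int]])
build next es = (next', thisNode : concat children)
  where
    firstBytes = nub (sort [b | (b : _, _) <- es])
    (next', slots, children) = foldl step (next, [], []) firstBytes
    step (n, acc, kids) b =
      case [(rest, p) | (b' : rest, p) <- es, b' == b] of
        [([], p)] -> (n, acc ++ [(b, negate p - 1)], kids)
        grp       ->
          let (n', sub) = build (n + 1) grp
          in (n', acc ++ [(b, n)], kids ++ [sub])
    thisNode = [fromMaybe 0 (lookup (fromIntegral i) slots) | i <- [0 .. 255 :: Int]]
```

Each node costs 256 unboxed `Int`s instead of a tree of heap objects, trading wasted slots in sparse nodes for far fewer pointers for the garbage collector to trace.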