Sequence of rule applied affects the size ouput

NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework

Apache License 2.0

49.06k stars 5.65k forks source link

Describe the bug I discovered this bug when comparing two decompiled codes from two different binary, one with -O2 optimized and another with -O2 -fno-peephole2 optimized: Screenshot 2024-06-17 162429 As noticed, the sizes of division operations are different, one is 8 bytes another is 16 bytes. This difference causes unnecessary zero extension via the structure field. Meanwhile, the difference between the two binaries at the assembly level is only one line: mov edx, 0x0 , xor edx edx: Screenshot 2024-06-17 164317

Root cause and analysis The root cause of this bug, in my opinion, is the rule optimization process halts in the middle stage with the current rule optimization order. Specifically, for -O2 -fno-peephole2 binary, though a Copy edx, 0x0 instruction is successfully generated after RuleTrivialArith, the cancelExtensions examination in RuleSubCommute always fails with the current rule optimization order and the optimization process for div rcx halts therefore.

I debug this issue by examining all rules related to this div operation (i.e, check the status of all p-code instructions before and after the rule is applied, source code in debug_libevent_bufferevent_ratelim_action.cc in the attachment) and found that the following decompilation process causes the bug: Screenshot 2024-06-17 192832

The whole picture looks a bit messy and let me explain what happens: Round 1,2,3 corresponds to the outside loop in ActionPool::apply, one round illustrates Ghidra has applied all rules to all instructions in the function from the start to the end. The number at the start, e.g., 100821, is the address of this p-code instructions. And the black code block contains selected p-code instructions affected by the applied rule in red.

As shown in the first two blocks, the assignment of the node in the register space with offset 0x00000010, with 8 bytes changes from two constant concatenations to two instructions, an int left shift, and a zero extend via concatzero. After that, in the second round, the earlyremoval, applied on pcode instruction with address 100821, remove the descent of the register,0x00000008,8. Since the descent of register,0x00000008,8 in 100827 is one, zextshiftzext is applied to replace register,0x00000008,8 with 'unique,0x0000c200,4'. This causes, at the final block, Ghidra to fail to apply the RuleSubCommute, since the sizes of the nodes in two zero extensions are not the same.

Fix suggestion From the first look, the root cause of this issue is that zextshiftzext causes the whole function to lose the size information of some varnodes. However, I also note that the condition to apply zextshiftzext is kind of strict (e.g., // Only propagate if -op- is only use of -invn-), which makes zextshiftzext be applied in the second round.

Hence, IMHO, the root cause of this issue is applying concatzero, instead of collapseconstants, on concat ((const,0x0,4),(const,0x0,4)). Specifically, when V is constant, both concatzero and collapseconstants can be applied on concat(0,0), and collapseconstants has higher priority than propagatecopy in universalAction. Hence, in our case, i.e., concat ((const,0x0,4),(const,0x0,4)) is generated after a propagatecopy, concatzero is applied first. If collapseconstants is applied before concatzero, the result would be correct with the following process: Screenshot 2024-06-18 001314 As you can see, if we allow collapseconstants to be applied first, the optimization for div ecx can be finished within a round instead of three rounds.

Hence, IMHO, there are two ways to fix this issue:

One of the easiest ways to fix this issue is to add a condition in RuleConcatZero not to allow this rule to be applied, when V is a constant. However, this fix is specifically designed for this situation.
The second fix proposal is more general, which is adding essential rules (e.g., propagatecopy) before certain rules and applying as many information-no-lose rules (e.g., collapseconstants) as possible in the first round. I notice that Ghidra has already put information-may-lose rules (e.g., zextshiftzext) with lower priority in universalAction. However, this may not be enough, because the mechanism of Ghidra on applying rules (i.e., big loop for each ops, small loop for rules on one op) keeps generating new instructions, e.g., concat(0,0), after different rules e.g., propagatecopy. Hence, there is no guarantee that those information-may-lose rules will be applied later than those high-priority rules. Hence, I suggest, in this case, adding another propagatecopy before collapseconstants. Recall in compiler, it is common to add dce pass during optimizations, e.g., in buildLTODefaultPipeline of llvm, GlobalDCEPass is called four times. It may not be a bad idea to call rules such as propagatecopy or earlyremoval multiple times in the middle.

NationalSecurityAgency / ghidra

Sequence of rule applied affects the size ouput #6648