Open Muqi-Zou opened 1 week ago
Fix suggestion
From the first look, the root cause of this issue is that zextshiftzext
causes the whole function to lose the size information of some varnodes. However, I also note that the condition to apply zextshiftzext
is kind of strict (e.g., // Only propagate if -op- is only use of -invn-
), which makes zextshiftzext
be applied in the second round.
Hence, IMHO, the root cause of this issue is applying concatzero
, instead of collapseconstants
, on concat ((const,0x0,4),(const,0x0,4))
. Specifically, when V
is constant, both concatzero
and collapseconstants
can be applied on concat(0,0)
, and collapseconstants
has higher priority than propagatecopy
in universalAction
. Hence, in our case, i.e., concat ((const,0x0,4),(const,0x0,4))
is generated after a propagatecopy
, concatzero
is applied first. If collapseconstants
is applied before concatzero
, the result would be correct with the following process:
As you can see, if we allow collapseconstants
to be applied first, the optimization for div ecx
can be finished within a round instead of three rounds.
Hence, IMHO, there are two ways to fix this issue:
RuleConcatZero
not to allow this rule to be applied, when V
is a constant. However, this fix is specifically designed for this situation.propagatecopy
) before certain rules and applying as many information-no-lose rules (e.g., collapseconstants
) as possible in the first round. I notice that Ghidra has already put information-may-lose rules (e.g., zextshiftzext
) with lower priority in universalAction
. However, this may not be enough, because the mechanism of Ghidra on applying rules (i.e., big loop for each ops, small loop for rules on one op) keeps generating new instructions, e.g., concat(0,0)
, after different rules e.g., propagatecopy
. Hence, there is no guarantee that those information-may-lose rules will be applied later than those high-priority rules. Hence, I suggest, in this case, adding another propagatecopy
before collapseconstants
. Recall in compiler, it is common to add dce
pass during optimizations, e.g., in buildLTODefaultPipeline of llvm, GlobalDCEPass
is called four times. It may not be a bad idea to call rules such as propagatecopy
or earlyremoval
multiple times in the middle.
Describe the bug I discovered this bug when comparing two decompiled codes from two different binary, one with
-O2
optimized and another with-O2 -fno-peephole2
optimized: As noticed, the sizes of division operations are different, one is 8 bytes another is 16 bytes. This difference causes unnecessary zero extension via the structure field. Meanwhile, the difference between the two binaries at the assembly level is only one line:mov edx, 0x0
,xor edx edx
:Root cause and analysis The root cause of this bug, in my opinion, is the rule optimization process halts in the middle stage with the current rule optimization order. Specifically, for
-O2 -fno-peephole2
binary, though aCopy edx, 0x0
instruction is successfully generated afterRuleTrivialArith
, thecancelExtensions
examination inRuleSubCommute
always fails with the current rule optimization order and the optimization process fordiv rcx
halts therefore.I debug this issue by examining all rules related to this div operation (i.e, check the status of all p-code instructions before and after the rule is applied, source code in debug_libevent_bufferevent_ratelim_action.cc in the attachment) and found that the following decompilation process causes the bug:
The whole picture looks a bit messy and let me explain what happens: Round 1,2,3 corresponds to the outside loop in
ActionPool::apply
, one round illustrates Ghidra has applied all rules to all instructions in the function from the start to the end. The number at the start, e.g., 100821, is the address of this p-code instructions. And the black code block contains selected p-code instructions affected by the applied rule in red.As shown in the first two blocks, the assignment of the node in the register space with offset 0x00000010, with 8 bytes changes from two constant concatenations to two instructions, an int left shift, and a zero extend via
concatzero
. After that, in the second round, theearlyremoval
, applied on pcode instruction with address 100821, remove the descent of theregister,0x00000008,8
. Since the descent ofregister,0x00000008,8
in 100827 is one,zextshiftzext
is applied to replaceregister,0x00000008,8
with 'unique,0x0000c200,4'. This causes, at the final block, Ghidra to fail to apply theRuleSubCommute
, since the sizes of the nodes in two zero extensions are not the same.