Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
930 stars 210 forks source link

Missing Phi Function in MLIL #4546

Open NeoQuix opened 1 year ago

NeoQuix commented 1 year ago

Version and Platform (required):

Bug Description: Take a look at the following CFG: Screenshot_20230804_111137

rdx_2#2 and rdx_2#3 are assigned in the upper blocks which lead into the block at address 93. In that block rdx_2#4 is used, but not defined. It should be defined by a phi function in the form of rdx_2#4 = phi(rdx_2#2, rdx_2#3) at the start of the block, like the one for dwSize#4

Steps To Reproduce:

  1. Open sacsess.tar.gz at CVTUTF8Scraper::Write (Addr: 0x140003750)
  2. Look at the MLIL representation of the blocks where rdx_2 is used.

Expected Behavior: There should be phi function defining rdx_2#4

psifertex commented 1 year ago

Ahh, good catch, thanks. I suspect our recent "smart" variable renaming has modified instruction 93 in the screenshot above which is breaking the SSA relationships. This is a regression on dev and is correct on stable. Thanks for catching it before the next stable release!

NeoQuix commented 1 year ago

Is it correct on stable? On v3.4.4271 i get the following graph: Screenshot_20230807_122615

Where indeed the variable renaming is different (rdx_3#4 vs dwSize#4) but still rdx_2#4 is not assigned. Or am i missing something?

fuzyll commented 1 year ago

I can confirm I am seeing the same output on 3.4.4271 stable with the input file from the original ticket.

psifertex commented 1 year ago

Yeah, that's my mistake -- sorry about that, I misread the output from stable. The renaming of the variable is new and changes the phi at the top of the function but is unrelated to this issue.

Screenshot 2023-08-07 at 09 30 20
rssor commented 1 year ago

The best near term fix for this will be to make that rdx_2#4 in instruction 97 be dwSize#4.X, which is still wrong but at least correct in terms of SSA form.

This is bumping up against fundamental limits in where we allow variables to be defined (e.g. can accept new names/types) -- right now there's no way to signal that rdx is a logically separate variable as of instruction 98.

After fixing the SSA translator bug impacting 97, we should probably just disable load/store splitting with destinations in registers until we have the ability to accept more granular information about variable lifetimes. This won't completely address the issue as it could still be surfaced via subregister accesses but it'll make it drastically less common.

(To clarify: in this case disabling load/store splitting would mean that rdx will have a full write at that address, so it would become a new variable with its own index rather than taking over the phi destination -- the underlying limit still exists, and since subregisters would still result in partial the underlying issue of not being able to make new variables along sequences of partial writes would remain present.)