Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
946 stars 214 forks source link

inconsistencies in permission heirarchy of segments / sections / variables #5834

Open bb33bb opened 3 months ago

bb33bb commented 3 months ago

Version and Platform (required):

Bug Description: [after ninja load symbol when we need to F5, the c cocde ](https://github.com/Vector35/binaryninja-api/issues/5819)

The key problem: When I am reversing vmlinux, after loading vmlinux, Ninja will load symbols and get the value of symbols, then use the values in functions and output the static analysis answer based on the symbol data. But to be honest, a lot of symbol values will change after loading into the kernel. Here is a real case:


static int diskstats_show(struct seq_file *seqf, void *v)
{
    uint64_t v1= __virt_to_phys(v);
    printk("v1 : %lx\n",v1);
    uint64_t p1= __phys_to_virt(v1);
    printk("p1 : %lx\n",p1);

after compiled , Using ninja to open vmlinux. we can see the code as following:

ffffff800850d324  int diskstats_show(struct seq_file* seqf, void* v)

ffffff800850d334      void* v_1
ffffff800850d334      
ffffff800850d334      if ((v & 0x40) != 0)
ffffff800850d350          v_1 = -1 + (v & 0x3fffffffff)
ffffff800850d334      else
ffffff800850d340          v_1 = v
ffffff800850d340      
ffffff800850d360      printk(fmt: "v1 : %lx\n", v_1)
ffffff800850d378      printk(fmt: "p1 : %lx\n", (v_1 + 1) | 0xffffffc000000000)

But in ghidra we can see :

undefined8 diskstats_show(undefined8 param_1,ulong param_2,ulong *param_3,ulong *param_4,ulong *param_5,ulong *param_6,undefined1 *param_7,ulong param_8)

{
  long lVar1;

  if ((param_2 >> 0x26 & 1) == 0) {
    lVar1 = param_2 - kimage_voffset;
  }
  else {
    lVar1 = memstart_addr + (param_2 & 0x3fffffffff);
  }
  printk((ulong *)s_v1_:_%lx_ffffff8009edbeba,lVar1,param_3,param_4,param_5,param_6,param_7,param_8);
  printk((ulong *)s_p1_:_%lx_ffffff8009edbec4,lVar1 - memstart_addr | 0xffffffc000000000,param_3,param_4,param_5,param_6,param_7,param_8);
  return 0;
}

and we can check ,the ans is really different.

bb33bb commented 3 months ago

"-1" in this line v_1 = -1 + (v & 0x3fffffffff) -1 is retived from symbol i think. But in vmlinux , i think in other cases , may be it is the same. In vmlinux, the data section have many variables with initialized values. These initialized values can be 0 or -1 or 0xffffffffffffffff If Ninja loads these values and use these values to deep analizes all functions , it is really a little aggressive. Because after the initialized values loaded into the function, the flow of function will change to a specific case which can not stand for or can not show with the full functions infomation to ninja users.

bb33bb commented 3 months ago

pure talking may be is really not ok to show it. here are 2 files in .ko file because the symbol is extern, and Ninja can not load the value of symbol i think. then the c code is OK as following:

00007294  int64_t run_command(int64_t arg1)

00007294  {
000072e0      int64_t x8_6 = ((arg1 - *(uint64_t*)(memstart_addr & 0xffff)) | 0xffffffc000000000);
000072f4      printk(0x8356, arg1, x8_6);
0000730c      int64_t var_58;
0000730c      
0000730c      if ((x8_6 & 0x40) == 0)
0000735c          var_58 = (x8_6 - *(uint64_t*)(kimage_voffset & 0xffff));
0000730c      else
00007338          var_58 = ((x8_6 & 0x3fffffffff) + *(uint64_t*)(memstart_addr & 0xffff));
00007338      
0000738c      printk(0x8356, var_58, x8_6);
000073a0      return 0;
00007294  }

but if we are reversing vmlinux, the symbols according with data can be retrived , then the functions have a different c code as following:

ffffff800850d324  int diskstats_show(struct seq_file* seqf, void* v)

ffffff800850d324  {
ffffff800850d334      void* v_1;
ffffff800850d334      
ffffff800850d334      if ((v & 0x40) != 0)
ffffff800850d350          v_1 = (-1 + (v & 0x3fffffffff));
ffffff800850d334      else
ffffff800850d340          v_1 = v;
ffffff800850d340      
ffffff800850d360      printk("v1 : %lx\n", v_1);
ffffff800850d378      printk("p1 : %lx\n", (((char*)v_1 + 1) | 0xffffffc000000000));
ffffff800850d388      return 0;
ffffff800850d324  }

I did not revere it so thats just my guess Sorry to make a wrong guess if I am wrong

bb33bb commented 3 months ago

krwx.ko.zip https://drive.google.com/file/d/1XKREq3HFLp3cs7IrGemjLgiaM8VYGS03/view?usp=sharing

xusheng6 commented 3 months ago

The reason why we used the value of the memstart_addr variable is because it sits in a section with read-only semantics:

Screenshot 2024-08-13 at 12 08 00 PM

Screenshot 2024-08-13 at 12 07 06 PM

Surprisingly, we can also see several such read-only sections are actually in a segment marked as RWX

I am not sure how this should be handled, I will bring it up for some discussions

xusheng6 commented 3 months ago

A workaround is to right-click on the .rodata section, edit it, and set it to Writable Data, then the analysis will be fixed:

Screenshot 2024-08-13 at 10 20 05 PM

bb33bb commented 3 months ago

A workaround is to right-click on the .rodata section, edit it, and set it to Writable Data, then the analysis will be fixed:

Screenshot 2024-08-13 at 10 20 05 PM

谢谢! 这个问题确实是有时候有点点小小的迷惑。也许您提到的这个,设置为可写确实是我需要的。 这个Image是三星官方源码编译的,由于三星手机内核的一点点特殊,也许有可能是它尽量标记为只读,或者初始化后尽量标记为只读。或者也可能是vmlinux就这样。 我不知道我们是否有必要将这个静态分析,优化的选项前置,因为确实对准确性略有影响,或者还是我们考虑对齐其他同行产品? 目前我开始采用您提得办法设置读写来分析。

Thank you!
This issue is indeed a bit confusing sometimes. Perhaps what you mentioned about setting it to writable is indeed what I need.
This Image is compiled from Samsung's official source code. Due to some slight peculiarities in Samsung phone kernels, it's possible that they try to mark it as read-only, or mark it as read-only as much as possible after initialization. Or it could be that the vmlinux is just like this.
I'm not sure if we need to prioritize this static analysis and optimization option, as it does slightly affect accuracy, or if we should consider aligning with other peer products?
Currently, I've started using the method you suggested of setting it to read-write for analysis.
bb33bb commented 3 months ago

也许我们还是可以考虑下处理这个问题。如下所示: 在静态文件中,默认编译出来的初始化的数值,和真机中的确实还是不一样。 为了安全,三星手机内核在启动完,data段初始化结束后,会将一部分data段设置为只读。 于是出现了vmlinux中标记为只读,实际上启动后会是修改后的数值。 大概我的理解是这样。如果不正确,还望见谅。 如下所示是一个demo的截图 我在通过您提到的方法 https://github.com/Vector35/binaryninja-api/issues/5834#issuecomment-2286381557 修改属性后,可以显示出 memstart_addr符号

Perhaps we can still consider addressing this issue. 
As shown below: In static files, the default compiled initialization values are indeed different from those 
in real devices. 
For safety, after booting and initializing the data segment, Samsung phone kernels set part of the data 
segment as read-only. 
This results in a situation where values marked as read-only in vmlinux are actually modified after startup. 
This is roughly my understanding.
If it's incorrect, please forgive me. 
Below is a screenshot of a demo showing that after modifying the attributes using the method you 
mentioned in #5834 (comment), 
I can display the memstart_addr symbol.

image 查看符号处的数据和真机中的数据如下所示 image memstart_addr物理含义是,text段前面的固定的物理地址的起始地址。

bb33bb commented 3 months ago

open python console then paste the following python scripts:

for section in bv.sections.values():
    print(f"Section name: {section.name}")
    section.permissions = SectionSemantics.ReadWriteDataSectionSemantics
bb33bb commented 3 months ago

permissions

rodata_section.permissions = SectionSemantics.ReadWriteDataSectionSemantics it seems like this is no usefull

and i tried to use

rodata_section = bv.get_section_by_name('.rodata')
rodata_section.semantics = SectionSemantics.ReadWriteDataSectionSemantics
bv.update_analysis_and_wait()

the ret is

Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: property 'semantics' of 'Section' object has no setter

is there any possible for us to use one line of python script finish this job?

psifertex commented 3 months ago

I've updated the title based on some internal discussion.

The new memory map system is designed to unify and resolve these issues and once the roadmap for that is complete we expect this issue and similar ones to be resolved.