Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
940 stars 213 forks source link

Add Outlining Support for strcat #5489

Open 0xdevalias opened 5 months ago

0xdevalias commented 5 months ago

What is the feature you'd like to have?

Currently, in the binary I am looking at, there are a number of string operations that are treated as raw hex numbers applied to variable offsets. It would be nice if these showed in a more 'as it would appear in the code' way, of just general string operations.

Is your feature request related to a problem?

There are other similar examples in the binary (macOS, x86_64) I'm looking at, but this is the most self contained one:

  005851a0  int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
  005851b5      int64_t rax = *___stack_chk_guard
  005851d4      void var_70
  005851d4      if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
  005851dc          int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
  005851f1          _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
  005851f9          _CFRelease(rax_2)
  00585201          int64_t rax_3 = _strlen(arg2)
🐛00585210          *(arg2 + rax_3) = 0x6566782e6d6f632f
🐛0058521e          *(arg2 + rax_3 + 8) = 0x7364726f63657272
🐛0058522d          *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
  00585239      int64_t rax_4 = *___stack_chk_guard
  00585240      if (rax_4 != rax)
  0058524b          ___stack_chk_fail()
  0058524b          noreturn
  0058524a      return rax_4

If I right click -> Display As -> Character Constant on those hex constants, it looks a bit more readable:

    005851a0  int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
    005851b5      int64_t rax = *___stack_chk_guard
    005851d4      void var_70
    005851d4      if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
    005851dc          int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
    005851f1          _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
    005851f9          _CFRelease(rax_2)
    00585201          int64_t rax_3 = _strlen(arg2)
- 🐛00585210          *(arg2 + rax_3) = 0x6566782e6d6f632f
- 🐛0058521e          *(arg2 + rax_3 + 8) = 0x7364726f63657272
- 🐛0058522d          *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
+ 🐛00585210          *(arg2 + rax_3) = '/com.xfe'
+ 🐛0058521e          *(arg2 + rax_3 + 8) = 'rrecords'
+ 🐛0058522d          *(arg2 + rax_3 + 0x10) = '.serum/'
    00585239      int64_t rax_4 = *___stack_chk_guard
    00585240      if (rax_4 != rax)
    0058524b          ___stack_chk_fail()
    0058524b          noreturn
    0058524a      return rax_4

But it would be nicer if it was automatically something more like this (or similar):

    005851a0  int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
    005851b5      int64_t rax = *___stack_chk_guard
    005851d4      void var_70
    005851d4      if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
    005851dc          int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
    005851f1          _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
    005851f9          _CFRelease(rax_2)
    00585201          int64_t rax_3 = _strlen(arg2)
- 🐛00585210          *(arg2 + rax_3) = 0x6566782e6d6f632f
- 🐛0058521e          *(arg2 + rax_3 + 8) = 0x7364726f63657272
- 🐛0058522d          *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
+ 🐛00585210          *(arg2 + rax_3) = '/com.xferrecords.serum/'
    00585239      int64_t rax_4 = *___stack_chk_guard
    00585240      if (rax_4 != rax)
    0058524b          ___stack_chk_fail()
    0058524b          noreturn
    0058524a      return rax_4

Or even better would be:

    005851a0  int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
    005851b5      int64_t rax = *___stack_chk_guard
    005851d4      void var_70
    005851d4      if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
    005851dc          int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
    005851f1          _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
    005851f9          _CFRelease(rax_2)
-   00585201          int64_t rax_3 = _strlen(arg2)
- 🐛00585210          *(arg2 + rax_3) = 0x6566782e6d6f632f
- 🐛0058521e          *(arg2 + rax_3 + 8) = 0x7364726f63657272
- 🐛0058522d          *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
+ 🐛00585210          _strcat(arg2, "/com.xferrecords.serum/")
    00585239      int64_t rax_4 = *___stack_chk_guard
    00585240      if (rax_4 != rax)
    0058524b          ___stack_chk_fail()
    0058524b          noreturn
    0058524a      return rax_4

Are any alternative solutions acceptable?

Unsure.

Additional Information:

image

image

image

image

image

xusheng6 commented 5 months ago

I think the issue here is our stack string detection does not (yet) work on offsets into variables. This is a nice feature request, thx!

xusheng6 commented 5 months ago

@0xdevalias could you please provide the binary so that it would be easier for us to work on this issue?

0xdevalias commented 5 months ago

could you please provide the binary so that it would be easier for us to work on this issue

@xusheng6 I'll see if the same thing is present in the demo version, and if so, can share that (the exact specifics may differ slightly from above screenshots, but should otherwise be fine)

If so, will share via slack, and then update here.

0xdevalias commented 5 months ago

I'll see if the same thing is present in the demo version

@xusheng6 Looks like it is:

image

Uploaded on slack here:

bpotchik commented 5 months ago

This is a good test case for strcat. We have a general issue as well (#3349) but will leave this one open for this specific function.

0xdevalias commented 5 months ago

Linking to this tangentially related issue as well for visibility: