binsync / binsync

A reversing plugin for cross-decompiler collaboration, built on git.
https://binsync.net
BSD 2-Clause "Simplified" License
549 stars 39 forks source link

Fix IDA Segfaulting Bug in Magic Sync #197

Closed mahaloz closed 1 year ago

mahaloz commented 2 years ago

Background

During defcon finals 22 the binary ni_visor was used for Magic Sync from various users. During this, someone introduced something into their local branch that caused a segfault. This started mutating the more people Magic Sync, which we came to call the @zardus Virus.

After a few minutes when anyone hit Magic Sync, that IDA would segfault while rapidly making changes. Attached is the binary. The BinSync repo can be found at: https://github.com/Shellphish-DC22/f22-nivisor

If you connect then hit Magic Sync with @zardus as the priority it will segfault IDA. I've also attached the log right before this crash for reference.

20220822-151018.bs.log nivisor.zip

mahaloz commented 1 year ago

Update

So I tried to debug this for a bit tonight and I've run out of time. Attached is an updated log with important information. The important line is the last line before crash:

Filling function header to <FuncHeader: long main(args=3); @0x15316> now...

Essentially it crashes while filling the header of main with sane-looking types and args. Culprit function is here: https://github.com/binsync/binsync/blob/d50084e387412010066cf698b8f3debf008cb712/plugins/ida_binsync/ida_binsync/compat.py#L233

The thing that triggers the crash at this point actually has nothing to do with setting any values. Instead, the crash is triggered by calling refresh_pseudocode_view with the same address twice. It does not matter if you make changes before each of these calls.

This makes me thing that we are grabbing a reference to vu in a bad way. I'm really not sure, though. It's important to note that when I disabled this whole function, I still got crashes when simply decompiling other functions in this magic merge.

If looking to debug in the future, set breakpoints for the culprit function like so:

if addr == 0x3fc2:
   import remote_pdb; remote_pdb.RemotePdb("localhost", 4444).set_trace()

I recommend the next thing that we investigate is a better way to get a reference or look into why this one crashes more and others don't. It honestly seems random since all the things set in this function look sane. The function type gets changed to long and the args get new names. That is all. Something wonky is going on with references, I think.

Worst case scenario: we may need to disable magic sync for a bit :(, since the issue is not reproducible when doing only a single sync. Maybe look into closing code views?

Last Known Debug Version

arizvisa commented 1 year ago

just guessing here, but it seems like the only thing that happens between the two calls to refresh_pseudocode_view is: https://github.com/binsync/binsync/blob/d50084e387412010066cf698b8f3debf008cb712/plugins/ida_binsync/ida_binsync/compat.py#L278-L280

this api call might be accessing the vdui_t.cfuncptr and related stuff after it might've gone out of scope due to vdui_t.refresh_view.

Python>v.cfunc, v.cfunc.lvars
(<ida_hexrays.cfuncptr_t; proxy of <Swig Object of type 'qrefcnt_t< cfunc_t > *' at 0x7faaef72d9b0> >, <ida_hexrays.lvars_t; proxy of <Swig Object of type 'lvars_t *' at 0x7faaef72d680> >)

Python>v.refresh_view(False)

Python>v.cfunc, v.cfunc.lvars
(<ida_hexrays.cfuncptr_t; proxy of <Swig Object of type 'qrefcnt_t< cfunc_t > *' at 0x7faaef72da70> >, <ida_hexrays.lvars_t; proxy of <Swig Object of type 'lvars_t *' at 0x7faaef72db90> >)

if you have the variable name and the address already, you can locate it by name or w/ vdui using ida_hexrays.locate_lvar in order to rename it or change its types with something like:

        @utils.multicase(ea=six.integer_types, lvar=ida_hexrays.lvar_t, string=(tuple, types.string))
        @classmethod
        def name(cls, ea, lvar, string):
            '''rename `lvar` to `string`'''
            lvarinfo = ida_hexrays.lvar_saved_info_t()
            if not ida_hexrays.locate_lvar(lvarinfo.ll, ea, lvar.name):
                raise exceptions.DisassemblerError

            res, lvarinfo.name = lvar.name, string

            if not ida_hexrays.modify_user_lvar_info(ea, ida_hexrays.MLI_NAME, lvarinfo):
                raise exceptions.DisassemblerError
            return utils.string.of(res)

        @utils.multicase(ea=six.integer_types, lvar=ida_hexrays.lvar_t, info=idaapi.tinfo_t)
        @classmethod
        def type(cls, ea, lvar, info):
            '''modify type of `lvar` to `info`'''
            if not lvar.accepts_type(info, False):
                raise ValueError("type ({:s}) does not fit".format("{!s}".format(info)))

            lvarinfo = ida_hexrays.lvar_saved_info_t()
            if not ida_hexrays.locate_lvar(lvarinfo.ll, ea, lvar.name):
                raise exceptions.DisassemblerError

            res, lvarinfo.type = lvar.type(), info

            if not ida_hexrays.modify_user_lvar_info(ea, ida_hexrays.MLI_TYPE, lvarinfo):
                raise exceptions.DisassemblerError
            return res

since the refresh_pseudocode_view func seems to be being used here for updating all of the views with any name or type changes, if possible, it might be safer to pull the refresh_pseudocode_view out of the for-loop that it's being used in. this way you do all the modifications to the the lvars first, and then refresh once afterwards.

mahaloz commented 1 year ago

@arizvisa So get this, I fixed one segfault be removing extra referenced to cfunc when accessing a vu ptr. This line for instance: https://github.com/binsync/binsync/blob/d50084e387412010066cf698b8f3debf008cb712/plugins/ida_binsync/ida_binsync/compat.py#L235

Creates a new cptr to cfunc (different address), which, when updating it, causes a segfualt because I assume its a UAF of some sort. I fixed it by removing all references that are created from the original object, and instead just used the object.

I've now hit an even more obscure crash. Since discovering the first bug I pretty much have only been using vu.refresh_view directly, instead of iterating all views.

If you go here: https://github.com/binsync/binsync/blob/d50084e387412010066cf698b8f3debf008cb712/plugins/ida_binsync/ida_binsync/controller.py#L225

This will get you a pointer to a vu of the current open window. All you need to do is just refresh_view(True), with no edits to the view, and boom, crash.

mahaloz commented 1 year ago

The minimum reproducible actions seems to be:

vu = idaapi.open_pseudocode(15654, 1)
vu.refresh_view(True)

But the crash only triggers after doing a shit ton of edits to other functions.

arizvisa commented 1 year ago

I'll try and examine closer in detail tomorrow (it's like midnight here), but I'd definitely verify your notifications/hooks if those are the the only actions that are causing it.

My latest refactor for my own proj is titled "persistence-refactor", because a ton of Hex-Rays types can go out-of-scope at the same time the user is using them, mostly things off of that vu object.. but even mblock_t and other things from the microcode. I've also had to differentiate between cfuncptr_t and cfunc_t as you can treat them the same, but the name points out that they're referenced differently.

arizvisa commented 1 year ago

Okay, had a chance to spend some time on it today. Didn't figure it out entirely...and ironically it took more time writing this down, heh.

Just so we're (sorta) on the same page, I'm running on 8.0.220729/Linux. After triggering the original crash, I noticed that the sync was pretty much only crashing on functions that I currently had open in another view within the decompiler. So, I rewrote both of the following functions so that the vu.refresh_view is moved outside the loop which guarantees that they're only called once per func_addr as perhaps that'd reduce the risk of a view being refreshed more than once in succession for the same function address.

https://github.com/binsync/binsync/blob/d50084e387412010066cf698b8f3debf008cb712/plugins/ida_binsync/ida_binsync/compat.py#L698 https://github.com/binsync/binsync/blob/d50084e387412010066cf698b8f3debf008cb712/plugins/ida_binsync/ida_binsync/hooks.py#L632

I also tweaked this next method so that it consolidates accessing that vdui_t.cfunc_t from the ida_func_code_view arg as per the original discussion. I didn't want to rewrite it, but just packed the related logic closer together in order to check if it was the culprit. https://github.com/binsync/binsync/blob/d50084e387412010066cf698b8f3debf008cb712/plugins/ida_binsync/ida_binsync/compat.py#L233

Also added just some ways to notify which hooks are being called and when (just to be certain the issues are not hook-related).

def f(descr):
 def c(*args):
  print('got', descr, args)
 return c
import hook
hook.hx.add(ida_hexrays.hxe_print_func, f('hxe_print_func'))
hook.hx.add(ida_hexrays.hxe_func_printed, f('hxe_func_printed'))
hook.hexrays.add('refresh_pseudocode', f('refresh_pseudocode'))
hook.hexrays.add('switch_pseudocode', f('switch_pseudocode'))
hook.hexrays.add('open_pseudocode', f('open_pseudocode'))
hook.hexrays.add('close_pseudocode', f('close_pseudocode'))

Essentially these are my logs for the INTERRs that I got during the sync. This just confirms your "obscure crash" that acquire_pseudocode_vdui seems to trigger in-between accessing the microcode from the idb and invalidating it.

INFO | 2023-01-02 16:31:58,836 | binsync.common.controller | Successfully synced new changes from user for <Function: None read_0(args=0); @0x2448 vars=2 len=0x1f6>
acquire_pseudocode_vdui
265E: restored microcode from idb
265E: INTERR 50719
265E: deleted stale pseudocode from idb
got switch_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f11ba2e0> >,)
got hxe_print_func (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f11ba160> >, <ida_hexrays.vc_printer_t; proxy of <Swig Object of type 'vc_printer_t *' at 0x7f87f11ba250> >)
got hxe_func_printed (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f11ba010> >,)
got refresh_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f11ba2e0> >,)
acquire_pseudocode_vdui(done)
set_function_header(vdui,access)
set_function_header(vdui,we,are,good)
...
INFO | 2023-01-02 16:32:12,426 | binsync.common.controller | Successfully synced new changes from user for <Function: None main_state_set_rootpath(args=0); @0x15762 vars=0 len=0x1f>
acquire_pseudocode_vdui
3966: restored microcode from idb
3966: INTERR 50719
3966: deleted stale pseudocode from idb
got switch_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f1550240> >,)
got hxe_print_func (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f156f6f0> >, <ida_hexrays.vc_printer_t; proxy of <Swig Object of type 'vc_printer_t *' at 0x7f87f1550570> >)
got hxe_func_printed (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f156f6f0> >,)
got refresh_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f156f6f0> >,)
acquire_pseudocode_vdui(done)
set_function_header(vdui,access)
set_function_header(vdui,we,are,good)
got hxe_print_func (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f1550e40> >, <ida_hexrays.vc_printer_t; proxy of <Swig Object of type 'vc_printer_t *' at 0x7f87f1551470> >)
got hxe_func_printed (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f1550ff0> >,)
got hxe_print_func (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f1550e40> >,
...
INFO | 2023-01-02 16:32:13,864 | binsync.common.controller | Successfully synced new changes from user for <Function: None constr_11(args=0); @0x8975 vars=0 len=0x12>
acquire_pseudocode_vdui
297C: restored microcode from idb
297C: INTERR 50719
297C: deleted stale pseudocode from idb
got switch_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f1551d40> >,)
got hxe_print_func (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f1551c20> >, <ida_hexrays.vc_printer_t; proxy of <Swig Object of type 'vc_printer_t *' at 0x7f87f15507e0> >)
got hxe_func_printed (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f15519e0> >,)
got refresh_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f1551d40> >,)
acquire_pseudocode_vdui(done)
set_function_header(vdui,access)
set_function_header(vdui,we,are,good)
...
INFO | 2023-01-02 16:32:18,547 | binsync.common.controller | Successfully synced new changes from user for <Function: long sub_39BA(args=1); @0x39ba vars=1 len=0xba>
acquire_pseudocode_vdui
39BA: INTERR 50893
3FC2: deleted stale microcode from idb
3FC2: INTERR 51551
INFO | 2023-01-02 16:32:18,747 | binsync.common.controller | Banishing exception: Decompilation failed: 3fc2: INTERR: 51551
acquire_pseudocode_vdui
87C4: restored microcode from idb
87C4: restored pseudocode from idb
got switch_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f15ddbf0> >,)
got hxe_print_func (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f15ddb30> >, <ida_hexrays.vc_printer_t; proxy of <Swig Object of type 'vc_printer_t *' at 0x7f87f15dd5c0> >)
got hxe_func_printed (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f15dd4d0> >,)
got refresh_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f15ddbf0> >,)
acquire_pseudocode_vdui(done)
set_function_header(vdui,access)
set_function_header(vdui,we,are,good)
...
INFO | 2023-01-02 16:32:18,547 | binsync.common.controller | Successfully synced new changes from user for <Function: long sub_39BA(args=1); @0x39ba vars=1 len=0xba>
acquire_pseudocode_vdui
39BA: INTERR 50893
3FC2: deleted stale microcode from idb
3FC2: INTERR 51551
INFO | 2023-01-02 16:32:18,747 | binsync.common.controller | Banishing exception: Decompilation failed: 3fc2: INTERR: 51551
acquire_pseudocode_vdui
87C4: restored microcode from idb
87C4: restored pseudocode from idb
got switch_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f15ddbf0> >,)
got hxe_print_func (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f15ddb30> >, <ida_hexrays.vc_printer_t; proxy of <Swig Object of type 'vc_printer_t *' at 0x7f87f15dd5c0> >)
got hxe_func_printed (<ida_hexrays.cfunc_t; proxy of <Swig Object of type 'cfunc_t *' at 0x7f87f15dd4d0> >,)
got refresh_pseudocode (<ida_hexrays.vdui_t; proxy of <Swig Object of type 'vdui_t *' at 0x7f87f15ddbf0> >,)
acquire_pseudocode_vdui(done)
set_function_header(vdui,access)
set_function_header(vdui,we,are,good)

According to https://hex-rays.com/products/decompiler/manual/sdk/structvdui__t.shtml#a44e1923f1e01bed28bf52f862520052d, this might only happen when its parameter is True. Also, refresh_ctext claims it refreshes all open pseudocode windows for an address. I'm not sure if refresh_view depends on it, but you might be able to avoid iterating through all pseudocode windows since you only need to reference them by address for a refresh.

So, it doesn't seem to be related to events/hooks at all like I suggested...and I've actually had this error happen outside of magic-sync in between testing. If you end up also encountering it outside magic-sync, I'd consider triaging it as a different issue...and maybe you can ping Hexrays on these error codes. They're usually pretty quick to respond.

mahaloz commented 1 year ago

Update

arizvisa commented 1 year ago

Not sure if it's the same crash that I'm getting, as since I've upgraded my system over the weekend Qt has been a little unstable...or at least the Lucid plugin (consistently) crashes w/ a null deref.

Anyways, for me magic sync is crashing in the middle of an ida_hexrays.decompile_func while trying to access type info via get_tinfo_size. It's happening in response to an ida_kernwin.execute_sync call as per another thread. If it's the same one that you're encountering, it could be one of those functions decorated with execute_write. I'll keep an eye out for the new issue, though.

mahaloz commented 1 year ago

Hmmm that's a different crash than I had before... I'll have to check again. As far as execute_sync goes, I thought the whole point of those functions is that they can be called from many threads to queue for access on the Mainthread. Am I interpreting this wrong?

------- Original Message ------- On Monday, February 13th, 2023 at 12:45 PM, Ali Rizvi-Santiago @.***> wrote:

Not sure if it's the same crash that I'm getting, as since I've upgraded my system over the weekend Qt has been a little unstable...or at least the Lucid plugin crashes w/ a null deref.

Anyways, for me magic sync is crashing in the middle of an ida_hexrays.decompile_func while trying to access type info via get_tinfo_size. It's happening in response to an ida_kernwin.execute_sync call as per another thread. If it's the same one that you're encountering, it could be one of those functions decorated with execute_write. I'll keep an eye out for the new issue, though.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

arizvisa commented 1 year ago

I thought the whole point of those functions is that they can be called from many threads to queue for access on the Mainthread. Am I interpreting this wrong?

Nah, you're correct.

The path that my system crashed on was from execute_sync which was only called from functions decorated by @execute_write. So for me, this following ida_hexrays.decompile was the only candidate that matched what my crash was showing (I'm also on linux).

@execute_write
def get_func_stack_var_info(func_addr) -> typing.Dict[int, StackVariable]:
    try:
        decompilation = ida_hexrays.decompile(func_addr)
    except ida_hexrays.DecompilationFailure:
...

I need to upgrade to IDA 8.2, though, as I'm still on 8.0. So this specific crash might be bunk (because y'know, software sucks). In your next issue, though, let me know what crash you're getting and I'll see if I can repro on my end.

mahaloz commented 1 year ago

This bug has brought me nothing but pain. So I discovered something interesting. If you go to this line: https://github.com/binsync/binsync/blob/603ca75a303212b7eeb16b9645650822fe8f4a7a/binsync/common/controller.py#L825

Then add the following:

                if isinstance(pref_art, Function) and pref_art.name != "sub_3D26":
                    continue

The bug can be reproduced in such a way that IDA does not segfault out, but it has internal errors that try to force you to send an email to HexRays. What this little change in code does is avoid syncing any functions that are not the one function I know crashes while also allowing structs. What I've sussed out from this is that structs being synced plus this function is causing the crash... which just makes no sense.

mahaloz commented 1 year ago

@arizvisa I've found the bug all the way at the bottom... and it's a little wild. I thought you'd care about it since it's so strange and funny. This is only relevant for 8.0 and below since 8.2 IDA Pro seems to have fixed it.

PoC

To reproduce database corruption in IDA first download this nivisor_crash.zip zip file. The ziped folder contains the nivisor binary and the poc script poc.py. The script does not require you to have anything installed. It's only requirement is that it's running in IDA (no BinSync needed).

Now do the following precisely:

  1. Open the binary nivisor in IDA, don't change the function IDA is focused on by default
  2. Run the script file poc.py
  3. You should now be looking at the decompilation of start
  4. Hit F5 (or just redecompile the current function).
  5. Close IDA, pack and don't save database
  6. Your database is now corrupted (should have a popup).

For reference, all the script does is set the following struct in IDA:

    bad_struct = {
        "name": "gcc_va_list",
        "size": 16,
        "members": {
            0: {
                "member_name": "gp_offset",
                "offset": 0,
                "type": "unsigned int",
                "size": 4
            },
            4: {
                "member_name": "fp_offset",
                "offset": 4,
                "type": "unsigned int",
                "size": 4
            },
            8: {
                "member_name": "overflow_arg_area",
                "offset": 8,
                "type": "",
                "size": 8
            },
        }
    }

The Fix

For now, it's just a hotfix that is dumb in #236. Never set a struct with this name.

arizvisa commented 1 year ago

omfg, what. that's super weird. i just upgraded to 8.2 also, so i'm unable to test on anything else in the 8.x series (other than an old 7.4 running Py2). heh.

so i guess the issue is that the specific type is defined as:

Python>db.types.list('*list*')
[6] +0x18 : L--S : __va_list_tag : struct {unsigned int gp_offset;unsigned int fp_offset;void *overflow_arg_area;void *reg_save_area;}
[7] +0x18 : L--A : gcc_va_list   : __va_list_tag[1]

and since this type is being chosen and used by the decompiler, when it gets modified while the decompiler has it referecned...shit just hits the fan i guess?

some observations after running your script is that the type changes so that it autosync's to the newly created structure in the database. i guess before running the script the gcc_va_list type is referencing _va_list_tag by its local types ordinal, and when it gets made into a structure by the script one part of the decompiler is referencing __va_list_tag and if it accesses gcc_va_list it is now a different type w/ a different size.

if the issue manifests itself again and ends up being related to autosync, you can check to see if a specific type is autosync'd using its name with ida_typeinf.is_autosync($strucname, $structinfo) and check for the existence of a type that your structure name might collide with using ida_typeinf.get_type_ordinal(idaapi.get_idati(), $name) > 0. (I've just been doing a lot of typeinfo-related stuff lately).

glad you were able to get it narrowed down.