binsync / libbs

A library for writing plugins in any decompiler: includes API lifting, common data formatting, and GUI abstraction!
BSD 2-Clause "Simplified" License

Feat: Support Typedefs #98

Closed mahaloz closed 1 month ago

mahaloz commented 1 month ago

TODO

Binja Incomplete

Until the problems associated with https://github.com/Vector35/binaryninja-api/issues/4552 are fixed, typedefs that reference primitive types won't work.

mahaloz commented 1 month ago

Discovered a segfault in IDA when I run this code:

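import idaapi
import ida_typeinf

# local type library of the current database
idati = ida_typeinf.get_idati()

for ord_num in range(ida_typeinf.get_ordinal_qty(idati)):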
    tif = ida_typeinf.tinfo_t()
    success = tif.get_numbered_type(idati, ord_num)
    if not success:
        continue

    if not tif.is_typeref():
        continue

    name = tif.get_type_name()
    type_name = tif.get_next_type_name()
    if not name:
        continue

    if type_name is None:
        # try again, but with ordinal
        backup_tif = ida_typeinf.tinfo_t()
        backup_tif.get_named_type(idaapi.get_idati(), name, ida_typeinf.BTF_TYPEDEF, True, True)
        real_type_val = backup_tif.get_realtype()
        try:
            real_type = ida_typeinf.tinfo_t(real_type_val)
        except Exception:
            continue

        type_name = str(real_type)

    if not name or not type_name or name == type_name:
        continue

All I'm actually trying to do is grab the name of the base type a typedef is referencing. For instance typedef int my_int should return int.

Maybe @arizvisa knows the real solution or why this segfaults? Am I not allowed to just do this? Tested on the binary https://github.com/binsync/libbs/blob/main/tests/binaries/fauxware

For reference, tif.get_next_type_name() is almost always empty.

arizvisa commented 1 month ago

Discovered a segfault in IDA when I run this code:

This is crashing because typedefs are actually complex types. Complex types aren't basic types; instead, they're represented by a bytestream following the conditions at https://hex-rays.com//products/ida/support/sdkdoc/group__tf__complex.html#ga86c5e589737e005ba4741423fd2ca5c6.

        # try again, but with ordinal
        backup_tif = ida_typeinf.tinfo_t()
        backup_tif.get_named_type(idaapi.get_idati(), name, ida_typeinf.BTF_TYPEDEF, True, True)
        real_type_val = backup_tif.get_realtype()
        try:
            real_type = ida_typeinf.tinfo_t(real_type_val)
        except Exception:
            continue

        type_name = str(real_type)

So, here you're creating a BTF_TYPEDEF, getting its "realtype" in real_type_val (which is always going to be BT_COMPLEX), and then constructing a tinfo_t with the "real type" result. However, it's relevant to note that the byte for each type is actually a combination of TYPE_MODIF_MASK (0xC0), TYPE_FLAGS_MASK (0x30), and TYPE_BASE_MASK (0x0F). So, the issue is that tinfo_t.get_realtype only returns the TYPE_BASE_MASK portion, i.e. the lower 4 bits of the type, and completely excludes the flags and modifier bits. When you construct real_type from real_type_val, you're also excluding the rest of the information (the actual reference) that composes backup_tif.
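To make the masking concrete, here is a small sketch in plain Python that runs without IDA. The three masks are the values quoted above; BT_COMPLEX = 0x0D and BTMT_TYPEDEF = 0x30 are assumed from the SDK's typeinf.hpp and should be checked against your IDA version. split_type_byte is a hypothetical helper, not an IDA API.

```python
# Masks for the three fields packed into a single type byte (values quoted above).
TYPE_BASE_MASK = 0x0F
TYPE_FLAGS_MASK = 0x30
TYPE_MODIF_MASK = 0xC0

# Assumed SDK values; verify against typeinf.hpp for your IDA version.
BT_COMPLEX = 0x0D
BTMT_TYPEDEF = 0x30
BTF_TYPEDEF = BT_COMPLEX | BTMT_TYPEDEF


def split_type_byte(t):
    """Hypothetical helper: split a type byte into (base, flags, modifiers)."""
    return t & TYPE_BASE_MASK, t & TYPE_FLAGS_MASK, t & TYPE_MODIF_MASK


base, flags, modif = split_type_byte(BTF_TYPEDEF)

# Only the base part survives a get_realtype-style masking: the typedef flag
# is dropped, along with everything that follows the first byte.
print(hex(BTF_TYPEDEF), hex(base), hex(flags), hex(modif))
```

The base part alone is just BT_COMPLEX, which says "more bytes follow" without saying which ones. That is why a tinfo_t built from it has nothing valid to decode.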

Rendering real_type to a string causes IDA to decode its first byte, see that it's BT_COMPLEX, expect another byte defining what "kind" of complex type it is, and then decode out of bounds (pretty sure, anyway). The truth behind tinfo_t is really exposed via the tinfo_t.serialize() method. Typerefs are literally BT_COMPLEX|BTMT_TYPEDEF, followed by a byte representing whether to use a string or ordinal, followed by the actual name/ordinal. This is the difference between a "named type" and a "numbered type".

For reference, tif.get_next_type_name() is almost always empty.

What you actually need to do to get the target of a typeref is to distinguish whether the type is a named or ordinal (numbered) type. Once you have that, then you use get_type_name or get_ordinal (respectively) to get the target type. Then use that result to query the type library via get_named_type or get_numbered_type (respectively).

In later versions of IDA, you can use replace_ordinal_typerefs to always convert types to a named type. The downside is that looking up a type in the type library by name is slower (algorithmically) than by ordinal. I'm using https://github.com/arizvisa/ida-minsc/blob/persistence-refactor/misc/interface.py#L5428 to always get an ordinal regardless of whether the type is numbered or named.

All I'm actually trying to do is grab the name of the base type a typedef is referencing. For instance typedef int my_int should return int.

Since your example is always using get_numbered_type, you can assume that you'll always be dealing with a numbered type and you'll only need to use tinfo_t.get_ordinal or tinfo_t.get_final_ordinal to figure out what type to look up next in the type library (https://github.com/arizvisa/ida-minsc/blob/persistence-refactor/misc/interface.py#L6185).

        # try again, but with ordinal
        backup_tif = ida_typeinf.tinfo_t()
        backup_tif.get_named_type(idaapi.get_idati(), name, ida_typeinf.BTF_TYPEDEF, 

Also, it's worth checking the result of get_numbered_type and get_named_type. They return booleans that tell you whether they've successfully modified the tinfo_t you give them. If you have a tinfo_t that you're unsure of (when serializing/deserializing them), I've been using tinfo_t.get_size as a smoke test to ensure that the type isn't damaged (tinfo_t.get_size() != BADSIZE). Maybe there's a proper way to do this, but this way doesn't seem to crash for me.
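A minimal sketch of that check, written so it can be read (and exercised with stand-in objects) outside IDA. load_numbered_type is a hypothetical helper; tif is expected to behave like an ida_typeinf.tinfo_t, and badsize is passed in by the caller on the assumption that the constant is reachable as idaapi.BADSIZE.

```python
def load_numbered_type(tif, til, ordinal, badsize):
    """Hypothetical helper: populate `tif` from `ordinal` and sanity-check it.

    Returns True only if get_numbered_type reported success *and* the
    resulting type reports a size other than `badsize`.
    """
    # get_numbered_type returns a boolean; on failure `tif` was not modified.
    if not tif.get_numbered_type(til, ordinal):
        return False
    # Smoke test: a damaged type is expected to report BADSIZE here.
    return tif.get_size() != badsize


# Inside IDA this might be called as (assumption, not verified here):
#   load_numbered_type(ida_typeinf.tinfo_t(), ida_typeinf.get_idati(),
#                      ordinal, idaapi.BADSIZE)
```

One caveat with this sketch: a type that legitimately has no known size would presumably fail the smoke test too, so treat a False result as "don't trust it" and not as proof of corruption.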

mahaloz commented 1 month ago

What you actually need to do to get the target of a typeref is to distinguish whether the type is a named or ordinal

That was actually why I was trying to use just the normal type construction of IDA with the tinfo_t, but yeah, that segfaults. Is there another way, given a tinfo_t, to find out if it is an ordinal type?

arizvisa commented 1 month ago

What you actually need to do to get the target of a typeref is to distinguish whether the type is a named or ordinal

That was actually why I was trying to use just the normal type construction of IDA with the tinfo_t, but yeah, that segfaults. Is there another way, given a tinfo_t, to find out if it is an ordinal type?

Maybe you can check if tinfo_t.get_ordinal returns > 0, and then confirm that it's within bounds of get_ordinal_qty?
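Roughly, that check could look like the sketch below. looks_numbered is a hypothetical helper, and it assumes that ordinals start at 1 and that get_ordinal_qty returns one past the highest ordinal in use; neither assumption is confirmed here.

```python
def looks_numbered(tif, ordinal_qty):
    """Hypothetical helper: does `tif` carry a plausible ordinal?

    `tif` is expected to behave like ida_typeinf.tinfo_t, and `ordinal_qty`
    is whatever get_ordinal_qty returned for the same type library.
    """
    ordinal = tif.get_ordinal()
    # Assumption: valid ordinals satisfy 0 < ordinal < ordinal_qty.
    return 0 < ordinal < ordinal_qty


# Inside IDA this might be called as (assumption, not verified here):
#   til = ida_typeinf.get_idati()
#   looks_numbered(tif, ida_typeinf.get_ordinal_qty(til))
```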

mahaloz commented 1 month ago

Maybe you can check if tinfo_t.get_ordinal

That gives the ordinal of the typedef type, but not the type it points to. I haven't yet found a way to get a reference to the base type without get_realtype.

arizvisa commented 1 month ago

Ah, my bad.

Python>db.types.add('fuck', 'typedef int fuck')
fuck

# listing to make sure it was actually added.
Python>db.types.list(index=range(idaapi.get_ordinal_qty(None)-0x5, 0x100))
[82] +0x10 : L--S : $::$41B0E947727B04B281BBEA4B1896A8BE::$4B29161E04CAD4BCDD788B201A5E8E5E : struct {void *_call_addr;int _syscall;unsigned int _arch;}
[83]     ? : ?T-- : std::nothrow_t                                                          : struct 
[84]     ? : ?T-- : std::__detail::_Prime_rehash_policy                                     : struct 
[85]     ? : ?T-- : std::exception                                                          : struct 
[86]  +0x4 : L-I- : fuck                                                                    : int

Python>db.types.by('fuck')
fuck
Python>type(db.types.by('fuck'))
<class 'ida_typeinf.tinfo_t'>

# this just gets the ordinal of the type that was added
Python>db.types.ordinal('fuck')
0x56

# this is the goal
Python>db.types.get(0x56)
int

## now back to ida-speak

# serialize type out of the type library
Python>ti=idaapi.tinfo_t()
Python>idaapi.get_numbered_type(None, 0x56)
(b'\x07', None, None, None, 0x0)

# deserialize it back into a type (the output from get_numbered_type/get_named_type may have different types than tinfo_t.deserialize)
Python>ti.deserialize(idaapi.get_idati(), b'\x07', None or b'', None or b'')
True

Python>ti
int

So you need to deserialize your tinfo_t from whatever get_numbered_type/get_named_type returns.
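Wrapped up as a function, the numbered-type case might look like the sketch below. resolve_numbered_type is a hypothetical helper; api is expected to be the idaapi module (passed in so the sketch can be exercised with a stand-in outside IDA), and the tuple layout is assumed from the output shown above, with the serialized type first and the fields second.

```python
def resolve_numbered_type(api, ordinal, til=None):
    """Hypothetical helper: return the tinfo_t a numbered type resolves to.

    Returns None if the ordinal cannot be fetched or deserialized.
    """
    if til is None:
        til = api.get_idati()

    # Module-level get_numbered_type returns the serialized pieces of the
    # type, not a tinfo_t; a missing ordinal is assumed to give a falsy result.
    result = api.get_numbered_type(til, ordinal)
    if not result:
        return None
    type_bytes, fields = result[0], result[1]

    # Rebuild a tinfo_t from the serialized bytes, substituting b'' where the
    # type library returned None, as in the session above.
    tif = api.tinfo_t()
    if not tif.deserialize(til, type_bytes, fields or b'', b''):
        return None
    return tif


# Inside IDA this might be used as (assumption, not verified here):
#   target = resolve_numbered_type(idaapi, tif.get_ordinal())
#   base_name = str(target) if target is not None else None
```

The design choice here is to fail with None at every step instead of handing back a half-built tinfo_t, since rendering a damaged type to a string is what crashed in the first place.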

(edited) ...and for get_named_type.

Python>idaapi.get_named_type(None, 'fuck', idaapi.NTF_TYPE)
(0x1, b'\x07', None, None, None, 0x0, 0x56)

Python>ti.deserialize(idaapi.get_idati(), b'\x07', b'', b'')
True
Python>ti
int

(edited again) using tinfo_t.get_ordinal to get the input for get_numbered_type

Python>ti = idaapi.tinfo_t()
Python>ti.get_numbered_type(None, 0x56)
True
Python>ti
fuck

Flushing buffers, please wait...ok
Python>ti.get_ordinal()
0x56

using tinfo_t.get_type_name to get the input for get_named_type.

Python>ti = idaapi.tinfo_t()
Python>ti.get_named_type(None, 'fuck')
True
Python>ti
fuck

Python>ti.get_type_name()
'fuck'