NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Data structures allowing variable sized member arrays #6660

Open Wall-AF opened 6 days ago

Wall-AF commented 6 days ago

Is your feature request related to a problem? Please describe.
When a structure is defined with a variable-sized array as its last element (or elements), as in the simple case of the struct type_info used to represent the C++ class, applying that type to memory in the data segment gives an incomplete result, because the datatype manager only allows a static definition of the type. Taking type_info as an illustration, the structure is defined as:

struct type_info {
    type_info_vtable _m_vtable;
    void *           _m_data;
    char             _m_d_name[1];
};

meaning that when it is assigned to some data where the name contains 17 elements (chars in this case), you see something akin to:

    1000b030 5c a0 00                   type_info
             10 00 00 
             00 00 2e 
           1000b030 5c a0 00 10     type_inf  type_info_vftable       _vftable                          XREF[1]:     1000acb4(*)  
           1000b034 00 00 00 00     void *    00000000                _m_data
           1000b038 2e 3f 41 56     char[1]   "."                     _m_d_name
              1000b038 [0]            '.'
    1000b03c 62                         ??         62h    b
    1000b03d 61                         ??         61h    a
    1000b03e 64                         ??         64h    d
    1000b03f 5f                         ??         5Fh    _
    1000b040 74                         ??         74h    t
    1000b041 79                         ??         79h    y
    1000b042 70                         ??         70h    p
    1000b043 65                         ??         65h    e
    1000b044 69                         ??         69h    i
    1000b045 64                         ??         64h    d
    1000b046 40                         ??         40h    @
    1000b047 40                         ??         40h    @
    1000b048 00                         ??         00h

Describe the solution you'd like
If the number of elements in each array within the datatype could be parameterised, the same type could be reused in multiple places, which is more appropriate. For the example above you would then see:

    1000b030 5c a0 00                   type_info
             10 00 00 
             00 00 2e 
           1000b030 5c a0 00 10     type_inf  type_info_vftable       _vftable                          XREF[1]:     1000acb4(*)  
           1000b034 00 00 00 00     void *    00000000                _m_data
           1000b038 2e 3f 41 56 62  char[17]  ".?AVbad_typeid@@"      _m_d_name
                    61 64 5f 74 79 
                    70 65 69 64 40...

Notice that the 3 characters ?AV are completely missing from the initial data display, because only a single character is shown and the data at 1000b039-1000b03b is ignored.

Describe alternatives you've considered
Using a size of 0 for the member: this breaks other disassembly, because references to the zero-sized member become references to the next element of an array of (in this case) type_info structures, which does not exist as there is only a single instance; you can also then have the correctly sized array of elements, but it becomes disjoint from its owner.
Using a size of 1 for the member: this solves the disassembly issue above, but leaves you with the example illustrated in this request.
Using multiple copies of the datatype: this has consistency problems etc.

Wall-AF commented 6 days ago

Changing the packing of the structure solved the problem of the 3 missing characters ?AV!
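
One plausible reading of why packing matters here, sketched in C. The sketch assumes a 32-bit target and models both pointer-sized fields as uint32_t so the layout is the same on any host. The struct and function names are hypothetical and are not Ghidra APIs, and `#pragma pack` is a compiler extension (accepted by GCC, Clang and MSVC). With default alignment the structure is padded to a multiple of 4 bytes, so the three bytes after the single name character would belong to the structure as tail padding; with 1-byte packing there is none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the reported struct, default alignment. */
struct type_info_default {
    uint32_t vftable;   /* models the 32-bit vtable pointer */
    uint32_t data;      /* models the 32-bit void *_m_data  */
    char     d_name[1];
};

/* Same members, but with 1-byte packing (compiler extension). */
#pragma pack(push, 1)
struct type_info_packed {
    uint32_t vftable;
    uint32_t data;
    char     d_name[1];
};
#pragma pack(pop)

/* Bytes between the end of the last member and the end of the struct. */
size_t tail_padding(size_t total, size_t member_offset, size_t member_size) {
    assert(member_offset + member_size <= total);
    return total - (member_offset + member_size);
}

size_t default_tail_padding(void) {
    return tail_padding(sizeof(struct type_info_default),
                        offsetof(struct type_info_default, d_name),
                        sizeof(char[1]));
}

size_t packed_tail_padding(void) {
    return tail_padding(sizeof(struct type_info_packed),
                        offsetof(struct type_info_packed, d_name),
                        sizeof(char[1]));
}
```

Under these assumptions the default layout has three bytes of tail padding, which would line up with 1000b039-1000b03b in the listing above, while the packed layout has none.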

ghidra1 commented 4 days ago

A trailing flexible array member is expected to be declared with a 0-element count. In addition, it is best to enable packing on the structure when this is done. Any references to the member will be treated as a reference beyond the structure bounds. Below is an example which shows both the listing and structure editor for a similar case: [image]

ghidra1 commented 4 days ago

The decompiler will not render as a reference the last zero-length structure member (e.g., name) since its offset falls outside the bounds of the structure. It would require special logic within the decompiler to recognize as a structure member access.
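
The "outside the bounds" point can be illustrated in plain C, as a rough sketch rather than a description of Ghidra's internals. It assumes the same hypothetical 32-bit layout (pointer fields modelled as uint32_t) and uses a C99 flexible array member as the analogue of a 0-element trailing array; all names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
struct type_info_flex {
    uint32_t vftable;
    uint32_t data;
    char     d_name[];  /* flexible array member: adds nothing to sizeof */
};
#pragma pack(pop)

/* Returns 1 when the member starts at or past the end of the struct,
   i.e. a reference to it lands outside the structure bounds. */
int name_is_outside_bounds(void) {
    return offsetof(struct type_info_flex, d_name)
           >= sizeof(struct type_info_flex);
}

/* Bytes occupied by one placed instance whose name is NUL-terminated. */
size_t instance_size(const char *name) {
    assert(name != NULL);
    return offsetof(struct type_info_flex, d_name) + strlen(name) + 1;
}
```

For the name in the listing above, `instance_size(".?AVbad_typeid@@")` covers the 8 header bytes plus the 17 name bytes, even though `sizeof` reports only the header.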

Wall-AF commented 4 days ago

> The decompiler will not render as a reference the last zero-length structure member (e.g., name) since its offset falls outside the bounds of the structure. It would require special logic within the decompiler to recognize as a structure member access.

That is the point (for this simple case). I believe some kind of special handling could be used so that the decompiler can see that an array is specified and (perhaps via a user option) reference that member.

Alternatively, use a size of 1 in the type definition so that the decompiler can see the member and reference it, and add a per-instance attribute that provides the actual size of the array member at the location where the type is applied.
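
A minimal sketch of what such a per-instance attribute might amount to, written in C purely for illustration. Nothing here is an existing Ghidra feature or API: `placed_instance`, `name_count` and `instance_end` are hypothetical names, and the 32-bit packed layout is assumed as before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type definition keeps a 1-element array so the member stays in bounds. */
#pragma pack(push, 1)
struct type_info_one {
    uint32_t vftable;
    uint32_t data;
    char     d_name[1];
};
#pragma pack(pop)

/* Hypothetical per-instance record: where the type is applied and how
   many elements the trailing array really has at that location. */
struct placed_instance {
    uint32_t address;
    size_t   name_count;
};

/* First address past the instance, using the per-instance element count
   instead of the declared count of 1. */
uint32_t instance_end(struct placed_instance p) {
    assert(p.name_count >= 1);
    size_t length = offsetof(struct type_info_one, d_name)
                  + p.name_count * sizeof(char);
    return p.address + (uint32_t)length;
}
```

With the instance from the listing (address 1000b030, 17 elements) the end address works out to 1000b049, one past the terminating 00 byte at 1000b048; with a count of 1 it falls back to the static definition.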