NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
49.77k stars 5.72k forks source link

Enhancements to structure editor (to support inheritance) #3772

Open Wall-AF opened 2 years ago

Wall-AF commented 2 years ago

Is your feature request related to a problem? Please describe. Keeping track and modifying inherited structures is difficult (not really supported) in the current situation.

Describe the solution you'd like There are three new ideas for this which should not interfere with current usage for the languages if left blank: 1) add two new fields to the structure editor; one for the parent structure name and one for the number of fields at the beginning of the structure to be ignored by the parent. 2) when editing the structure, ensure that parent fields that are modified by the user are modified in the parent structure 3) when renaming the structure, either in the "Structure Editor" or in the "Data Type Manager" rename the current classname if one exists.

Describe alternatives you've considered The alternative which I have been using is to insert the detail in the comment section of the Structure Editor. However, when manually analysing an application, structure names are not always know when substructures are created, and if you have several, their management becomes overcomplicated.

Additional context In item 1 above, the reason for having the second field is so that things like "vtables" (which are usually stored first in the structure) can be overridden by the user.

Wall-AF commented 2 years ago

If all the above isn't possible, could we at least have the ability to copy'n'paste (with overwrite capability) between structures? (It is possible to "Create Structure From Selection"!) And maybe an additional field to add a trackable parent structures name???

Wyliemaster commented 1 year ago

Has there been any progress made on this? Or is it still just a suggestion?

Wall-AF commented 1 year ago

Hi @ryanmkurtz, is this (or something better) going to be considered anytime soon? I'm beginning to need it more and more!

RootCubed commented 1 year ago

I would like to propose an alternative implementation for this feature request.

The idea is the addition of a "Derived Structure" data type, which is essentially a regular structure with an existing structure prepended to it, plus a set of changes to the existing structure. This is how I'm envisioning it:

In the Data Type Manager,

In the Structure Editor, if a derived structure is being edited:

Internally, a derived structure would simply be a small extension to the regular structure class; namely the addition of a reference to the source data type and an array of (<component name>, <new data type>) tuples. I'm not familiar with the internals of how datatypes are passed between the the GUI and the decompiler, but I imagine a derived structure could easily be able to generate a "normal" structure instance that gets passed forwards to where it's needed.

One tricky situation is when the source data type of a derived structure changes:

A new edge case is of note here: If the derived structure has an additional component field and the original structure has a component fd, this component is not allowed to be renamed to field, because it would collide with the component in the derived structure, so some additional checking would need to be done there.

I have included some mockups of how this feature could look like:

Original structure BaseClass<T>: nonderived

Derived structure 1 BaseClass<int>, derived from BaseClass<T> (orange = changed from source data type, green = unchanged): derived1

Derived structure 2 SubClass<int>, derived from BaseClass<int>: derived2

Wall-AF commented 1 year ago

@RootCubed nice!

ghidra007 commented 1 year ago

Thank you for the suggestions. We are aware of the many limitations of Ghidra's current modeling of classes and are working on a better model. It is a very hard problem that will require a lot of changes to the Ghidra framework which is why it is taking so long.

In my RTTI scripts I have temporarily solved the vftablePtr issue in the structures by splitting out the parent structure members inside the class structure instead of simply adding the parent structure as a member. This isn't ideal but the correct vftablePtr can then overwrite its associated parent one. To save some of the knowledge of what belongs to what parent class I create _data structures for the parent data members and add them as a chunk to the class structure. The class's data members remain individual members. Also, I put the class parent names (and how inherited) in the class structure description field.

Regarding copying from a current structure and making a new structure, that feature exists right now. You just select rows in a current structure (within the structure editor) and right mouse and choose Create Structure From Selection.

Wall-AF commented 1 year ago

@ghidra007 could you give us a guestimate of the amount of time you wondeful guys are able to allocate say in a month to your solution and a rough guide as to how much work you think it may take?

RootCubed commented 1 year ago

Glad to hear it's being worked on!

I agree that separating the data from the vftable is usually a good enough workaround. However, there is at least one compiler/ABI I know of (CodeWarrior) that does not always place the vftable at offset 0, but rather at the point where the first virtual method is declared (i.e. there might be some member variables before the vftable). This can make it quite annoying because then you basically have

struct cls {
  cls__data0 data0;
  cls__vftable vt;
  cls__data1 data1;
}

and once you come across a subclass of such a class, it quickly becomes a nightmare to work with :') Therefore, being able to create these inherited structures would be incredibly useful, not only for templated classes. I hope this use case can be considered for the reworked structure model.

ghidra007 commented 1 year ago

Yes we are aware there are use cases where the first vftable offset is not at 0. There are cases where there are multiple vftable offsets associated with multiple parents. There are cases where one vftable is associated with the derived class and the other is associated with the parent. We hope to handle all the cases with whatever model we come up with. You are right it is a nightmare to deal with. It is a very hard problem to model correctly.

ghidra007 commented 1 year ago

@ghidra007 could you give us a guestimate of the amount of time you wondeful guys are able to allocate say in a month to your solution and a rough guide as to how much work you think it may take?

I'm sorry but it is against project policy to share internal project planning, timelines, roadmaps, etc. However we are actively working on class modeling solutions that we hope will solve your current issues.

Wall-AF commented 1 year ago

@ghidra007 could you give us a guestimate of the amount of time you wondeful guys are able to allocate say in a month to your solution and a rough guide as to how much work you think it may take?

I'm sorry but it is against project policy to share internal project planning, timelines, roadmaps, etc. However we are actively working on class modeling solutions that we hope will solve your current issues.

Thanks for the work you do anyway. Just wish there were better docs so us mere mortals could help in a more direct way...

tamlin-mike commented 2 months ago

As this is a problem I've been fighting with for literally decades in "the other" serious R.E. tool, and I am still appalled at the mere thought of creating and/or extending vtables for subclasses, I figured I might as well share some insights hopefully useful.

For the sake of sanity, I only refer to single inheritance here.

I don't know what came first, the chicken or the egg, but the inheritance-specifying relationship between two C++ data types with virtual functions, let's call them A (inheriting nothing) and B (inheriting A) is the exact same inheritance as A::v(f?)table to B::v(f?)table, only that the v(f?)table types are:

As for the concrete data types A and B, the vtable pointer, whether it's placed first (which btw is a requirement for Windows ABI - COM is based on this) or placed willy-nilly as apparently Codewarrior does it, still adhere to these rules.

Sometimes (very infrequently IME, where I define "infrequently" in this context to be less than one in ten thousand) you need to see spelled out "is this m_foo part of this type, or is it held by a parent type). An easy cop-out could be to have only a global flag "Always qualify parent member field access". While that setting needs to be present anyway (really, it does), this could be an "annotation" flag to turn on/off for

But the most important parts, without question, is the ability to state

and the obvious; Ghidra needs to know both what a vtable and the concept of covariance is. :-)

Now for the less obvious problems: When you rev. a type, frequently you don't initially know its exact size. You may start reving type B, only to later find out "Damn, it's inheriting from type A", but by that time you have already crated a large amount of data members (fields) where e.g. the first half really belongs to A, and the latter half to B itself. This requires functionality to atomically move (leading) data from B to A, and set B as inhering A.

That's in context easy.

The less obvious part comes when you also have types C, D and E, all inheriting A, and you realize "Oh, these fields at same offset after the vfptr called B::m_data, C::m_data and so on are really just 'void* A::m_userData'. You obviously want to hoist the field into (as last member in) A, but you also need to account for all inherited types' field offsets. That merging needs some thinking, specifically in the UI department; it needs to be atomic.

It becomes even more fun when you realize some B-E vfuncs actually belongs in A. Bonus points if you considered MS C++ and __declspect(novtable), where A have no vtable (in the binary). :-)

In addition, it's entirely possible (perhaps even likely in this scenario) that such a basetype field (void* userData;) is only ever accessed through inline methods casting it to the derived type. That's an instance you need to be able to specify "field is kindasorta covariant"; perhaps there are separate structs "A_data", "B_data"..., where those types are completely unrelated, yet you would want a way to describe/model "A_data belongs to A's userData" and "B_data belongs to B's userData". The description of such a relationship could be interesting. In context it's not even on the map, but I believe it should still be documented somwehere as "Requires design. Difficult. Post-Phd. :-)".

++luck;

ghidra007 commented 2 months ago

@tamlin-mike Thank you for the very well described issues regarding vftables and for the luck.