Open Wall-AF opened 2 years ago
If all the above isn't possible, could we at least have the ability to copy'n'paste (with overwrite capability) between structures? (It is possible to "Create Structure From Selection"!) And maybe an additional field to add a trackable parent structures name???
Has there been any progress made on this? Or is it still just a suggestion?
Hi @ryanmkurtz, is this (or something better) going to be considered anytime soon? I'm beginning to need it more and more!
I would like to propose an alternative implementation for this feature request.
The idea is the addition of a "Derived Structure" data type, which is essentially a regular structure with an existing structure prepended to it, plus a set of changes to the existing structure. This is how I'm envisioning it:
In the Data Type Manager,
when right-clicking on a folder, under "New", the option "Derived Structure" additionally shows. Once clicked, a dialog like in "New -> Create TypeDef" would appear for selecting the source data type and name of the new structure. Then, the structure editor opens.
when right-clicking a structure, under "New", the option "Derived from \<structure name>" additionally shows. Once clicked, the structure editor opens.
In the Structure Editor, if a derived structure is being edited:
All components of the source type are displayed in the component table and highlighted in color depending on whether they were changed.
The components of the source type cannot be renamed or deleted, only the type can be changed.
New components can only be added after the end of the original structure.
Internally, a derived structure would simply be a small extension to the regular structure class; namely the addition of a reference to the source data type and an array of (<component name>, <new data type>)
tuples. I'm not familiar with the internals of how datatypes are passed between the the GUI and the decompiler, but I imagine a derived structure could easily be able to generate a "normal" structure instance that gets passed forwards to where it's needed.
One tricky situation is when the source data type of a derived structure changes:
A new edge case is of note here: If the derived structure has an additional component field
and the original structure has a component fd
, this component is not allowed to be renamed to field
, because it would collide with the component in the derived structure, so some additional checking would need to be done there.
I have included some mockups of how this feature could look like:
Original structure BaseClass<T>
:
Derived structure 1 BaseClass<int>
, derived from BaseClass<T>
(orange = changed from source data type, green = unchanged):
Derived structure 2 SubClass<int>
, derived from BaseClass<int>
:
@RootCubed nice!
Thank you for the suggestions. We are aware of the many limitations of Ghidra's current modeling of classes and are working on a better model. It is a very hard problem that will require a lot of changes to the Ghidra framework which is why it is taking so long.
In my RTTI scripts I have temporarily solved the vftablePtr issue in the structures by splitting out the parent structure members inside the class structure instead of simply adding the parent structure as a member. This isn't ideal but the correct vftablePtr can then overwrite its associated parent one. To save some of the knowledge of what belongs to what parent class I create
Regarding copying from a current structure and making a new structure, that feature exists right now. You just select rows in a current structure (within the structure editor) and right mouse and choose Create Structure From Selection.
@ghidra007 could you give us a guestimate of the amount of time you wondeful guys are able to allocate say in a month to your solution and a rough guide as to how much work you think it may take?
Glad to hear it's being worked on!
I agree that separating the data from the vftable is usually a good enough workaround. However, there is at least one compiler/ABI I know of (CodeWarrior) that does not always place the vftable at offset 0, but rather at the point where the first virtual method is declared (i.e. there might be some member variables before the vftable). This can make it quite annoying because then you basically have
struct cls {
cls__data0 data0;
cls__vftable vt;
cls__data1 data1;
}
and once you come across a subclass of such a class, it quickly becomes a nightmare to work with :') Therefore, being able to create these inherited structures would be incredibly useful, not only for templated classes. I hope this use case can be considered for the reworked structure model.
Yes we are aware there are use cases where the first vftable offset is not at 0. There are cases where there are multiple vftable offsets associated with multiple parents. There are cases where one vftable is associated with the derived class and the other is associated with the parent. We hope to handle all the cases with whatever model we come up with. You are right it is a nightmare to deal with. It is a very hard problem to model correctly.
@ghidra007 could you give us a guestimate of the amount of time you wondeful guys are able to allocate say in a month to your solution and a rough guide as to how much work you think it may take?
I'm sorry but it is against project policy to share internal project planning, timelines, roadmaps, etc. However we are actively working on class modeling solutions that we hope will solve your current issues.
@ghidra007 could you give us a guestimate of the amount of time you wondeful guys are able to allocate say in a month to your solution and a rough guide as to how much work you think it may take?
I'm sorry but it is against project policy to share internal project planning, timelines, roadmaps, etc. However we are actively working on class modeling solutions that we hope will solve your current issues.
Thanks for the work you do anyway. Just wish there were better docs so us mere mortals could help in a more direct way...
As this is a problem I've been fighting with for literally decades in "the other" serious R.E. tool, and I am still appalled at the mere thought of creating and/or extending vtables for subclasses, I figured I might as well share some insights hopefully useful.
For the sake of sanity, I only refer to single inheritance here.
this
parameter/argument.I don't know what came first, the chicken or the egg, but the inheritance-specifying relationship between two C++ data types with virtual functions, let's call them A (inheriting nothing) and B (inheriting A) is the exact same inheritance as A::v(f?)table to B::v(f?)table, only that the v(f?)table types are:
this
method argument.As for the concrete data types A and B, the vtable pointer, whether it's placed first (which btw is a requirement for Windows ABI - COM is based on this) or placed willy-nilly as apparently Codewarrior does it, still adhere to these rules.
Sometimes (very infrequently IME, where I define "infrequently" in this context to be less than one in ten thousand) you need to see spelled out "is this m_foo
part of this type, or is it held by a parent type). An easy cop-out could be to have only a global flag "Always qualify parent member field access". While that setting needs to be present anyway (really, it does), this could be an "annotation" flag to turn on/off for
But the most important parts, without question, is the ability to state
and the obvious; Ghidra needs to know both what a vtable and the concept of covariance is. :-)
Now for the less obvious problems: When you rev. a type, frequently you don't initially know its exact size. You may start reving type B, only to later find out "Damn, it's inheriting from type A", but by that time you have already crated a large amount of data members (fields) where e.g. the first half really belongs to A, and the latter half to B itself. This requires functionality to atomically move (leading) data from B to A, and set B as inhering A.
That's in context easy.
The less obvious part comes when you also have types C, D and E, all inheriting A, and you realize "Oh, these fields at same offset after the vfptr called B::m_data, C::m_data and so on are really just 'void* A::m_userData'. You obviously want to hoist the field into (as last member in) A, but you also need to account for all inherited types' field offsets. That merging needs some thinking, specifically in the UI department; it needs to be atomic.
It becomes even more fun when you realize some B-E vfuncs actually belongs in A. Bonus points if you considered MS C++ and __declspect(novtable), where A have no vtable (in the binary). :-)
In addition, it's entirely possible (perhaps even likely in this scenario) that such a basetype field (void* userData;) is only ever accessed through inline methods casting it to the derived type. That's an instance you need to be able to specify "field is kindasorta covariant"; perhaps there are separate structs "A_data", "B_data"..., where those types are completely unrelated, yet you would want a way to describe/model "A_data belongs to A's userData" and "B_data belongs to B's userData". The description of such a relationship could be interesting. In context it's not even on the map, but I believe it should still be documented somwehere as "Requires design. Difficult. Post-Phd. :-)".
++luck;
@tamlin-mike Thank you for the very well described issues regarding vftables and for the luck.
Is your feature request related to a problem? Please describe. Keeping track and modifying inherited structures is difficult (not really supported) in the current situation.
Describe the solution you'd like There are three new ideas for this which should not interfere with current usage for the languages if left blank: 1) add two new fields to the structure editor; one for the parent structure name and one for the number of fields at the beginning of the structure to be ignored by the parent. 2) when editing the structure, ensure that parent fields that are modified by the user are modified in the parent structure 3) when renaming the structure, either in the "Structure Editor" or in the "Data Type Manager" rename the current classname if one exists.
Describe alternatives you've considered The alternative which I have been using is to insert the detail in the comment section of the Structure Editor. However, when manually analysing an application, structure names are not always know when substructures are created, and if you have several, their management becomes overcomplicated.
Additional context In item 1 above, the reason for having the second field is so that things like "vtables" (which are usually stored first in the structure) can be overridden by the user.