NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
51.13k stars 5.82k forks source link

Symbol Tree context menu actions to create/edit a structure for a class #3372

Closed benstone closed 1 year ago

benstone commented 3 years ago

Is your feature request related to a problem? Please describe.

When I'm reverse engineering C++ programs I found that I was often creating a class namespace and then creating a structure for the class. To do this I have to go down to the data type manager, right-click, create a new structure, and type the name of the class again.

creating-class-structure-before gif

Describe the solution you'd like

I'd like to suggest some extra context menu actions for the symbol tree that can automatically create a structure for a class, and a context menu that can jump to a class structure if it is already defined. The Create Class Structure action would create a new structure with the same name as the class and open the structure for editing. This reduces the number of clicks to create the structure, and means I don't have to retype or copy/paste the class name.

create-class-structure-action

The second action, Edit Class Structure, would only display if a structure with the same name as the class exists. This would open the structure for editing.

edit-class-structure

I've written a prototype of this feature and found it saved me some time while creating new classes. I'd be happy to raise a merge request with my implementation if you think this would be a good feature to have.

Describe alternatives you've considered

I thought about updating the existing Create Class action to automatically create the structure at the same time. This would be a bit faster but I figured you may not always want to automatically create the structure.

Additional context

Sorry about all of the other Github issues I raised before: I was running an automated UI test on my machine and it stole the keyboard focus and started sending keys into my browser. Could you please delete all of the duplicate issues? (#3361 to #3371)

Nemoumbra commented 1 year ago

Well, I don't suppose this is moving anywhere. I've just stumbled upon some RTTI tables which were automatically parsed by Ghidra resulting in the following group of classes appearing. image But does it matter if Ghidra still does this: image

Not to mention the fact that I'd like to see blockDevice->ReadBlock(16, (u8*)&local_a28); instead of (**(code **)(*(longlong *)blockDevice + 0x10))(blockDevice,0x10,&local_a28)

ghidra007 commented 1 year ago

Thanks for your suggestions. We already have it on our list to redesign how to handle classes including how to manage class namespaces and class structures. The problem now, as I'm sure you know, is that they are not tied together and there can be more than one structure in the data type manager representing a particular class structure. The PDB and DWARF analyzers will often be able to lay down mangled names at a location and if those have class structure information sometimes empty structure are created by the demangler if there is no further PDB or DWARF information to fill in those structures. Since the class namespace and structure are not tied together, the decompiler doesn't always choose the preferred one. You can create class structures using the decompiler which will create both the class and the structure at the same time but you will get an odd class name and renaming them is also not easy since they are not tied together.

Just an FYI since you mentioned that the RTTI Analyzer found classes. There is a very prototype script called the RecoverClassesFromRTTI.java which will use RTTI and other program information to try and figure out other class information including class structures and it sets an option used by the decompiler for the "Preferred Root Namespace Category" so that it will use structures found in that root category if there is one. If your program has classes that were created with RTTI I suggest you try running it as it will create class structures for you. If you do have other structures that are empty they were created by either the pdb analyzer, the dwarf analyzer or the demangler. The script attempts to use any known debug info and also tries to use other information to create class structures.

We are not sure what the GUI will look like when we finally make our permanent (not prototype script) changes but it is our goal to have the two tied together and more easily manipulated/renamed/etc... There is a lot of work to be done first with how Ghidra models classes and much more.

Your suggestion to automatically create a class structure (if there isn't already one) when creating a class in the symbol table could possibly be implemented in the meantime if the user has to choose where to put the structure in the data type manager and maybe that action could set the preferred root category or ask the user if they want to do that. We will discuss as a team and determine whether we want to do this before the new design overhaul is implemented.

Nemoumbra commented 1 year ago
...
RecoverClassesFromRTTIScript.java> Identified 2 classes to process and 2 class member functions to assign.
...
RecoverClassesFromRTTIScript.java> Total number of constructors: 2
...
RecoverClassesFromRTTIScript.java> Total number of destructors: 3
...

This is new: image

The Demangler types are the ones that are currently applied in the Decompiler. Do I have to manully retype the signatures of the functions that use BlockDevice or FileBlockDevice now? Cause they're still the old data types.

ghidra007 commented 1 year ago

Unless it is forced, the decompiler keeps the old ones. If the script assigns a function to a class then it gets the new class structure assigned. If a function is already in a class the script doesn't always know to reassign it. This is on my list of things to improve. Yes, for now you will have to retype the this variable in the signature in order to change it making sure to manually add the * for the pointer when retyping.

ghidra007 commented 1 year ago

Unless it is forced, the decompiler keeps the old ones. If the script assigns a function to a class then it gets the new class structure assigned. If a function is already in a class the script doesn't always know to reassign it. This is on my list of things to improve. Yes, for now you will have to retype the this variable in the signature in order to change it making sure to manually add the * for the pointer when retyping.

I have a fix in for this which should be pushed out in the next few days hopefully.

ghidra007 commented 1 year ago

Unless it is forced, the decompiler keeps the old ones. If the script assigns a function to a class then it gets the new class structure assigned. If a function is already in a class the script doesn't always know to reassign it. This is on my list of things to improve. Yes, for now you will have to retype the this variable in the signature in order to change it making sure to manually add the * for the pointer when retyping.

I have a fix in for this which should be pushed out in the next few days hopefully.

The fix is in for this issue where the decompiler was using the other class structures.

ghidra007 commented 1 year ago

Thanks for your suggestions. We already have it on our list to redesign how to handle classes including how to manage class namespaces and class structures. The problem now, as I'm sure you know, is that they are not tied together and there can be more than one structure in the data type manager representing a particular class structure. The PDB and DWARF analyzers will often be able to lay down mangled names at a location and if those have class structure information sometimes empty structure are created by the demangler if there is no further PDB or DWARF information to fill in those structures. Since the class namespace and structure are not tied together, the decompiler doesn't always choose the preferred one. You can create class structures using the decompiler which will create both the class and the structure at the same time but you will get an odd class name and renaming them is also not easy since they are not tied together.

Just an FYI since you mentioned that the RTTI Analyzer found classes. There is a very prototype script called the RecoverClassesFromRTTI.java which will use RTTI and other program information to try and figure out other class information including class structures and it sets an option used by the decompiler for the "Preferred Root Namespace Category" so that it will use structures found in that root category if there is one. If your program has classes that were created with RTTI I suggest you try running it as it will create class structures for you. If you do have other structures that are empty they were created by either the pdb analyzer, the dwarf analyzer or the demangler. The script attempts to use any known debug info and also tries to use other information to create class structures.

We are not sure what the GUI will look like when we finally make our permanent (not prototype script) changes but it is our goal to have the two tied together and more easily manipulated/renamed/etc... There is a lot of work to be done first with how Ghidra models classes and much more.

Your suggestion to automatically create a class structure (if there isn't already one) when creating a class in the symbol table could possibly be implemented in the meantime if the user has to choose where to put the structure in the data type manager and maybe that action could set the preferred root category or ask the user if they want to do that. We will discuss as a team and determine whether we want to do this before the new design overhaul is implemented.

The team decided not to add this feature since it would just be a stopgap.

Nemoumbra commented 1 year ago

The team decided not to add this feature since it would just be a stopgap.

This is a valid reason <=> the final version of the feature is close to being ready. Somehow I'm under the impression that it may take a few major releases (maybe 10.6), but I'd be happy to be disproven.

In any case, can somebody ping this issue once it's implemented?