NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Declare structure element as std::string #305

Open hochl opened 5 years ago

hochl commented 5 years ago

Is your feature request related to a problem? Please describe.

I have a structure that is defined in C++ like so:

struct Person { std::string name; int age; };

I have tried to set up the structure using "Edit Data Type..." and setting the individual fields, but this does not work as expected for std::string. I can select `basic_string` as the first member type, but it has a size of one and the type field says "WARNING: Empty Structure ... placeholder Structure". I could not find any information on how to make Ghidra recognize a std::string, and Google is astonishingly silent on the subject as well.

Describe the solution you'd like

Being able to select std::string or, ideally, having Ghidra recognize it automatically. Ghidra actually has the information and could make use of it, because the decompiler output contains the following statement at one point:

void __thiscall Person(Person *this, ...) { basic_string((basic_string *)this); ... }

So it is known that the first element in Person can be used in a call to basic_string(), so it could automatically detect this structure member.
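The reasoning above can be checked from the C++ side with a minimal sketch. It assumes a mainstream compiler, where the first non-static member of a simple struct like Person starts at the object's own address; the helper names are hypothetical and not part of Ghidra or the original program.

```cpp
#include <cassert>
#include <string>

// The structure from the report.
struct Person {
    std::string name;
    int age;
};

// True when the first member starts at the same address as the enclosing
// object. This is what lets the compiled constructor pass `this` straight
// to the basic_string constructor for `name`.
inline bool name_shares_address_with_person(const Person& p) {
    return static_cast<const void*>(&p) == static_cast<const void*>(&p.name);
}

// Hypothetical helper: aborts if the assumption does not hold on this toolchain.
inline void check_first_member_assumption() {
    Person p{"Ada", 36};
    assert(name_shares_address_with_person(p));
}
```

A cast of `this` to `basic_string *` in the decompiler output is therefore consistent with a basic_string member at offset 0, although it does not by itself reveal the member's size.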

Describe alternatives you've considered

I re-defined basic_string as the following structure:

struct basic_string { char* data; size_t size; uint8_t dummy[16]; };

according to the output of "g++ -E". But that is not really the same as automatic detection.
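For reference, here is a slightly more detailed sketch of the same layout. It assumes libstdc++ with the C++11 ABI, where the 16 bytes labelled `dummy` above are a union of the small-string buffer and the heap capacity. The type and helper names are hypothetical, and other standard libraries (libc++, MSVC) lay the string out differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the libstdc++ (C++11 ABI) std::string layout.
struct basic_string_layout {
    char*       data;              // points at local_buf for short strings, at the heap otherwise
    std::size_t size;              // current length, excluding the terminating NUL
    union {
        char        local_buf[16]; // small-string buffer (up to 15 characters plus NUL)
        std::size_t capacity;      // heap capacity, meaningful only when data is on the heap
    };
};

// Mirror of the Person structure from the report.
struct Person_layout {
    basic_string_layout name;
    int                 age;
};

// A string is in small-string mode when data points at its own buffer.
inline bool uses_local_buffer(const basic_string_layout& s) {
    return s.data == s.local_buf;
}

// Put a mirror into the state assumed for an empty string.
inline void init_empty(basic_string_layout& s) {
    s.data = s.local_buf;
    s.size = 0;
    s.local_buf[0] = '\0';
    assert(uses_local_buffer(s));
}

// 32 bytes on a typical 64-bit target, with age directly after the string.
static_assert(sizeof(basic_string_layout) == sizeof(char*) + sizeof(std::size_t) + 16,
              "unexpected padding in basic_string_layout");
static_assert(offsetof(Person_layout, age) == sizeof(basic_string_layout),
              "age is expected to follow name directly");
```

Under these assumptions, a structure of this shape entered in the Ghidra structure editor would place `age` at offset 0x20 in a 64-bit binary.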

Additional context

Suggestions:

Note: I found someone who has a similar problem and almost the same example. That is pretty fascinating: https://reverseengineering.stackexchange.com/questions/20819/ghidra-define-c-string?rq=1

ghidra1 commented 5 years ago

Ghidra does not formally support C++ constructs such as templates and classes and we currently must rely on the use of namespaces and structure data types to approximate some of the object-oriented concepts. There are no built-in Ghidra data types which correspond to template/class definitions such as basic_string. Currently the structure-to-class association is very loose and based upon name and namespace hierarchy. There is no referential integrity at this point in time and the class structure is not formally defined as such. We are aware of the challenges these limitations can present.

Ghidra has established a "Class" namespace which can support instance methods which employ the __thiscall calling convention; however, a separate data structure definition with the same name must be used to implement an instance structure, similar to what you did with basic_string. In some cases the decompiler can be used to help fill out such structures, although this is not done automatically at present.

In many cases the demangler can help us identify the existence of a class and thus its corresponding "Placeholder" structure; however, it does not provide any details about the structure definition, which is left for follow-on structure fill-out work. In general, a class structure should be placed within a "category" hierarchy which matches the class namespace hierarchy, although the containing category is not constrained. As you may surmise, the structure-to-class match-up is subject to name collisions in the absence of a formal relationship.

In the case of basic_string (as a class) I would create a basic_string structure definition within an "std" category and fill it out as you have done. In addition, a basic_string Class namespace should be defined within the std namespace.

It is possible that the Class may already be defined as "external" and associated with one or more Libraries. It has become apparent that our implementation of "external" namespaces can be problematic and lead to multiple definitions of a namespace or Class. Hopefully this and the other object-oriented limitations will be addressed in the future.

hochl commented 5 years ago

Ok, I was misled by the fact that there is a Classes submenu.

Anyways, as a first step towards real support for classes / templates, it would be nice if the work you have outlined in your reply above could be done by Ghidra, maybe as an option or tool, for all the used STL classes. You could right-click and have a menu entry add code for STL class basic_string or similar and have the best Ghidra can already do to add all the stuff. It could be an option for the analyzer too, like "scan for STL classes and add boilerplate code".

theKidOfArcrania commented 5 years ago

hey so i'm the op on the reverseengineering question :wink:

iirc I think ghidra has builtin support for many of the structs defined in C, i.e. under generic_clib. I think it would be nice if these could be extended to STL structs (or at least ones that are pretty much defined at compile time). Of course, the same cannot be said for template STL's such as vector.

hochl commented 5 years ago

> hey so i'm the op on the reverseengineering question :wink:
>
> iirc I think ghidra has builtin support for many of the structs defined in C, i.e. under generic_clib. I think it would be nice if these could be extended to STL structs (or at least ones that are pretty much defined at compile time). Of course, the same cannot be said for template STL's such as vector.

It would still be possible to detect the layout of the underlying structure dynamically for types like std::vector<int>; I don't see a reason why not. Of course, this would require some logic on Ghidra's side, perhaps in the form of a process guided by the user.
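To illustrate why this seems feasible, here is a minimal sketch of a vector layout. It assumes the libstdc++ implementation, which stores three pointers (start, finish, end of storage); other standard libraries may differ, and the names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the libstdc++ std::vector<T> layout: three pointers
// whose pointee type is the only thing that depends on T.
template <typename T>
struct vector_layout {
    T* start;           // first element
    T* finish;          // one past the last element in use
    T* end_of_storage;  // one past the end of the allocated block
};

// Number of elements currently stored.
template <typename T>
std::size_t element_count(const vector_layout<T>& v) {
    assert(v.start <= v.finish);
    return static_cast<std::size_t>(v.finish - v.start);
}

// Number of elements the allocated block can hold.
template <typename T>
std::size_t storage_capacity(const vector_layout<T>& v) {
    assert(v.finish <= v.end_of_storage);
    return static_cast<std::size_t>(v.end_of_storage - v.start);
}

// The structure size does not depend on the element type.
static_assert(sizeof(vector_layout<int>) == sizeof(vector_layout<double>),
              "layout size should be independent of T");
```

Under this assumption the structure is always three pointers wide, so a tool would only need to determine the element type, for example from a demangled symbol name or from user input.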