Ghidra's own OOP analyses in combination with Pharos + Kaiju do not work smoothly

0xBEEEF commented 3 years ago

Describe the bug When analyzing object-oriented programs, it would be helpful to be able to draw on the strength of Ghidra as well as Pharos + Kaiju. Unfortunately, the same does not work fully so far, because the rules used by Ghidra itself are different from those used when importing analyses with Kaiju. Here, an adaptation to the new Ghidra standard should definitely be made.

To Reproduce Use a simple program with a medium to large inheritance hierarchy. It is important that there is a lot of RTTI data. Can't provide an example on the fly unfortunately, so I'll try to describe it as best as I can. If I find time, I will post it.

In the first step, run the new RecoverClassesFromRTTIScript.java script provided with Ghidra. This will automatically create many classes, name functions, etc.

Then analyze the binary file with Ghidra. The default settings are sufficient.

Now try to import the file into Ghidra. Because the Ghidra own script RecoverClassesFromRTTIScript.java was already executed before, this can lead to some problems:

For example Ghidra marks class data in a separate struct with the name ClassName_data. This is not done by kaiju. The existing class is rebuilt by kaiju and the encapsulated data is discarded. Here it would be to continue using the structures already created by Ghidra. By the way, this also applies to function names and so on.

Expected behavior The above described applies generally to the creation or adaptation of existing classes. My suggestion here would be to follow the conventions that the script RecoverClassesFromRTTIScript.java adheres to. This would also have the charm that one could use other scripts provided by Ghidra such as ApplyClassFunctionDefinitionUpdatesScript.java and ApplyClassFunctionSignatureUpdatesScript.java in the further analysis.

Then one could proceed in the future as follows:

RecoverClassesFromRTTIScript.java
pharos
kaiju
manual post analysis

0xBEEEF commented 3 years ago

I have to refine the bug a little bit, because there might be some misunderstandings about the rules.

By rules in this case I don't mean the analysis rules used by Ghidra to determine classes based on RTTI data. Rather, it is about the structures that are created in the process. Ghidra's own structures are always split into ClassName_vftable or ClassName_data.

This is the main problem in my eyes. From my point of view, the structures generated by Ghidra are not properly reused in the JSON import of kaiju so far.

From my point of view it would be super handy if you could reuse them, and create new structures for classes not found during Ghidra's analysis using the same scheme. Splitting them into the appropriate components would also be handy.

Here again from the Ghidra source code the naming rules Ghidra follows when creating new structures.

public static final String DTM_CLASS_DATA_FOLDER_NAME = "ClassDataTypes";
private static final String CLASS_DATA_STRUCT_NAME = "_data";
private static final String DEFAULT_VFUNCTION_PREFIX = "vfunction";
private static final String VFUNCTION_COMMENT = "virtual function #";
private static final String CLASS_VFUNCTION_STRUCT_NAME = "_vftable";

public static final String VFTABLE_LABEL = "vftable";
private static final String VBASE_DESTRUCTOR_LABEL = "vbase_destructor";
private static final String VBTABLE_LABEL = "vbtable";
private static final String CLONE_LABEL = "clone";
private static final String DELETING_DESTRUCTOR_LABEL = "deleting_destructor";

sei-eschwartz commented 3 years ago

Is it fair to summarize this issue as: OOAnalyzer OOP importer does not follow the same convention as Ghidra's built-in analyses?

Changing the names to match the Ghidra convention should not be hard, of course, and seems like a good idea. Are there more changes required beyond that?

0xBEEEF commented 2 years ago

I have to come back here now. Has implementation already begun? The request was made several months ago, and I haven't heard anything since then. However, I still consider the topic to be important.

Is it fair to summarize this issue as: OOAnalyzer OOP importer does not follow the same convention as Ghidra's built-in analyses?

Changing the names to match the Ghidra convention should not be hard, of course, and seems like a good idea. Are there more changes required beyond that?

To the question above, I can only say that it is not only names but also substructures that are important from my point of view. A separate structure is created per subclass. These in turn consist of other subclasses. On the deepest inheritance level there is only the VTable and possible data. From my point of view, everything is composed of these.

This shows that it is therefore not only a matter of names, but the entire structure is also to be considered.

sei-eschwartz commented 2 years ago

I apologize for the delay in responding.

I am still unclear on what changes are needed besides the names. The OOAnalyzer importer should already reference the proper type for inherited or embedded objects. Here is an example data type that inherits from two base classes:

So unless I misunderstand, the importer already has the behavior you describe with respect to substructures. Am I missing something?

CERTCC / kaiju

Ghidra's own OOP analyses in combination with Pharos + Kaiju do not work smoothly #4