NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

infinite loop in RecoverClassesFromRTTIScript on a windows PE with multiple inheritance #5815

Closed picryott closed 1 year ago

picryott commented 1 year ago

Describe the bug: Running the script RecoverClassesFromRTTIScript on a Windows PE (Visual Studio 6) loops forever. The monitor window stays open for hours, until cancelled.

To Reproduce: Run the Java script RecoverClassesFromRTTIScript on a Visual Studio 6 PE that contains classes with multiple inheritance and a parent that has no inheritance. I can't provide an executable, but it should be easy to reproduce.

Expected behavior: The script should end in success or throw an exception.

Screenshots: None

Attachments: None

Additional context: I think there is an incorrect increment in getClassHierarchyFromRTTI(List recoveredClasses) in RTTIWindowsClassRecoverer.java:

List<RecoveredClass> classHierarchy = recoveredClass.getClassHierarchy();
int index = 1;
while (index < classHierarchy.size()) {
    monitor.checkCancelled();
    RecoveredClass parentClass = classHierarchy.get(index);
    List<RecoveredClass> parentClassHierarchy = parentClass.getClassHierarchy();
    recoveredClass.addClassHierarchyMapping(parentClass, parentClassHierarchy);
    updateClassWithParent(parentClass, recoveredClass);
    index += parentClassHierarchy.size();
}

In my case parentClassHierarchy.size() can be 0, so index never advances and the loop never ends. parentClassHierarchy.size() can also be much bigger than classHierarchy.size(). So I think the index increment should always be 1. I haven't tried it yet.
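To make the failure mode concrete, here is a minimal sketch that replays only the index arithmetic of the loop quoted above. It is not Ghidra code: the class name `StalledIndexDemo`, the method `countIterations`, and the iteration cap are hypothetical, and the array of sizes stands in for what `classHierarchy.get(i).getClassHierarchy().size()` would return.

```java
public class StalledIndexDemo {

    /**
     * Replays the index arithmetic of the reported loop.
     * parentHierarchySizes[i] is an assumed stand-in for
     * classHierarchy.get(i).getClassHierarchy().size().
     * Returns the number of iterations executed, or -1 if the cap is
     * reached, which means the uncapped loop would not have terminated.
     */
    public static int countIterations(int[] parentHierarchySizes, int maxIterations) {
        int index = 1; // position 0 is the class itself
        int iterations = 0;
        while (index < parentHierarchySizes.length) {
            if (iterations++ >= maxIterations) {
                return -1; // index stopped advancing
            }
            index += parentHierarchySizes[index];
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Every parent hierarchy has at least one entry: the loop ends.
        System.out.println(countIterations(new int[] {4, 2, 1, 1}, 1000)); // prints 2
        // One parent reports an empty hierarchy: index stays at 1 forever.
        System.out.println(countIterations(new int[] {4, 0, 1, 1}, 1000)); // prints -1
    }
}
```

With a size of 0 at position 1, `index += 0` leaves the loop condition unchanged, so only the cancel check in the original code can end it.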

ghidra007 commented 1 year ago

The class hierarchy is supposed to always contain the class itself as the first item, so it should always have at least one item. Obviously something changed and this is no longer happening in all cases. Thanks for finding this issue. I'll let you know once it is fixed.
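The invariant described here also explains why the loop strides by `parentClassHierarchy.size()` rather than by 1. The following is a rough sketch under the assumption that a flattened hierarchy lists the class itself first, followed by each direct parent's own flattened hierarchy in order. `ClassNode` and `directParentNames` are hypothetical names for illustration, not Ghidra's `RecoveredClass` API.

```java
import java.util.ArrayList;
import java.util.List;

public class HierarchyStrideDemo {

    /** Hypothetical stand-in for a recovered class. */
    public static final class ClassNode {
        final String name;
        final List<ClassNode> directParents;

        public ClassNode(String name, ClassNode... directParents) {
            this.name = name;
            this.directParents = List.of(directParents);
        }

        /** Flattened hierarchy: this class first, then each parent's hierarchy. */
        List<ClassNode> hierarchy() {
            List<ClassNode> result = new ArrayList<>();
            result.add(this);
            for (ClassNode parent : directParents) {
                result.addAll(parent.hierarchy());
            }
            return result;
        }
    }

    /** Recovers the direct parents by striding over each parent's sub-hierarchy. */
    public static List<String> directParentNames(ClassNode cls) {
        List<ClassNode> hierarchy = cls.hierarchy();
        List<String> names = new ArrayList<>();
        int index = 1; // skip the class itself at position 0
        while (index < hierarchy.size()) {
            ClassNode parent = hierarchy.get(index);
            names.add(parent.name);
            // At least 1 here, because every hierarchy starts with its own class.
            index += parent.hierarchy().size();
        }
        return names;
    }

    public static void main(String[] args) {
        ClassNode a = new ClassNode("A");
        ClassNode b = new ClassNode("B", a);    // B : A
        ClassNode c = new ClassNode("C");
        ClassNode d = new ClassNode("D", b, c); // D : B, C
        // D's flattened hierarchy is [D, B, A, C]; the stride skips A.
        System.out.println(directParentNames(d)); // prints [B, C]
    }
}
```

In this model the stride is what skips grandparents such as A, and the "class itself comes first" rule is what guarantees the stride is never 0.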

picryott commented 1 year ago

Thank you. I just read the script's disclaimer, and my project does not meet the conditions it expects.

My project is not fresh. I've been working on it for a long time and I have made many changes that may have broken something. So that could explain why a class hierarchy does not contain the class itself.

I ran the script on a fresh project. There was no infinite loop, but there was no information in the classes except the RTTI_Type_Descriptor found during the initial analysis.

Also, please disregard my comment that 'parentClassHierarchy.size() can also be much bigger than classHierarchy.size()'. I dumped the class hierarchies, and as long as there is one more element for each parent, it should be fine. I'm still trying to figure out how RTTI works.

ghidra007 commented 1 year ago

I'm wondering if this is happening because the RTTI is not in the same format in this older version. Do you see any RTTI structures created after the RTTI analyzer is run (not the script, but the analyzer that runs before the script)?

picryott commented 1 year ago

On a fresh project, after the RTTI analyzer is run, I have classes that contain an RTTI_Type_Descriptor, but that's all. In the Symbol table there are only RTTI_Type_Descriptors.

The problem may be caused by the RTTI1Model structure, as stated in #1790. The RTTI1Model structures in my PE (Visual Studio 6) do not have a ClassHierarchyDescriptor.

I wrote my own script to find the RTTI structures, as there was no such script at that time. I forced some data to be RTTI*Model, but that was a long time ago and I can't tell you exactly how I did it. It may conflict with what the analyzer or the script expects to find. As far as I know, my RTTI structures are at the correct addresses, but since the size of RTTI1Model is not the same, they may overlap with other data and become incoherent.

ghidra007 commented 1 year ago

Yes. It sounds like you have the old-style RTTI. I should update the script to identify that case, if possible, and not run. I think we have it on our list to update the RTTI analyzer to work with the old-style RTTI, but as it is so old it keeps getting bumped by more pressing things.

ghidra007 commented 1 year ago

The script is expecting the following RTTI data types to be present:

- RTTI_Base_Class_Array (array of RTTIBaseClassDescriptors; this is what the class hierarchy is built from, and the first entry is always the current class's RTTIBaseClassDescriptor)
- RTTIBaseClassDescriptor
- RTTIClassHierarchyDescriptor
- RTTICompleteObjectLocator

Each should be created, labeled, and placed in the appropriate class namespace by the analyzer.

If I remember correctly, the script tries to fix up missing info where it can, but the older cases are not handled.

ghidra007 commented 1 year ago

I added code to throw an exception when the script hits the case where a class hierarchy is empty.
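For readers following along, a fail-fast guard of this kind might look roughly like the sketch below. This is not the actual change made in Ghidra: the exception type `MalformedHierarchyException`, the helper `nextIndex`, and the use of class names as strings are all assumptions chosen to keep the example self-contained.

```java
import java.util.List;

public class EmptyHierarchyGuard {

    /** Hypothetical exception type; the real fix may throw something else. */
    public static final class MalformedHierarchyException extends Exception {
        public MalformedHierarchyException(String message) {
            super(message);
        }
    }

    /**
     * Computes the next loop index, refusing to continue when the parent's
     * hierarchy is empty, since a stride of 0 would never advance the index.
     */
    public static int nextIndex(int index, String parentName, List<String> parentHierarchy)
            throws MalformedHierarchyException {
        if (parentHierarchy == null || parentHierarchy.isEmpty()) {
            throw new MalformedHierarchyException(
                "Class hierarchy for " + parentName + " is empty; expected the class itself first");
        }
        return index + parentHierarchy.size();
    }

    public static void main(String[] args) {
        try {
            System.out.println(nextIndex(1, "B", List.of("B", "A"))); // prints 3
            nextIndex(1, "B", List.of()); // throws instead of looping forever
        } catch (MalformedHierarchyException e) {
            System.out.println("Stopped: " + e.getMessage());
        }
    }
}
```

Failing with a descriptive message matches the expected behavior stated in the report: the script either finishes or throws, rather than spinning until cancelled.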

picryott commented 1 year ago

Thanks for everything. I'll give it a try.

ghidra007 commented 1 year ago

Appreciate you trying it!