NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

ExtractClassInfoFromRTTIScript does not exist, but scripts require it to be run first. #3081

Closed. Coriana closed this issue 3 years ago.

Coriana commented 3 years ago

The scripts GraphClassesScript.java and UpdateClassFunctionDataScript.java both say to run ExtractClassInfoFromRTTIScript first, but that script doesn't exist.

I don't have any other information to help resolve this.

ghidra007 commented 3 years ago

Sorry, the ExtractClassInfoFromRTTIScript was renamed to RecoverClassesFromRTTIScript just before the beta was released, and we forgot to update the name in the other scripts' docs. Also, FYI, the UpdateClassFunctionDataScript is being reworked before the main release to be more complete; there are some things it is missing now.

astrelsky commented 3 years ago

Out of curiosity, will scanning for GNU RTTI be included? I tried the scripts on my InheritanceTests and they reported that there was no RTTI.

ghidra007 commented 3 years ago

Only the beginnings of work have been done on it, so it will be very iffy for a while. Once 10.0 is released, we expect to put more time into the GCC portion of the RTTI script. Thanks for pointing us to your test files.

astrelsky commented 3 years ago

No problem. I am very familiar with how this works for GCC and clang. If anything isn't clear, it should be fairly easy to reach me. I considered opening a discussion a few times but backed out because it's clearly just not mature enough yet.

I have a version of those test classes somewhere that serializes the information that gets printed. I don't remember whether it's to JSON or XML, but if there is interest I'll put it in a separate branch.

Coriana commented 3 years ago

I'm interested. I can't guarantee I will get much use out of it, but I have a growing interest in RTTI and RTTI recovery.

I just realized you are the same person who wrote the Cpp-Class-Analyzer that I love using so much. Thank you for that extension.

ghidra007 commented 3 years ago

After looking into why this was happening, I realized that the AArch64 programs built with GCC do not list gcc as the compiler ID. The script does an initial isGcc check, which was failing, so it didn't run on those. The others do find RTTI and fix up the vtable structures, but none of the class recovery happens, because the current recovery method was designed around Red Hat-compiled binaries whose vtables directly reference the three special inheritance typeinfo structs. That isn't the case in the Debian-compiled test programs you tested on. Thanks for pointing out your examples. As mentioned before, we are in the very early stages of GCC RTTI support but plan to continue working on it. After 10.0 is released, it would be nice to open a discussion about GCC RTTI, since you are obviously very knowledgeable about it.
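A minimal sketch of a check that does not depend on the recorded compiler ID, assuming raw access to the program's bytes: look for the Itanium ABI namespace marker N10__cxxabiv1, which appears in the vtable symbol names and type-name strings of the ABI's own type_info classes in both GCC and clang output. The function names are hypothetical, and this is not the script's actual isGcc check.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// True if the raw bytes contain the given ASCII marker anywhere.
static bool contains_marker(const std::vector<unsigned char>& bytes, const std::string& marker) {
    assert(!marker.empty());  // an empty needle would trivially "match"
    return std::search(bytes.begin(), bytes.end(), marker.begin(), marker.end()) != bytes.end();
}

// Hypothetical heuristic: Itanium-ABI (GCC/clang) RTTI is likely present if
// names such as _ZTVN10__cxxabiv120__si_class_type_infoE (vtable symbol) or
// N10__cxxabiv120__si_class_type_infoE (type-name string) occur in the image.
bool looks_like_itanium_rtti(const std::vector<unsigned char>& bytes) {
    return contains_marker(bytes, "N10__cxxabiv1");
}

// Weaker hint that GCC built at least part of the binary (.comment text).
bool has_gcc_comment(const std::vector<unsigned char>& bytes) {
    return contains_marker(bytes, "GCC: (");
}
```

Scanning the whole image is deliberately crude; a real check would restrict itself to the symbol tables or the .comment section.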

astrelsky commented 3 years ago

That's actually quite strange. Would you mind compiling my samples for Red Hat so that they would have that layout? My current analyzers probably don't work for it if that is the case.

ghidra007 commented 3 years ago

I will ask someone on the team whether they are able to do so. I'm guessing the differences are due to the different compiler flavors, but I'm not sure, and I'm also curious to know for certain. It could also be the GCC version: yours is much newer than the one used in the examples I was looking at. The compiler options could also have caused the differences.

astrelsky commented 3 years ago

Thank you. I'm not sure exactly how old a compiler we are talking about here, but it would be explained if it were pre-GCC 3. I could be mistaken, but I think GCC did not follow the Itanium ABI until GCC 3. That is also the reason for the mangling changes in v3.

ghidra007 commented 3 years ago

It is GCC: (GNU) 7.3.1 20180712 (Red Hat 7.3.1-10). What compiler options did you use?

astrelsky commented 3 years ago

-Wall -gdwarf-3 -g3 -std=c++17

I just checked, and GCC 7 should support C++17, but you might need to use -std=c++1z instead.

ghidra007 commented 3 years ago

The ones I am looking at (where my script runs successfully) use:

-std=c++1 -Wall -O2 -Wno-sign-compare -nostdinc -m64

All of the vtables I see in my examples reference typeinfo structs that point to one of the three special typeinfo structs indicating no inheritance, single inheritance, or multiple and/or virtual inheritance. The binaries you produced point to external thunks instead. The docs I have read indicate that they will point to one of those typeinfo structs. Is this what you see when you analyze them? I'm starting to wonder whether the importer or analyzer is getting the symbols wrong.
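To make the three special typeinfo structs concrete: under the Itanium C++ ABI, every class's type_info object is an instance of __cxxabiv1::__class_type_info (no bases), __si_class_type_info (a single public, non-virtual base at offset zero), or __vmi_class_type_info (everything else, including multiple and virtual inheritance). A minimal sketch, assuming GCC with libstdc++, whose cxxabi.h declares these classes; the function and the example classes are hypothetical.

```cpp
#include <cxxabi.h>  // libstdc++: declares abi::__class_type_info and friends
#include <string>
#include <typeinfo>

// Report which Itanium ABI type_info subclass describes a type.
std::string inheritance_kind(const std::type_info& ti) {
    // __vmi and __si both derive from __class_type_info, so test them first.
    if (dynamic_cast<const abi::__vmi_class_type_info*>(&ti))
        return "__vmi_class_type_info";  // multiple and/or virtual inheritance
    if (dynamic_cast<const abi::__si_class_type_info*>(&ti))
        return "__si_class_type_info";   // one public, non-virtual base at offset 0
    if (dynamic_cast<const abi::__class_type_info*>(&ti))
        return "__class_type_info";      // class with no bases
    return "not a class type";
}

// Hypothetical example hierarchy covering all three cases.
struct Base { virtual ~Base() = default; };
struct Single : Base {};
struct Other { virtual ~Other() = default; };
struct Multi : Base, Other {};
struct Virt : virtual Base {};
```

In a binary, the same distinction shows up as which of the three vtables each type_info's _vptr refers to.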

astrelsky commented 3 years ago

Each type_info is an instance of a class with virtual functions. Because of this, the first member at offset 0 is the _vptr, which points to the corresponding type_info vtable (offset so that it points to the function table). So you scan for offcut references to the class type_info vtables.
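A minimal sketch of that offcut-reference scan, assuming a raw 64-bit little-endian image, 8-byte-aligned type_info objects, and that the address of the relevant type_info vtable is already known. On LP64 targets the _vptr points 16 bytes into the vtable, past the offset-to-top and RTTI slots. The function name and parameters are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 64-bit word from the image at byte offset `off`.
static std::uint64_t read_le64(const std::vector<std::uint8_t>& img, std::size_t off) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | img[off + i];
    return v;
}

// Return the addresses of aligned words that hold an offcut pointer into
// `type_info_vtable` (e.g. the __si_class_type_info vtable). Each hit is the
// _vptr slot, and therefore the start, of a candidate type_info object.
std::vector<std::uint64_t> find_type_info_objects(const std::vector<std::uint8_t>& img,
                                                  std::uint64_t image_base,
                                                  std::uint64_t type_info_vtable) {
    // Address point: skip offset-to-top and the RTTI pointer.
    const std::uint64_t address_point = type_info_vtable + 2 * sizeof(std::uint64_t);
    std::vector<std::uint64_t> hits;
    for (std::size_t off = 0; off + 8 <= img.size(); off += 8) {
        if (read_le64(img, off) == address_point)
            hits.push_back(image_base + off);
    }
    return hits;
}
```

A word equal to the vtable start itself is not a hit; only the offcut address point is.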

When symbols aren't present and/or the binary is statically linked, the initial type_info classes can be located by searching for the mangled type name and then backtracking.
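The backtracking can be sketched the same way, again assuming a 64-bit little-endian image: find the NUL-terminated type name of the ABI class (for example N10__cxxabiv120__si_class_type_infoE), find the type_info whose name pointer at offset 8 refers to it, then find the vtable whose RTTI slot refers to that type_info. The result is the address point held by the _vptr of every type_info of that kind. Requiring a zero offset-to-top word just before the RTTI slot is a simplifying heuristic of this sketch, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using Image = std::vector<std::uint8_t>;

static std::uint64_t rd64(const Image& img, std::size_t off) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | img[off + i];
    return v;
}

// Offsets of every 8-byte-aligned word equal to `value`.
static std::vector<std::size_t> words_equal(const Image& img, std::uint64_t value) {
    std::vector<std::size_t> out;
    for (std::size_t off = 0; off + 8 <= img.size(); off += 8)
        if (rd64(img, off) == value) out.push_back(off);
    return out;
}

// Mangled name -> type_info object -> vtable address point.
std::optional<std::uint64_t> address_point_from_name(const Image& img, std::uint64_t base,
                                                     const std::string& mangled) {
    // 1. Locate the NUL-terminated type-name string.
    const std::string needle = mangled + '\0';
    auto it = std::search(img.begin(), img.end(), needle.begin(), needle.end());
    if (it == img.end()) return std::nullopt;
    const std::uint64_t name_addr = base + static_cast<std::uint64_t>(it - img.begin());

    // 2. A type_info stores its name pointer at offset 8, right after _vptr.
    for (std::size_t name_ref : words_equal(img, name_addr)) {
        if (name_ref < 8) continue;
        const std::uint64_t ti_addr = base + name_ref - 8;
        // 3. The vtable's RTTI slot points back at that type_info; in a primary
        //    vtable it is preceded by offset-to-top == 0.
        for (std::size_t rtti_slot : words_equal(img, ti_addr)) {
            if (rtti_slot >= 8 && rd64(img, rtti_slot - 8) == 0)
                return base + rtti_slot + 8;  // first virtual function slot
        }
    }
    return std::nullopt;
}
```

The returned address point is exactly what an offcut scan over the rest of the image would then search for.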

There is a lot of analysis that can be done where the documentation won't help. Reconstruction of classes with virtual inheritance, for example, is one of them. I managed this by leveraging the fact that the offsets in the virtual base table and the listing of inherited classes in the __vmi_class_type_info must be sorted. I can explain that in detail in the future if need be. My point here is that I never found much of the documentation on RTTI for the Itanium ABI very helpful. Some things came from reading the GCC source and the rest came from good old-fashioned reversing. That's the real reason I built the test classes: to figure out how the hell it really worked.
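The base listing in a __vmi_class_type_info can be read back at run time on GCC with libstdc++, whose cxxabi.h exposes the full layout (other runtimes such as libc++abi may not). A minimal sketch that decodes each __base_class_type_info entry into a name, virtual/public flags, and an offset; for a virtual base the offset is the negative position in the vtable where that base's offset is stored. The helper and example classes are hypothetical, and this only dumps the data; it is not the reconstruction method described above.

```cpp
#include <cstddef>
#include <cxxabi.h>
#include <string>
#include <typeinfo>
#include <vector>

struct BaseEntry {
    std::string name;       // type_info::name() of the base class
    bool is_virtual;
    bool is_public;
    std::ptrdiff_t offset;  // non-virtual: offset of the base subobject;
                            // virtual: negative vtable offset of its vbase-offset slot
};

// List the bases recorded in a __vmi_class_type_info, in stored order.
std::vector<BaseEntry> vmi_bases(const std::type_info& ti) {
    std::vector<BaseEntry> out;
    const auto* vmi = dynamic_cast<const abi::__vmi_class_type_info*>(&ti);
    if (vmi == nullptr) return out;  // no bases, or a single simple base
    // __base_info is a trailing array holding __base_count entries.
    const abi::__base_class_type_info* info = vmi->__base_info;
    for (unsigned i = 0; i < vmi->__base_count; ++i) {
        out.push_back({info[i].__base_type->name(), info[i].__is_virtual_p(),
                       info[i].__is_public_p(), info[i].__offset()});
    }
    return out;
}

// Hypothetical example hierarchy.
struct Left { virtual ~Left() = default; };
struct Right { virtual ~Right() = default; };
struct Joined : Left, Right {};
struct Shared : virtual Left {};
```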

I have officially derailed this issue. ¯\_(ツ)_/¯

ghidra007 commented 3 years ago

Yes, I can find the vtables and typeinfo structs, but when I do, the pointer in the typeinfo that should refer to one of the special ones doesn't; it is a thunk function reference instead. I am familiar with backtracking using the mangled name, as this is how the Windows one is done. Yes, I agree the documentation is not always helpful, and looking at many types of examples is the best way to see what is really happening.

astrelsky commented 3 years ago

Oh yes, I remember now. Check the relocation table. I had a conversation with @ghidra1 about this. Have a read through the conversation here; it should explain what is going on. When the mangled symbols for the vtables are present in the relocation table, I ultimately end up using the relocation table to collect the potential RTTI. There are a couple of gotchas with this. You can see exactly how I handle scanning for clang and GCC here.
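In a dynamically linked binary the three type_info vtables live in the C++ runtime library, so each type_info's _vptr is filled in by the loader through a relocation against a symbol such as _ZTVN10__cxxabiv120__si_class_type_infoE; until then the slot holds no usable pointer, which would explain the external thunk reference. A minimal sketch of collecting type_info candidates from the relocation table, assuming RELA-style entries with explicit addends (as on x86-64 and AArch64) and a hypothetical Relocation struct standing in for whatever the loader exposes. It does not handle the gotchas mentioned above.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of one relocation entry.
struct Relocation {
    std::uint64_t offset;  // address being patched
    std::string symbol;    // referenced symbol name (mangled)
    std::int64_t addend;
};

// Mangled names of the three Itanium ABI type_info vtables.
const std::map<std::string, std::string> kTypeInfoVtables = {
    {"_ZTVN10__cxxabiv117__class_type_infoE", "__class_type_info"},
    {"_ZTVN10__cxxabiv120__si_class_type_infoE", "__si_class_type_info"},
    {"_ZTVN10__cxxabiv121__vmi_class_type_infoE", "__vmi_class_type_info"},
};

struct TypeInfoCandidate {
    std::uint64_t address;  // start of the type_info object (its _vptr slot)
    std::string kind;
};

std::vector<TypeInfoCandidate> collect_type_info(const std::vector<Relocation>& relocs,
                                                 std::int64_t ptr_size = 8) {
    std::vector<TypeInfoCandidate> out;
    for (const Relocation& r : relocs) {
        auto it = kTypeInfoVtables.find(r.symbol);
        // The _vptr targets the address point, past offset-to-top and RTTI slots.
        if (it != kTypeInfoVtables.end() && r.addend == 2 * ptr_size)
            out.push_back({r.offset, it->second});
    }
    return out;
}
```

Filtering on the addend keeps references to the vtable itself, which are not type_info objects, out of the result.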

ghidra007 commented 3 years ago

Yes, that is exactly what is happening. Thanks for the info.

ghidra007 commented 3 years ago

To find an example built with the older Red Hat GCC, see the Ghidra build (the one we put out on ghidra-sre.org) and use the decompile program located in ghidra_/Ghidra/Features/Decompiler/os/linux64.

astrelsky commented 3 years ago

Great, thank you! My analyzer worked as expected.

ghidra007 commented 3 years ago

You're welcome!