astrelsky / Ghidra-Cpp-Class-Analyzer

Ghidra C++ Class and Run Time Type Information Analyzer
MIT License
633 stars 46 forks source link

InvalidDataTypeException when parsing std classes #7

Closed v-p-b closed 4 years ago

v-p-b commented 4 years ago

I try to analyze a VS C++ binary with Windows C++ Class Analyzer and get this exception:

ghidra.program.model.data.InvalidDataTypeException: Invalid ClassTypeInfo
    at ghidra.app.plugin.prototype.CppCodeAnalyzerPlugin.wrappers.RttiModelWrapper.validate(RttiModelWrapper.java:154)
    at ghidra.app.plugin.prototype.CppCodeAnalyzerPlugin.wrappers.RttiModelWrapper.getParentModels(RttiModelWrapper.java:222)
    at ghidra.app.cmd.data.rtti.gcc.ClassTypeInfoUtils.sortByMostDerived(ClassTypeInfoUtils.java:328)
    at ghidra.app.plugin.prototype.CppCodeAnalyzerPlugin.AbstractCppClassAnalyzer.analyzeVftables(AbstractCppClassAnalyzer.java:298)
    at ghidra.app.plugin.prototype.CppCodeAnalyzerPlugin.windows.WindowsCppClassAnalyzer.analyzeVftables(WindowsCppClassAnalyzer.java:99)
    at ghidra.app.plugin.prototype.CppCodeAnalyzerPlugin.AbstractCppClassAnalyzer.added(AbstractCppClassAnalyzer.java:102)
    at ghidra.app.plugin.core.analysis.OneShotAnalysisCommand.applyTo(OneShotAnalysisCommand.java:47)
    at ghidra.app.plugin.core.analysis.AutoAnalysisManager$AnalysisTaskWrapper.run(AutoAnalysisManager.java:685)
    at ghidra.app.plugin.core.analysis.AutoAnalysisManager.startAnalysis(AutoAnalysisManager.java:785)
    at ghidra.app.plugin.core.analysis.AutoAnalysisManager.startAnalysis(AutoAnalysisManager.java:664)
    at ghidra.app.plugin.core.analysis.AutoAnalysisManager.startAnalysis(AutoAnalysisManager.java:629)
    at ghidra.app.plugin.core.analysis.AnalysisBackgroundCommand.applyTo(AnalysisBackgroundCommand.java:62)
    at ghidra.framework.plugintool.mgr.BackgroundCommandTask.run(BackgroundCommandTask.java:101)
    at ghidra.framework.plugintool.mgr.ToolTaskManager.run(ToolTaskManager.java:315)
    at java.base/java.lang.Thread.run(Thread.java:834)

The exception is generated by the extension, so this should be some valid signaling, not some programming error. Although all members of the RttiModelWrapper object are null (so the causing program structures are not easy to identify), this problem can be in connection with previous "Input model invalid" messages generated by the RttiModelWrapper constructor when parsing std::basic_ifstream and std::basic_ofstream class information.

I assume this can be handled by catching the exception in the calling function (getParentModels()) but I'm not yet familiar with the code so I wouldn't immediately create a PR for this.

astrelsky commented 4 years ago

I see. Basically RttiModelWrapper failed to construct properly. Unfortunately my Java and OO programming is self taught and I learned throughout this project.

How would you recommend going about signaling such a case?

Also do you know what library and version is being used in the binary? I've actually had quite some difficulty tracking down the c++ standard library for vs. If a class properly constructs but its parent class doesn't then the fault lies somewhere in my code.

Edit: It appears I'm swallowing the exception. the following should make it append the exception to the log also.

https://github.com/astrelsky/Ghidra-Cpp-Class-Analyzer/blob/3433dfbff20aee72424e73b6f755e436dca97ccb/src/main/java/ghidra/app/plugin/prototype/CppCodeAnalyzerPlugin/wrappers/RttiModelWrapper.java#L74

Msg.error(this, "Input model invalid", e);

v-p-b commented 4 years ago

As I understand the calling function is manipulating multiple objects, so I think the cleanest way would be just to catch any exceptions and avoid adding violating objects to the returned structure, like this:

https://github.com/v-p-b/Ghidra-Cpp-Class-Analyzer/commit/964f03ddaa13a9a468c4fe41be8f84953fb07c00

Let me know if I should open a PR with this!

This way uninitialized objects should be grafeully discarded instead of blocking the whole analysis process.

Unfortunately I don't know much about the target binary - it would be nice if this plugin can help with reversing it :) With the above change I get some good looking results, but also a new (seemingly unrelated) exception that I will report separately.

astrelsky commented 4 years ago

As I understand the calling function is manipulating multiple objects, so I think the cleanest way would be just to catch any exceptions and avoid adding violating objects to the returned structure, like this:

v-p-b@964f03d

Let me know if I should open a PR with this!

This way uninitialized objects should be grafeully discarded instead of blocking the whole analysis process.

Unfortunately I don't know much about the target binary - it would be nice if this plugin can help with reversing it :) With the above change I get some good looking results, but also a new (seemingly unrelated) exception that I will report separately.

I see. So would it be preferred to add something like "missing super_..." to the structures description like the dwarf importer does when all the base classes that should be there aren't?

Is this a binary which you could safely share?

v-p-b commented 4 years ago

For my use-cases such markings are not needed: successfully typed data will show, while missed ones won't.

I will share the binary privately.

astrelsky commented 4 years ago

I just took care of a bunch of exceptions due to my own carelessness. There are still some cases of unresolved inheritance and I definitely need to refactor the RttiModelWrapper.

astrelsky commented 4 years ago

b3b4fde0aa592ff475eb19482a1c24d69f446019 should take care of this. There are still a few edge cases where the base offset cannot be determined. I don't think they would be accessible in the source either so I don't think it's a big deal.

v-p-b commented 4 years ago

Thanks, looking good!