NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Sibling anonymous namespaces are incorrectly consolidated in MSVC programs #6661

Open Tkain opened 6 days ago

Tkain commented 6 days ago

Describe the bug

When Ghidra reconstructs namespaces in MSVC programs, it names every anonymous namespace `anonymous namespace' (or something similar) within its parent namespace. Because all anonymous namespaces under the same parent receive the same name, Ghidra incorrectly merges them into a single anonymous namespace whenever a parent contains more than one. If the merged anonymous namespaces each contain a class of the same name, those classes may be incorrectly merged as well.
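The merging described above can be illustrated with a minimal Python sketch. It is not Ghidra's code: the symbol list, the hash values A0x11111111 and A0x22222222, and the helper group_by_display_path are all hypothetical, assumed only to show why keying namespaces by their displayed name collapses siblings.

```python
from collections import defaultdict

# Hypothetical (display_path, rtti_name) pairs for two Example classes that
# live in anonymous namespaces of two different translation units.
# The hash values are made up for illustration.
SYMBOLS = [
    ("`anonymous namespace'::Example", ".?AVExample@?A0x11111111@@"),
    ("`anonymous namespace'::Example", ".?AVExample@?A0x22222222@@"),
]


def group_by_display_path(symbols):
    """Group RTTI names by the displayed namespace path only."""
    groups = defaultdict(list)
    for display_path, rtti_name in symbols:
        groups[display_path].append(rtti_name)
    return dict(groups)


# Both classes end up under one key because their display paths are identical.
merged = group_by_display_path(SYMBOLS)
```

Here merged holds a single entry whose list contains both RTTI names, which mirrors the single Example class with two type descriptors seen in the symbol tree.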

To Reproduce

Steps to reproduce the behavior:

  1. Create a new project and import the attached example program (with the recommended settings).
  2. Open the CodeBrowser on the newly-imported program.
  3. Analyze the program with default settings. (You may get some warnings; these are not relevant to this issue AFAIK.)
  4. In the symbol tree, navigate to Namespaces/`anonymous_namespace'/Example.
  5. Observe that the recovered Example class has two RTTI type descriptors. The two type descriptors contain different strings, indicating that they belong to different classes, yet Ghidra places them under the same class.

Expected behavior

The two anonymous namespaces and their Example classes should be kept separate from one another.

Attachments

See this ZIP file for an example program that reproduces the issue, along with its source code files.

Environment:

ghizard commented 4 days ago

Thanks for submitting this issue. I've seen this too and worked on part of it, but it slipped my mind to change the analyzer. My plan is to just use the underlying anonymous namespace name from the mangled symbol (e.g., A0x987654321). The demangled string used for plate comments would still show the `anonymous namespace' string, as that is what other demanglers, such as undname, show for the namespace, and this helps with testing our underlying demangling model against them.
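One way to picture this plan is a small Python sketch that pulls the A0x... token out of a mangled string and uses it as the namespace name. This is an illustration under stated assumptions, not Ghidra's demangler: the regex only recognizes the ?A0x<hex>@ pattern, the helper names anonymous_ids and namespace_key are hypothetical, and the sample strings are made up.

```python
import re

# Assumption: MSVC encodes an anonymous namespace qualifier as "?A0x<hex>@".
# This pattern is a simplification and does not parse the full mangling scheme.
ANON_NS = re.compile(r"\?(A0x[0-9a-fA-F]+)@")

DISPLAY_NAME = "`anonymous namespace'"


def anonymous_ids(mangled):
    """Return every A0x... token found in a mangled string, in order."""
    return ANON_NS.findall(mangled)


def namespace_key(mangled, fallback=DISPLAY_NAME):
    """Use the first A0x... token as a unique namespace name when present.

    Falls back to the shared display name when no token is found.
    """
    ids = anonymous_ids(mangled)
    return ids[0] if ids else fallback


# Two made-up type descriptor names that share the same display path.
first = namespace_key(".?AVExample@?A0x11111111@@")
second = namespace_key(".?AVExample@?A0x22222222@@")
```

With these inputs, first and second differ, so the two Example classes would no longer share a namespace name, while the plate comment could still show the `anonymous namespace' display string.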

ghizard commented 3 days ago

After digging in a little further, I think the issue here is another manifestation of what I had been seeing with the Demangler, but yours looks specific to PDB. MSFT is outputting the `anonymous namespace' component of the namespace for both say_hello methods. I'd have to dig into this further to see whether I can find the explicit owner namespace/class and, if so, whether it has the explicit underlying anonymous notation (A0x987654321) in a mangled string. If the mangled string is available, I'd prefer that, so we have a universal answer whether it comes from the Demangler or the PDB... if not, it could be possible to change the name to something that reflects which namespace/class it is. This is an unknown amount of work and might come along slowly with other work in the pipeline that needs similar attention.

Edit: Then again, you mention the RTTI... but this information all comes from the PDB, and there is a host of class-related PDB mechanisms that could come into play... so I'm still treating this as a PDB issue for now, as that reveals the truth with or without PDB.