GrammaTech / ddisasm

A fast and accurate disassembler
https://grammatech.github.io/ddisasm/
GNU Affero General Public License v3.0

Use PDB files in rewriting process #74

Open avncharlie opened 5 months ago

avncharlie commented 5 months ago

I'm not sure whether this belongs here, in gtirb-pprinter, or in gtirb-rewriting, so please let me know and I can reopen it in the most suitable repo.

While developing instrumentation using gtirb-rewriting, I would like to do this:

I haven't found any way to do this (run instrumentation that preserves symbol information in the output PE binary). Is this possible?

aeflores commented 5 months ago

Hi @avncharlie, this is an interesting idea!

Our tooling is missing a few pieces for this to be possible. You could run ddisasm on a PE binary and create a gtirb, but we don't have any utilities to parse PDBs and use their information. This could be done in one of two ways: (1) as a post-processing step that annotates the gtirb with information from the PDB, or (2) by having ddisasm parse the PDB so it can use that information for better disassembly. Option 1 would probably be simpler to implement, but ddisasm would not benefit from the PDB information. Option 2 would probably require more work but could potentially get you better results.
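To make option 1 a bit more concrete, here is a rough sketch of the matching step only, not something our tooling provides. It assumes a PDB parser has already given you each public symbol as a `(name, RVA)` pair (PDBs store section:offset pairs, so a real tool would first need the PE section table to compute RVAs) and that you have collected the start addresses of the code blocks from the gtirb. The function name and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def match_pdb_symbols(
    publics: Iterable[Tuple[str, int]],
    image_base: int,
    block_starts: Iterable[int],
) -> Tuple[Dict[int, List[str]], List[str]]:
    """Map PDB public symbols (name, RVA) onto known block start addresses.

    Returns the matched names grouped by address, plus the names that did
    not land on any known block start.
    """
    starts = set(block_starts)
    matched: Dict[int, List[str]] = {}
    unmatched: List[str] = []
    for name, rva in publics:
        addr = image_base + rva  # an RVA is relative to the PE image base
        if addr in starts:
            matched.setdefault(addr, []).append(name)
        else:
            unmatched.append(name)
    return matched, unmatched


# Hypothetical data: in practice `publics` would come from a PDB parser and
# `block_starts` from the code blocks of the gtirb module.
publics = [("main", 0x1000), ("helper", 0x1040), ("stale_sym", 0x9999)]
matched, unmatched = match_pdb_symbols(
    publics, 0x140000000, [0x140001000, 0x140001040]
)
```

Each matched address would then get a symbol attached in the gtirb. Unmatched names are worth reporting, since they may point at places where the disassembly and the PDB disagree.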

Once you have a gtirb annotated with symbols, I think you should be able to use gtirb-rewriting to instrument it and gtirb-pprinter to generate a new PE. However, gtirb-pprinter cannot generate PDB files; that would be the second missing piece. I am not sure how much effort this would be. I know LLVM's support for PDB files (e.g. https://llvm.org/docs/CommandGuide/llvm-pdbutil.html) has been getting better, so using some of that might make things easier.

XVilka commented 5 months ago

You could use the Rizin library for parsing both PDB and DWARF (and maybe some other debugging information in the future):

It is a C library and definitely smaller than LLVM, so using it is much easier.

aeflores commented 3 weeks ago

It looks like the latest version of LIEF (https://lief.re/doc/stable/changelog.html#july-23th-2024) has some support for parsing PDBs and DWARF sections. Once we update ddisasm to the latest LIEF, using that information during disassembly should be much easier.