GrammaTech / ddisasm

A fast and accurate disassembler
https://grammatech.github.io/ddisasm/
GNU Affero General Public License v3.0

address in binary -> address in gtirb #13

Closed: ceesb closed this issue 4 years ago

ceesb commented 4 years ago

I realize this is more of a question / feature request.

I'm using the gtirb produced by ddisasm to perform some rewriting. The offsets of the instructions I'm rewriting are created by another tool, and the offsets are relative to the addresses of sections in the original ELF. I couldn't find this mapping information in the gtirb file, and I noticed that the offsets of CodeBlocks in the gtirb do not match either. The only trick I found that lets me map offsets in the original binary to the gtirb is to use the Function offsets: even if the function name is not retained, the function is named ".L_XXX", where XXX is an offset in the original binary. It's a bit cumbersome to keep track of the mapping like this, and it fails for code that's not in a function. Is there another way? Ideally CodeBlock would have an "originaloffset" set to the offset in the binary it was lifted from.

iconmaster5326 commented 4 years ago

The .L_XXX symbols (a) encode the virtual address in their name, not the physical (file) offset, and (b) are arbitrary identifiers that should not be relied on to infer addresses. If what you want is the virtual address of blocks/sections, you can get this information using the getAddress method in the C++ API and the address field in the Python API.
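As a rough illustration of where a block's address comes from: in GTIRB a block stores an offset inside its byte interval, and its address is the interval's address plus that offset, or nothing if the interval has no address. The sketch below models that with toy stand-in classes (`Interval`, `Block` and `find_block` are hypothetical names, not the gtirb Python classes) and uses it to look up which block covers a given ELF virtual address. The interval address 0x4f0 and the block sizes are assumptions, picked so that the offsets 266 and 277 reported later in this thread land on 0x5fa and 0x605.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Interval:
    """Toy stand-in for a GTIRB byte interval (not the gtirb class)."""
    address: Optional[int]  # None when the interval has no fixed address


@dataclass
class Block:
    """Toy stand-in for a GTIRB code block (not the gtirb class)."""
    name: str
    interval: Interval
    offset: int  # offset of the block within its interval
    size: int

    @property
    def address(self) -> Optional[int]:
        # Derived, not stored: interval address + offset,
        # or None if the interval itself has no address.
        if self.interval.address is None:
            return None
        return self.interval.address + self.offset


def find_block(blocks: Iterable[Block], va: int) -> Optional[Block]:
    """Return the block whose [address, address + size) range contains va."""
    for block in blocks:
        addr = block.address
        if addr is not None and addr <= va < addr + block.size:
            return block
    return None


# Hypothetical layout: one interval at 0x4f0 holding both functions.
text = Interval(address=0x4F0)
f1 = Block("somefun1", text, offset=266, size=11)
f2 = Block("somefun2", text, offset=277, size=11)
print(hex(f1.address))                   # 0x5fa
print(find_block([f1, f2], 0x600).name)  # somefun1
```

If the interval's address is None, every block in it reports None and the lookup finds nothing, which matches the symptom described further down in the thread.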

iconmaster5326 commented 4 years ago

To do what you want to do, I recommend getting the address of your block, then the address of the section that that block belongs to, and subtracting those two numbers. That should produce the offset of the block from the beginning of the section.
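The subtraction suggested above, as a small Python sketch. `offset_in_section` is a hypothetical helper; with the gtirb Python API its two inputs would be the block's and the section's `address` fields, and either of them may be `None`, in which case there is no meaningful offset to compute. The example addresses are made up.

```python
from typing import Optional


def offset_in_section(block_address: Optional[int],
                      section_address: Optional[int]) -> Optional[int]:
    """Offset of a block from the start of its section.

    Returns None if either address is unknown (None), since the
    offset cannot be determined in that case.
    """
    if block_address is None or section_address is None:
        return None
    if block_address < section_address:
        raise ValueError("block address lies before the section start")
    return block_address - section_address


# Hypothetical addresses: a block at 0x5fa in a section starting at 0x4f0.
print(offset_in_section(0x5FA, 0x4F0))  # 266
print(offset_in_section(None, 0x4F0))   # None
```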

ceesb commented 4 years ago

Thanks for the response, and sorry for the confusion. I'm looking to map the ELF's virtual addresses to the addresses in the GTIRB; I don't care about file offsets.

I'm unable to get these ELF addresses out of ByteInterval / ByteBlock / Section address fields.

To show you what I mean, I attached a Python script and a sample binary (+src +gtirb) that uses the gtirb Python API to print addresses and offsets. The sample binary has somefun1 at 0x5fa and somefun2 at 0x605 (virtual addresses). I find that the addresses are None (I assume because ddisasm determined that the binary can be relocated anywhere).

For me the script prints:

    $ python addressprint.py somefuns.gtirb
    ...
    fname somefun1
    entry block offset in byte_interval 266
    entry block address None
    section address of entry block None
    byte_interval address of contained entry block None

    fname somefun2
    entry block offset in byte_interval 277
    entry block address None
    section address of entry block None
    byte_interval address of contained entry block None

iconmaster5326 commented 4 years ago

Normally, ddisasm produces sections so that they all have addresses; it is worrying to see a section without one. Is this GTIRB file straight from ddisasm? If the file has been modified, especially by adding new byte intervals without addresses, you may lose the address the section starts at.

This is because we calculate the address of a section as the address of the earliest byte interval in that section. However, if even one byte interval has no address, we can no longer be completely certain what address the section starts at.

If the GTIRB file has been modified, you might be able to get the right address for a section by doing what we do: iterating through the section's byte intervals and finding the one with the lowest address. If it hasn't been modified, then there's an actual issue with ddisasm that I will look into.
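The section-address rule and the workaround described above, sketched over a plain list of byte-interval addresses, with `None` standing for an interval without an address. Both function names are hypothetical; with the gtirb Python API the inputs would be the `address` of each byte interval in a section. The best-effort variant assumes the lowest addressed interval really marks the start of the section, which only holds if the modification did not add unaddressed bytes in front of it.

```python
from typing import Iterable, Optional


def section_address_strict(interval_addresses: Iterable[Optional[int]]) -> Optional[int]:
    """Lowest interval address, but unknown (None) if any interval has no
    address, or if the section has no intervals at all."""
    addrs = list(interval_addresses)
    if not addrs or any(a is None for a in addrs):
        return None
    return min(addrs)


def section_address_best_effort(interval_addresses: Iterable[Optional[int]]) -> Optional[int]:
    """Workaround: ignore unaddressed intervals and take the lowest address
    among the rest (None if no interval has an address)."""
    known = [a for a in interval_addresses if a is not None]
    return min(known) if known else None


# Hypothetical section: two addressed intervals plus one added without an address.
intervals = [0x700, None, 0x4F0]
print(section_address_strict(intervals))            # None
print(hex(section_address_best_effort(intervals)))  # 0x4f0
```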

ceesb commented 4 years ago

I didn't modify the gtirb, it's straight out of ddisasm.

iconmaster5326 commented 4 years ago

Looking into it a bit more: what version/commit of gtirb-capstone are you using? We recently fixed an oversight where using it would clear the addresses of all byte intervals.

ceesb commented 4 years ago

my gtirb-capstone is at 2bc8f12ece17fef75be2a66ef9f024b7881e7127

iconmaster5326 commented 4 years ago

Ah yes, that commit is affected by the bug. If you update to the latest commit of gtirb-capstone, you should find that things have addresses again.

iconmaster5326 commented 4 years ago

More specifically, try commit 619495a668d11e2806b7277d673d09f21abbc45a. I know for a fact that it should work.

ceesb commented 4 years ago

Yes, 619495a668d11e2806b7277d673d09f21abbc45a works, but tip doesn't.

iconmaster5326 commented 4 years ago

It looks like someone reverted the fix to the bug. I am going to have to investigate. But I'm glad I could find a solution for you! I'll see about getting the fix un-reverted.

ceesb commented 4 years ago

Indeed, thx!