GrammaTech / gtirb

Intermediate Representation for Binary analysis and transformation
https://grammatech.github.io/gtirb/

Changing section name for functions #52

Closed SaiVK closed 2 years ago

SaiVK commented 2 years ago

Hello everyone, is there any way to associate a function with a different section? For now, based on my understanding, the function attributes do not include a section name. Can anyone provide any pointers?

Thanking you Sai

tjohnson-gt commented 2 years ago

Yes, functions are not directly associated with sections, themselves. However, the CodeBlocks that comprise a function are contained in ByteIntervals, and ByteIntervals are contained in Sections. So to move a function from one Section to another, I think what you would do would be to go find all the ByteIntervals containing the CodeBlocks of the function of interest and move them from the Section they're in to the Section you want them to be in. (In C++, for example, you would use Section::removeByteInterval() and Section::addByteInterval())

Note that there are some caveats in terms of doing this:

1) You probably want to make sure the ByteIntervals exactly match the CodeBlocks. If they're bigger, they may include more than what you want to move. There's no requirement that ByteIntervals be the same size as the CodeBlocks/DataBlocks they contain. This allows us to support things like overlapping CodeBlocks. Some tools may even do degenerate things like just generate a single ByteInterval covering a whole Section at a time. It is possible to split ByteIntervals into smaller pieces if need be - just be aware that CodeBlocks and DataBlocks shouldn't span more than one ByteInterval at a time.

2) You may want to think about addresses when doing this. Sections are generally considered to be contiguous. So having a function in the middle of the address range for a section belong to a different section is probably going to lead to problems at some point. You can directly update the addresses as you do the move, or drop the addresses for now and re-introduce them later down the road if needed. GTIRB supports having ByteIntervals be "floating" with no address assignment. In C++, you can call ByteInterval::setAddress() with std::nullopt to get a ByteInterval without an address.

3) I'm not aware of anyone trying to do this before, so you could run into gotchas I'm not thinking about...
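
A minimal sketch of the move described above, including the checks from caveats 1 and 2. It uses small stand-in classes, not the GTIRB API: `ByteInterval`, `Section`, and `move_function` here are hypothetical names for a toy model, and code blocks are represented only by name. With the real library, the equivalent steps would go through calls such as Section::removeByteInterval() and Section::addByteInterval().

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class ByteInterval:
    """Toy stand-in: a run of bytes holding some named code blocks."""
    size: int
    address: Optional[int] = None  # None models a "floating" interval
    blocks: list = field(default_factory=list)


@dataclass(eq=False)
class Section:
    """Toy stand-in: a named container of byte intervals."""
    name: str
    byte_intervals: list = field(default_factory=list)


def move_function(function_blocks, src, dst, keep_addresses=False):
    """Move every interval of `src` holding blocks of the function into `dst`."""
    wanted = set(function_blocks)
    to_move = [bi for bi in src.byte_intervals if wanted & set(bi.blocks)]

    # Caveat 1: refuse to move an interval that also holds other blocks.
    for bi in to_move:
        extra = set(bi.blocks) - wanted
        if extra:
            raise ValueError(
                f"interval also contains {sorted(extra)}; split it first"
            )

    for bi in to_move:
        src.byte_intervals.remove(bi)
        dst.byte_intervals.append(bi)
        if not keep_addresses:
            bi.address = None  # Caveat 2: drop the address for now
    return to_move


text = Section(".text", [
    ByteInterval(16, 0x1000, ["f.entry", "f.body"]),
    ByteInterval(8, 0x1010, ["g.entry"]),
])
hot = Section(".text.hot")
moved = move_function(["f.entry", "f.body"], text, hot)
```

After this runs, `hot` holds the one interval belonging to `f`, with its address cleared, and `text` keeps only the interval for `g`. An interval that mixes blocks from several functions raises an error instead of being moved, which is the point at which you would split it.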

SaiVK commented 2 years ago

Thanks, @tjohnson-gt. My main goal is to place functions in different sections and change the placement of functions in the final binary image using a custom linker script. When the source code is available, I implemented this using the -ffunction-sections compiler flag. But this is not compatible with assembly files.

In the meantime, I tried a workaround where I directly edited the asm file emitted by gtirb-pprinter. I was able to assign sections to individual functions and get a modified executable in which the placement of sections changes based on the custom linker script. Is there any case where this workaround would fail? One scenario I could think of is that call instructions to a function might break if the functions are too far apart.

Thanking you Sai
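
The manual edit of the emitted assembly described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes GNU assembler syntax for ELF, that each function label sits at the start of its own line as `name:`, and that a `.text.<name>` naming scheme (the one -ffunction-sections uses) is acceptable. `assign_sections` is a hypothetical helper, and the actual layout of gtirb-pprinter output may need extra handling (for example, alignment directives that precede a label).

```python
import re

# A label at the start of a line, e.g. "main:" or ".L_401000:"
LABEL = re.compile(r"^([A-Za-z_.$][\w.$]*):")


def assign_sections(asm_lines, functions):
    """Insert a .section directive before the label of each named function."""
    out = []
    for line in asm_lines:
        match = LABEL.match(line)
        if match and match.group(1) in functions:
            name = match.group(1)
            # "ax" = allocatable + executable, as for regular code sections
            out.append(f'.section .text.{name},"ax",@progbits')
        out.append(line)
    return out


lines = ["foo:", "    ret", "bar:", "    ret"]
rewritten = assign_sections(lines, {"bar"})
```

Note that everything following an inserted directive stays in that section until the next section directive, so code that comes after a moved function and is not itself listed in `functions` would be carried along with it.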

tjohnson-gt commented 2 years ago

Certainly distance could be a factor - especially depending on where the sections get loaded (i.e. if they're not juxtaposed contiguously). However (assuming you're talking about x64?) the smallest operand you can give a relative call instruction is 32 bits, which reaches roughly ±2 GiB. You'd have to have a very large dispersal for that to be a problem. This is probably the common case.

You could, however, have a tail call that's implemented with a "short" jmp instruction w/ an 8-bit operand for the relative target. That could be a problem if something gets moved too far away. So you might scan for short jmp instructions that target a CodeBlock that's outside the function containing the instruction. Though I would expect this to be rare.
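
To make the short jmp concern concrete, here is a small sketch of the range check. x86 relative branches encode their displacement relative to the address of the next instruction; a short jmp (opcode EB) is 2 bytes long with a signed 8-bit displacement, while a near jmp (opcode E9) is 5 bytes with a signed 32-bit displacement. The helper names and the addresses are made up for illustration.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed integer of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi


def rel_displacement(insn_addr: int, insn_len: int, target: int) -> int:
    """Displacement as encoded: target minus the address of the next instruction."""
    return target - (insn_addr + insn_len)


# Short jmp at 0x401000 to a nearby tail-call target.
before = rel_displacement(0x401000, 2, 0x401050)  # 78
# Same jmp after the target function was moved to another section.
after = rel_displacement(0x401000, 2, 0x402000)   # 4094

ok_before = fits_signed(before, 8)   # True: still encodable as a short jmp
ok_after = fits_signed(after, 8)     # False: out of range for 8 bits
ok_near = fits_signed(rel_displacement(0x401000, 5, 0x402000), 32)  # True
```

A scan along these lines would flag any short jmp whose target leaves its containing function, since those are the ones that can go out of range once functions are placed in separate sections.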

Another issue you could have: function boundary identification is imperfect. If the GTIRB IR has a poorly identified function boundary somewhere, you could end up having weird effects. I'm not sure offhand if ddisasm might do this, but for example, some disassemblers have been known to go wonky when there's a chunk of code that appears to look like a multi-entry function. That is, think of something that looks like this:

entry_1:  mov ebp, esp   // Called by func_1()
          ...            // Fallthrough to entry_2
entry_2:  ...            // Called by func_2()
          ...

Some disassemblers might treat entry_1 as a separate function from entry_2. Thus your scheme would split up entry_1 and entry_2 into separate sections. But since there's fallthrough control flow from entry_1 to entry_2, how does that play out? You would need to make sure the two sections end up juxtaposed or something.

This type of thing doesn't happen all that often, but sometimes it can happen as a result of the disassembler being confused about function boundaries somewhere else, which then cascades into this type of thing. However, there are/used to be actual examples of this kind of multi-entry function in the old Win32 C standard libraries.

Again, I'm not sure whether this situation may/may not show up w/ ddisasm (assuming you're using that.) But it's something to be aware of - fallthrough control-flow between functions.
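
One way to check for this before splitting functions into sections is to look for fallthrough edges that cross a function boundary. The sketch below works on plain Python data and is not tied to any library: `block_to_function` and `fallthrough_edges` are hypothetical inputs that you would presumably derive from the IR's control-flow graph and its function-to-blocks table.

```python
def cross_function_fallthroughs(block_to_function, fallthrough_edges):
    """Return fallthrough edges whose two ends belong to different functions."""
    return [
        (src, dst)
        for src, dst in fallthrough_edges
        if block_to_function.get(src) != block_to_function.get(dst)
    ]


# Mirrors the entry_1 / entry_2 example: the disassembler decided that
# entry_1 and entry_2 are separate functions, but control falls through.
block_to_function = {
    "entry_1": "func_a",
    "entry_2": "func_b",
    "entry_2_tail": "func_b",
}
fallthrough_edges = [
    ("entry_1", "entry_2"),       # crosses a function boundary
    ("entry_2", "entry_2_tail"),  # stays inside func_b
]
risky = cross_function_fallthroughs(block_to_function, fallthrough_edges)
```

Here `risky` contains only the `entry_1` to `entry_2` edge. Any function pair reported this way would need to stay adjacent and in order, or be kept in the same section.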

SaiVK commented 2 years ago

Thank you so much @tjohnson-gt for the comprehensive feedback. For now, I am assuming non-stripped binaries and cases where fall-through functions are not present. Another question I have now: does ddisasm make use of function symbols in non-stripped binaries? And does function identification kick in only when stripped binaries are provided?

Thanking you Sai

tjohnson-gt commented 2 years ago

Yes, ddisasm should leverage symbols if present. Function identification still comes into play, though, in terms of delineating the extent of each function. A symbol only indicates a function's entry point, not all the blocks that belong to it.

SaiVK commented 2 years ago

Oh okay, I understand now. Thank you so much @tjohnson-gt.