facebookarchive / BOLT

Binary Optimization and Layout Tool - A linux command-line utility used for optimizing performance of binaries
2.51k stars 177 forks source link

Injected functions & AssignSections pass & PatchEntries pass #276

Open ghost opened 2 years ago

ghost commented 2 years ago

Hello there, I've been working on original code (which goes to .bolt.org.text section) modifications, in particular - replacing local function calls made via PLT with direct calls, but it's not that matters (yes, these things happen sometimes).

In my case, I had to create 1-instruction injected BinaryFunctions at the places I want to replace call target as far as we cannot modify original code functions body entirely. And I had to do it in RewriteInstance scope for some reason. The solution was based on PatchEntries pass code for its simplicity.

And the issue I faced was: at every targeted call instruction offset in produced binary there were blocks of X equal replaced call instructions, where X is the total amount of injected functions BOLT made in process of PLT calls replacing, so the sample code:

  f76660:   34000fe0    cbz w0, f7685c <_Z19Online_UpdateModuleP16PROCESS_INSTANCE@@Base+0x2a4>
  f76664:   97ef626b    bl  b4f010 <_Z28Online_IsNetworkAbortPendingv@plt>
  f76668:   b94206c8    ldr w8, [x22, #516]
  f7666c:   2a000108    orr w8, w8, w0
  f76670:   35000f68    cbnz    w8, f7685c <_Z19Online_UpdateModuleP16PROCESS_INSTANCE@@Base+0x2a4>

became the following:

  f76660:   34000fe0    cbz w0, f7685c <_Z25Online_IsUserAbortPendingv@@Base+0x2a8>
  f76664:   94e29464    bl  481b7f4 <_Z20Lockstep_HasDivergedP19LOCKSTEP_DIVERGENCE@@Base+0x1c>
  f76668:   94e29463    bl  481b7f4 <_Z20Lockstep_HasDivergedP19LOCKSTEP_DIVERGENCE@@Base+0x1c>
  f7666c:   94e29462    bl  481b7f4 <_Z20Lockstep_HasDivergedP19LOCKSTEP_DIVERGENCE@@Base+0x1c>
  f76670:   35000f68    cbnz    w8, f7685c <_Z25Online_IsUserAbortPendingv@@Base+0x2a8>

in case of requested replacing of total 3 PLT calls (global binary scope). The left 2 places are the same - 3 copies of replaced calls (all with unique opcodes).

Here, the replaced call should be only at offset f76664.

The root of this issue is: BOLT has AssignSections pass, which assigns sections for the functions, Injected functions as well - they go to .text.injectedsection as, I believe, is made 'by design' of Injected functions variant.

Later, RewriteInstance::mapCodeSections() maps the functions to their sections (output is added manually):

Setting function SYM1.PLT.org.1/ ImageSize to section .text.injected, .text.injected size 12
Setting function SYM2.PLT.org.2/ ImageSize to section .text.injected, .text.injected size 12
Setting function SYM3.PLT.org.3/ ImageSize to section .text.injected, .text.injected size 12

And this is the point - .text.injected section size is computed of sum size of all injected functions that assigned to it.

But! This does not concern injected functions created by PatchEntries pass as it runs after AssignSections pass!

Setting function SYM1.org.0/ ImageSize to section .local.text.SYM1.org.0/, .local.text.SYM1.org.0/ size 4
Setting function SYM2.org.0/ ImageSize to section .local.text.SYM1.org.0/, .local.text.SYM2.org.0/ size 4
Setting function SYM3.org.0/ ImageSize to section .local.text.SYM1.org.0/, .local.text.SYM3.org.0/ size 4

As I see, injected trampolines made by PatchEntries pass are assigned to local function sections, which are unique, and, thus, are resolved by size 4 (1 instruction).

To succeed my purposes, I had to add a special property to BinaryFunction class, and check for it in AssignSections::runOnFunctions() to avoid assigning .text.injected sections to those functions.

And finally, what I have to suggest here is to do one of the following: a) Somehow document main idea of Injected BinaryFunction design, because from the point I see - they really used only in PatchEntries pass, but from the start they are meant to emit into separated binary section (in what case they are really not doing that); b) somehow document the requirement of layout of every pass relative to the AssignSections pass. With FinalizeFunctions pass we have at least the comment about it should be running very last, same warning about ReorderFunctions pass etc. c) Using PatchEntries pass without any comment about avoiding functions it creates being 'legacy-assigned' to separate injected section also looks like undocumented hack, because (at least) I had to spend not 1 hour inspecting the unique behaviour of these functions before I finally figured this out.

Consider this as a code documentation suggestion, its not looking like a bug, but its definitely some non-obvious usage of injected functions, which 'ought' to emit in one way, but 'could' be used the other.

rafaelauler commented 2 years ago

Thanks for writing this, I appreciate your suggestions. I will take a look on how to translate this into better documentation for AssignSections and injected functions. By the way, we moved our repo to the upstream LLVM repo and this repo here is deprecated. We are tracking https://discourse.llvm.org/c/subprojects/bolt/68 for BOLT-related discussions. But I'm not sure if discourse is a good way to replace the github issues of facebookincubator repo. Any ideas, @maksfb ?

maksfb commented 2 years ago

@alexmechanic, you can file new issues at https://github.com/llvm/llvm-project using BOLT tag. Did you look at PLTCall pass? It sounds similar to what you are trying to accomplish. Modifications are required for AArch64, though.

ghost commented 2 years ago

@maksfb thanks for the tip about issuing new topics, it’s been a while since I forked this project to improve. Yes, I took a look at the PLTCall pass, but I have a different goal. Pass replaces PLT calls with GOT calls, and I need to fix-replace local calls made via PLT with direct calls.