facebookarchive / BOLT

Binary Optimization and Layout Tool - A linux command-line utility used for optimizing performance of binaries
2.51k stars 177 forks source link

A Problem About .got Table Updates #301

Closed CcWeapon closed 1 year ago

CcWeapon commented 2 years ago

Hello. I found that when BOLT updates the got table, only the main entry and the first secondary entry of the function are updated. The .got tables of other secondary entries of the function are not updated. Why is that so? Is it correct that all symbols in the .got table should be updated?

Problem Analysis

When bolt updates .got, it will use function getBinaryFunctionAtAddress(). The return value is determined by BF and EntryID.

BinaryFunction *BinaryContext::getBinaryFunctionAtAddress(uint64_t Address) {
  // First, try to find a function starting at the given address. If the
  // function was folded, this will get us the original folded function if it
  // wasn't removed from the list, e.g. in non-relocation mode.
  auto BFI = BinaryFunctions.find(Address);
  if (BFI != BinaryFunctions.end())
    return &BFI->second;

  // We might have folded the function matching the object at the given
  // address. In such case, we look for a function matching the symbol
  // registered at the original address. The new function (the one that the
  // original was folded into) will hold the symbol.
  if (const BinaryData *BD = getBinaryDataAtAddress(Address)) {
    uint64_t EntryID = 0;
    BinaryFunction *BF = getFunctionForSymbol(BD->getSymbol(), &EntryID);
    if (BF && EntryID == 0)
      return BF;
  }
  return nullptr;
}

then in func getFunctionForSymbol(), EntryID is determined by getEntryIDForSymbol(). getEntryIDForSymbol() indicates that the return value is 0 only when the symbol is the primary entry of the function or the first secondary entry. This means that EntryID will not be 0, and the return value of getBinaryFunctionAtAddress will be nullptr. Symbols for other secondary entries will not be updated in the .got table

uint64_t BinaryFunction::getEntryIDForSymbol(const MCSymbol *Symbol) const {
  if (!isMultiEntry())
    return 0;

  for (const MCSymbol *FunctionSymbol : getSymbols())
    if (FunctionSymbol == Symbol)
      return 0;

  // Check all secondary entries available as either basic blocks or lables.
  uint64_t NumEntries = 0;
  for (const BinaryBasicBlock *BB : BasicBlocks) {
    MCSymbol *EntrySymbol = getSecondaryEntryPointSymbol(*BB);
    if (!EntrySymbol)
      continue;
    if (EntrySymbol == Symbol)
      return NumEntries;
    ++NumEntries;
  }
  NumEntries = 0;
  for (const std::pair<const uint32_t, MCSymbol *> &KV : Labels) {
    MCSymbol *EntrySymbol = getSecondaryEntryPointSymbol(KV.second);
    if (!EntrySymbol)
      continue;
    if (EntrySymbol == Symbol)
      return NumEntries;
    ++NumEntries;
  }

  llvm_unreachable("symbol not found");
}
maksfb commented 1 year ago

Sorry, missed that earlier. Yes, you are right that GOT will not get updated in this case. What application were you trying to optimize?

CcWeapon commented 1 year ago

We try to replace getBinaryFunctionAtAddress() with getBinaryFunctionContainingAddress() in func getNewFunctionAddress().

uint64_t RewriteInstance::getNewFunctionAddress(uint64_t OldAddress) {
  const BinaryFunction *Function = BC->getBinaryFunctionContainingAddress(OldAddress);
  if (!Function)
    return 0;

  assert(!Function->isFragment() && "cannot get new address for a fragment");

  auto MultientryOffset = BC->getMultientryOffsets().find(OldAddress);
  if (MultientryOffset == BC->getMultientryOffsets().end()) {
    return Function->getOutputAddress();
  } else {
    return Function->getOutputAddress() + MultientryOffset->second;
  }
}
maksfb commented 1 year ago

Sounds like you fixed it in your fork?

CcWeapon commented 1 year ago

Yes, func getBinaryFunctionContainingAddress() is used to solve the problem.