facebookarchive / BOLT

Binary Optimization and Layout Tool - A linux command-line utility used for optimizing performance of binaries

How to use --hugify option? #313

Closed by PeterYang12 1 year ago

PeterYang12 commented 1 year ago

According to the option description, hugify puts hot code on 2MB page(s) at runtime automatically. I see that it uses transparent huge pages, but I didn't see any AnonHugePages in use after applying hugify.

My BOLT command with hugify is as follows:

llvm-bolt -hugify ./test_pie -o ./test_pie.hugify -data=./test_pie.fdata

My SUT runs Ubuntu 22.04 with kernel version 5.15.0-56-generic.
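
One way to check whether any mapping of a running process is backed by huge pages is to read `/proc/<pid>/smaps` and look at the AnonHugePages field mentioned above. The following is a minimal Python sketch, assuming the usual smaps layout. The function name `huge_kb_by_mapping` and the sample text are illustrative, and the sizes in the sample are invented for the example.

```python
import re

# A mapping header looks like "7ffff5400000-7ffff5600000 r-xp ...".
_HEADER = re.compile(r"^([0-9a-f]+-[0-9a-f]+)\s")


def huge_kb_by_mapping(smaps_text, field="AnonHugePages"):
    """Return {address_range: kB} for mappings whose `field` is non-zero."""
    result = {}
    current = None
    for line in smaps_text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
        elif current is not None and line.startswith(field + ":"):
            kb = int(line.split()[1])
            if kb:
                result[current] = kb
    return result


# Illustrative input only: the sizes below are invented for the example.
SAMPLE = """\
7ffff53ed000-7ffff5400000 r-xp 00400000 00:36 19810928 /path/to/lib.so
AnonHugePages:         0 kB
7ffff5400000-7ffff5600000 r-xp 00000000 00:00 0
AnonHugePages:      2048 kB
"""

print(huge_kb_by_mapping(SAMPLE))  # {'7ffff5400000-7ffff5600000': 2048}
```

For a live process, pass the contents of `/proc/<pid>/smaps` instead of `SAMPLE`. The `field` parameter is there because file-backed huge pages are reported under a different smaps field on kernels that support them (believed to be `FilePmdMapped`; verify against your kernel's documentation).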

PeterYang12 commented 1 year ago

By the way, THP is enabled:

~$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
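
In this sysfs file the bracketed word is the active mode, so the output above means THP is applied only to regions that request it with madvise. A small Python sketch that extracts the active mode from such a string (the function name `active_thp_mode` is illustrative):

```python
import re


def active_thp_mode(sysfs_text):
    """Return the bracketed (active) mode from the THP 'enabled' file contents."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    if match is None:
        raise ValueError(f"no active mode found in {sysfs_text!r}")
    return match.group(1)


# The string is the output quoted in the thread.
print(active_thp_mode("always [madvise] never"))  # madvise
```

On a Linux system with THP compiled in, the same function can be fed the contents of `/sys/kernel/mm/transparent_hugepage/enabled`.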

PeterYang12 commented 1 year ago

I didn't see any info about hugify. This is the BOLT output:

root@a1f2173a05fe:/home/bolt# llvm-bolt -hugify ./test_pie -o ./test_pie.hugify2 -data=./test_pie.fdata
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 142aa1bdd1dd1db9a7fecf9d157228019c794c94
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x400000, offset 0x400000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling lite mode
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: 7 out of 11 functions in the binary (63.6%) have non-empty execution profile
BOLT-INFO: 2 functions with profile could not be optimized
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: 0 instructions were shortened
BOLT-INFO: removed 2 empty blocks
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: setting _end to 0x800d18
BOLT-INFO: setting __hot_start to 0x600000
BOLT-INFO: setting __hot_end to 0x60022b
BOLT-INFO: patched build-id (flipped last bit)

maksfb commented 1 year ago

It takes some time for the kernel to find and allocate huge pages for the process. If your test runs quickly, it may never get a chance to take advantage of it. Even then, I find THPs to be quite unreliable. The only guaranteed way to get them is to reserve with the kernel.

PeterYang12 commented 1 year ago

Thank you for your quick response. Does that mean hugify doesn't work for now? Or do you have a successful case I can reproduce?

treapster commented 1 year ago

It is likely that MADV_HUGEPAGE is ignored by the kernel if the config option READ_ONLY_THP_FOR_FS is not set, in which case you have to recompile the kernel with this option. We're currently investigating it, but as Max said, it's not super reliable, at least on 5.10+ kernels.
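
Whether the running kernel was built with this option can usually be read from its config file. A hedged Python sketch of that check follows; the function name `kernel_option` is illustrative, and the file location (commonly `/boot/config-<release>` on Ubuntu, or `/proc/config.gz` on some kernels) is an assumption rather than something stated in the thread.

```python
def kernel_option(config_text, option):
    """Return the value of CONFIG_<option> ('y', 'm', ...) or None if it is unset."""
    key = f"CONFIG_{option}"
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(key + "="):
            return line.split("=", 1)[1]
        if line == f"# {key} is not set":
            return None
    return None


# Illustrative config fragment, not taken from the reporter's machine.
sample = "CONFIG_TRANSPARENT_HUGEPAGE=y\n# CONFIG_READ_ONLY_THP_FOR_FS is not set\n"
print(kernel_option(sample, "TRANSPARENT_HUGEPAGE"))  # y
print(kernel_option(sample, "READ_ONLY_THP_FOR_FS"))  # None
```

On a real system, read the config file into a string first and pass it as `config_text`.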

PeterYang12 commented 1 year ago

Got it. So the hugify option only takes effect if the kernel is built with the READ_ONLY_THP_FOR_FS config option; otherwise MADV_HUGEPAGE is ignored. Am I right? In conclusion, the hugify option isn't ready yet, especially on 5.10+ kernels? Do you have any plans for this option? I think it's quite useful and I really want to apply it to my work.

PeterYang12 commented 1 year ago

By the way, from source code, I saw this:

  // Mark the hot code page to be huge page.
  if (__madvise(From, Size, 14 /* MADV_HUGEPAGE */) == -1) {
    char Msg[] = "[hugify] setting MADV_HUGEPAGE is failed\n";
    reportError(Msg, sizeof(Msg));
  }

Is there any way for me to get this error reported? I am not very familiar with the BOLT source code. I would really appreciate it if you could help me.

treapster commented 1 year ago

Is there any way for me to get this error reported? I am not very familiar with the BOLT source code. I would really appreciate it if you could help me.

The problem is that it's silently ignored without errors.

Also, you can change __bolt_hugify_self_impl to call hugifyForOldKernel() unconditionally and rebuild BOLT, and it'll likely make HugePages work.
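
The point that the hint can be accepted without taking effect can be tried from user space. Below is a minimal Linux-only Python sketch that issues the same MADV_HUGEPAGE advice on an anonymous mapping. It does not reproduce what the BOLT runtime does with the hot text; the function name `advise_huge` and the mapping size are illustrative assumptions.

```python
import mmap

HUGE_PAGE = 2 * 1024 * 1024
# 14 is the value used in the snippet quoted earlier in the thread; Python
# exposes it as mmap.MADV_HUGEPAGE on Linux builds that define it.
MADV_HUGEPAGE = getattr(mmap, "MADV_HUGEPAGE", 14)


def advise_huge(length=2 * HUGE_PAGE):
    """Request THP for an anonymous mapping.

    True only means the kernel accepted the hint, not that huge pages were used.
    """
    region = mmap.mmap(
        -1,
        length,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    try:
        region.madvise(MADV_HUGEPAGE)
        return True
    except (OSError, AttributeError):
        # OSError: the kernel rejected the advice (for example, no THP support).
        # AttributeError: this Python build has no mmap.madvise.
        return False
    finally:
        region.close()


print(advise_huge())
```

A `True` result says nothing about whether the pages actually became huge; that still has to be confirmed separately, for example through AnonHugePages in smaps.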

yavtuk commented 1 year ago

Hello @PeterYang12, we see two ways to solve the problem:

  1. Reconfigure the kernel with the READ_ONLY_THP_FOR_FS option; it should then work without any changes inside BOLT.
  2. Use the approach we implemented for the 4.18 kernel, which requires calling hugify_for_old_kernel. This lets you check BOLT without reconfiguring the kernel. https://github.com/facebookincubator/BOLT/blob/main/bolt/runtime/hugify.cpp#L106

  // if (!has_pagecache_thp_support()) {
  hugify_for_old_kernel(from, to);
  return;
  // }

PeterYang12 commented 1 year ago

Hello @PeterYang12, we see two ways to solve the problem:

  1. Reconfigure the kernel with the READ_ONLY_THP_FOR_FS option; it should then work without any changes inside BOLT.
  2. Use the approach we implemented for the 4.18 kernel, which requires calling hugify_for_old_kernel. This lets you check BOLT without reconfiguring the kernel. https://github.com/facebookincubator/BOLT/blob/main/bolt/runtime/hugify.cpp#L106

  // if (!has_pagecache_thp_support()) {
  hugify_for_old_kernel(from, to);
  return;
  // }

Thank you both so much for the solutions. I'll try the second one.

maksfb commented 1 year ago

Yes, at the moment an unconditional call to hugify_for_old_kernel() is the way to go. We should make it the default for BOLT and have another option for THP. Unfortunately, I don't see any other way around it. Please submit a patch if you can.

yavtuk commented 1 year ago

@maksfb let's wait for Peter to check. If it works on his side, I'll make a patch.

PeterYang12 commented 1 year ago

Sorry for the late update. It works for me. Thank you, guys. Unfortunately, only part of my code can be mapped into huge pages because of the 2M address boundary.

7ffff53ed000-7ffff5400000 r-xp 00400000 00:36 19810928                   /opt/pkb/git/hhvm-perf/opcache-bolt.so
7ffff5400000-7ffff5600000 r-xp 00000000 00:00 0
7ffff5600000-7ffff5800000 r-xp 00613000 00:36 19810928                   /opt/pkb/git/hhvm-perf/opcache-bolt.so
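
The middle line is the interesting one: an executable mapping with no backing file that starts on a 2MB boundary and is exactly 2MB long. A small Python sketch that picks such regions out of `/proc/<pid>/maps` text, using the three lines above as input (the function name `anonymous_huge_candidates` is illustrative):

```python
HUGE_PAGE = 2 * 1024 * 1024  # 2MB huge page size on x86_64


def anonymous_huge_candidates(maps_text):
    """Return (start, end) of executable anonymous regions aligned to 2MB."""
    found = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        executable = "x" in fields[1]
        anonymous = len(fields) == 5  # no pathname column
        aligned = start % HUGE_PAGE == 0 and (end - start) % HUGE_PAGE == 0
        if executable and anonymous and aligned:
            found.append((start, end))
    return found


MAPS = """\
7ffff53ed000-7ffff5400000   r-xp 00400000 00:36 19810928                     /opt/pkb/git/hhvm-perf/opcache-bolt.so
7ffff5400000-7ffff5600000   r-xp 00000000 00:00 0
7ffff5600000-7ffff5800000 r-xp 00613000   00:36 19810928                     /opt/pkb/git/hhvm-perf/opcache-bolt.so
"""

for start, end in anonymous_huge_candidates(MAPS):
    print(hex(start), hex(end), (end - start) // HUGE_PAGE)
# 0x7ffff5400000 0x7ffff5600000 1
```

Alignment only makes a region a candidate for a huge page; whether it is actually backed by one still has to be confirmed, for example through AnonHugePages in smaps.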

Another question: will the patch be sent to llvm-bolt or to this repo?

yavtuk commented 1 year ago

I will prepare the patch

PeterYang12 commented 1 year ago

I will prepare the patch

Will the patch be sent to llvm-bolt? Is this repo still under maintenance?

yavtuk commented 1 year ago

@PeterYang12 MR is here https://reviews.llvm.org/D141659

"Is this repo still under maintenance?" No

PeterYang12 commented 1 year ago

Another question here. According to the llvm-bolt description, hugify will put hot code on 2MB page(s) at runtime.

$ readelf -lW php-fpm-bolt

Elf file type is DYN (Position-Independent Executable file)
Entry point 0x1cf7fc0
There are 12 program headers, starting at offset 25165824

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x1800000 0x0000000001800000 0x0000000001800000 0x0002a0 0x0002a0 R   0x8
  INTERP         0x0002a8 0x00000000000002a8 0x00000000000002a8 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x0f8e70 0x0f8e70 R   0x200000
  LOAD           0x200000 0x0000000000200000 0x0000000000200000 0x484a5d 0x484a5d R E 0x200000
  LOAD           0x800000 0x0000000000800000 0x0000000000800000 0x83a420 0x83a420 R   0x200000
  LOAD           0x115fe08 0x000000000135fe08 0x000000000135fe08 0x0a3ad0 0x0c3f08 RW  0x200000
  LOAD           0x1800000 0x0000000001800000 0x0000000001800000 0x50e400 0x50e400 R E 0x200000
  DYNAMIC        0x11fe800 0x00000000013fe800 0x00000000013fe800 0x000260 0x000260 RW  0x8
  NOTE           0x0002c4 0x00000000000002c4 0x00000000000002c4 0x000044 0x000044 R   0x4
  GNU_EH_FRAME   0x1cf80bc 0x0000000001cf80bc 0x0000000001cf80bc 0x016344 0x016344 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x115fe08 0x000000000135fe08 0x000000000135fe08 0x0a01f8 0x0a01f8 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .plt.got .bolt.org.text .fini
   04     .bolt.org.rodata .bolt.org.eh_frame_hdr .bolt.org.eh_frame
   05     .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
   06     .text .text.cold .text.injected .eh_frame .rodata .rodata.cold .text.bolt.extra.1 .eh_frame.bolt.extra.1 .eh_frame_hdr
   07     .dynamic
   08     .note.gnu.build-id .note.ABI-tag
   09     .eh_frame_hdr
   10
   11     .init_array .fini_array .data.rel.ro .dynamic .got

So only the .text section will be mapped into large pages; other sections such as .text.cold and .text.injected won't be, even though they are in the same segment as .text, right?

yota9 commented 1 year ago

So only the .text section will be mapped into large pages

Yes, that is the idea
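
To get a feel for how much address space this covers, the BOLT log earlier in the thread reports `__hot_start` at 0x600000 and `__hot_end` at 0x60022b for the test binary. The Python sketch below rounds that range outward to 2MB boundaries. Both the idea that these two symbols delimit the hugified range and the outward rounding are assumptions about the runtime, not something stated in the thread, and the function name `huge_page_span` is illustrative.

```python
HUGE_PAGE = 2 * 1024 * 1024


def huge_page_span(hot_start, hot_end, page=HUGE_PAGE):
    """Round [hot_start, hot_end) outward to huge-page boundaries."""
    start = hot_start & ~(page - 1)           # round down
    end = (hot_end + page - 1) & ~(page - 1)  # round up
    return start, end


# Values taken from the BOLT log quoted earlier in the thread.
start, end = huge_page_span(0x600000, 0x60022B)
print(hex(start), hex(end), (end - start) // HUGE_PAGE)  # 0x600000 0x800000 1
```

Under these assumptions, anything placed after the rounded-up end of the hot range, such as cold code in the same segment, falls outside the huge-page region.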

PeterYang12 commented 1 year ago

Thank you~

PeterYang12 commented 1 year ago

Closing this issue since my question is answered. Thank you so much!