facebookarchive / BOLT

Binary Optimization and Layout Tool - A linux command-line utility used for optimizing performance of binaries

Why does bolt add a new .text at the end of the binary file instead of reusing the existing .text? #290

Closed CcWeapon closed 2 years ago

CcWeapon commented 2 years ago

I found that binaries double in size after BOLT optimization. The reason is that the existing .text is discarded and a new .text is generated at the end.

old : image

after bolt : image

Questions:

1. Why is the existing .text not reused? Reusing it would save more memory.
2. As I understand it, this is to simplify the processing of binaries?

cc @rafaelauler @maksfb @aaupov

thx!!

rafaelauler commented 2 years ago

In general, BOLT doesn't make a hard effort to save disk space (how much space a binary occupies on disk). BOLT targets performance by clustering hot code together, so that the dynamic profile of the application uses memory addresses that are closer together. It's all about how the processor sees the code and less about how it looks on disk. From the processor's perspective, it looks as if the code is denser/smaller, and all hardware structures that use addresses to index information will be better utilized: the process will use fewer pages and fewer cache lines, and it will branch less, using fewer branch target entries, and so on.
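To make the density argument concrete, here is a minimal sketch that counts how many 4KB pages and 64-byte cache lines a set of hot functions touches before and after clustering. The function count, sizes, and addresses are hypothetical, and the 64-byte line size is an assumption about the target CPU; none of this is taken from BOLT itself.

```python
PAGE_SIZE = 4096   # 4KB pages
CACHE_LINE = 64    # assumed cache line size in bytes


def units_touched(functions, unit):
    """Count distinct unit-sized blocks covered by (start, size) byte ranges."""
    touched = set()
    for start, size in functions:
        first = start // unit
        last = (start + size - 1) // unit
        touched.update(range(first, last + 1))
    return len(touched)


# Hypothetical workload: 8 hot functions of 200 bytes each.
HOT_SIZE = 200
NUM_HOT = 8

# Original layout: hot functions sit 64KB apart, surrounded by cold code.
scattered = [(i * 0x10000 + 40, HOT_SIZE) for i in range(NUM_HOT)]

# Clustered layout: the same functions packed back to back.
clustered = [(i * HOT_SIZE, HOT_SIZE) for i in range(NUM_HOT)]

for name, layout in (("scattered", scattered), ("clustered", clustered)):
    print(name,
          units_touched(layout, PAGE_SIZE), "pages,",
          units_touched(layout, CACHE_LINE), "cache lines")
```

With these made-up numbers the scattered layout touches 8 pages and 32 cache lines, while the clustered layout touches 1 page and 25 cache lines, even though the hot code itself has not shrunk.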

But if for some reason it is important for you to re-use the old .text section, you can. You can try it with the -use-old-text flag. It's not always possible, though. If the new .text is larger in size, we can't reuse the space allocated for the old .text section.

If you are using -lite flag (lite mode), BOLT will leave all cold functions in the old .text section and the new .text will be quite small, containing only hot code. That's why you will end up with two .text sections.
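As a rough illustration of how the two flags mentioned above might be passed, the sketch below only builds a command line and does not run anything. The `llvm-bolt` executable name and the `-o` and `-data=` options are assumptions based on common BOLT usage rather than on this thread, and the file names are placeholders.

```python
import shlex


def bolt_command(binary, output, profile, use_old_text=False, lite=False):
    """Build an argv list for a BOLT run; nothing is executed here."""
    # Assumed base invocation: llvm-bolt <binary> -o <output> -data=<profile>
    cmd = ["llvm-bolt", binary, "-o", output, f"-data={profile}"]
    if use_old_text:
        # Ask BOLT to place the new code in the space of the old .text.
        cmd.append("-use-old-text")
    if lite:
        # Lite mode: cold functions stay in the old .text section.
        cmd.append("-lite")
    return cmd


# Placeholder file names, for illustration only.
print(shlex.join(bolt_command("app", "app.bolt", "perf.fdata", use_old_text=True)))
```

As noted above, reusing the old section is only possible when the new .text fits in the space of the old one, so the size of the output is still worth checking afterwards.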

See also https://github.com/facebookincubator/BOLT/issues/93 and https://github.com/facebookincubator/BOLT/issues/242 for a related discussion that might be helpful for you.

CcWeapon commented 2 years ago

thx! I have another question. If -lite is used as you said, will the distance between .text and .cold be too large, so that the optimization effect deteriorates?

rafaelauler commented 2 years ago

For the purposes of this discussion, let's suppose you're working with 4KB pages and your program is tens of MB in size. The distance from hot .text to cold .text in this case doesn't matter because hot and cold will likely live in separate pages anyway. You could argue that you may start to lose performance if the program starts to map in cold pages all the time because of an inaccurate profile (a hot function was put in the cold area). What happens then is that these cold pages start to compete with hot pages for space in the iTLB. Reducing the distance, in this case, won't help you, as a 4KB page won't cover both hot and cold parts of your program. You would mitigate this problem by fixing the profile and making sure hot functions are in the same region of memory, increasing the likelihood that they will share the same page.

Also, BOLT usually doesn't optimize or care much about cold code anyway. It's not worth trying to optimize code that has no profile, as it will not have any real impact. That's why -lite mode will not even fully disassemble/build a CFG for cold functions.
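The page-granularity argument in this reply can be sketched with a small model. The addresses, the 12KB hot region, and the 256-byte misplaced function are all hypothetical, and the model assumes the misplaced function sits well inside the cold region rather than on a page shared with hot code.

```python
PAGE_SIZE = 4096  # 4KB pages, as in the discussion


def pages(ranges):
    """Return the set of page numbers covered by (start, size) byte ranges."""
    covered = set()
    for start, size in ranges:
        first = start // PAGE_SIZE
        last = (start + size - 1) // PAGE_SIZE
        covered.update(range(first, last + 1))
    return covered


HOT_BASE = 0x400000                       # hypothetical load address
HOT_REGION = [(HOT_BASE, 3 * PAGE_SIZE)]  # hypothetical 12KB of hot code


def working_set(cold_base, misplaced_offset=None):
    """Number of code pages executed for a given cold-region base.

    misplaced_offset models a hot function that an inaccurate profile
    left inside the cold region, at that offset from cold_base.
    """
    ranges = list(HOT_REGION)
    if misplaced_offset is not None:
        ranges.append((cold_base + misplaced_offset, 256))
    return len(pages(ranges))


near = HOT_BASE + 3 * PAGE_SIZE     # cold region right after the hot one
far = HOT_BASE + 64 * 1024 * 1024   # cold region 64MB away

print(working_set(near), working_set(far))                  # accurate profile
print(working_set(near, 0x5000), working_set(far, 0x5000))  # misplaced function
```

In this model both placements give 3 pages with an accurate profile and 4 pages with the misplaced function. Moving the cold region closer changes nothing; only moving the function back into the hot region removes the extra page.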

CcWeapon commented 2 years ago

Thanks, it's been very helpful!