CcWeapon closed this issue 2 years ago
In general, BOLT doesn't try hard to save disk space (how much space a binary occupies on disk). BOLT targets performance by clustering hot code together, so that the dynamic profile of the application uses memory addresses that are closer together. It's all about how the processor sees the code and less about how it looks on disk. From the processor's perspective, the code looks denser/smaller, and all hardware structures that use addresses to index information are better utilized: the process uses fewer pages and fewer cache lines, and it branches less, using fewer branch target entries, and so on.
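To make the "fewer pages" point concrete, here is a minimal sketch in Python. It is only a toy model, not anything BOLT itself runs: the function sizes, the addresses, and the `pages_touched` helper are all made up for illustration, and 4KB pages are assumed.

```python
PAGE_SIZE = 4096  # bytes; assumes 4KB pages


def pages_touched(functions):
    """Return the set of page indices covered by (address, size) ranges."""
    pages = set()
    for addr, size in functions:
        first = addr // PAGE_SIZE
        last = (addr + size - 1) // PAGE_SIZE
        pages.update(range(first, last + 1))
    return pages


# Hypothetical layout: 8 hot functions of 512 bytes each, 64KB apart,
# with cold code in between (as a linker might lay them out).
scattered = [(i * 64 * 1024, 512) for i in range(8)]

# The same 8 functions packed back to back, as a hot-code cluster would be.
clustered = [(i * 512, 512) for i in range(8)]

scattered_pages = len(pages_touched(scattered))
clustered_pages = len(pages_touched(clustered))
print(scattered_pages, clustered_pages)
```

The amount of hot code is identical in both layouts (4KB in total); only the number of pages the processor has to keep mapped changes. The same reasoning applies to cache lines and other address-indexed structures.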
But if for some reason it is important for you to reuse the old .text section, you can try the -use-old-text flag. It's not always possible, though: if the new .text is larger than the old one, we can't reuse the space allocated for the old .text section.
If you are using the -lite flag (lite mode), BOLT leaves all cold functions in the old .text section, and the new .text is quite small, containing only hot code. That's why you end up with two .text sections.
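The size condition above can be sketched as a tiny Python model. This is a simplification based only on what is described in this thread, not BOLT's actual logic: `code_footprint` and its parameters are hypothetical names, and the model ignores alignment, padding, and every section other than .text.

```python
def code_footprint(old_text_size, new_text_size, use_old_text):
    """Toy estimate of the bytes taken by code sections in the output binary.

    Assumption: the old .text always stays in the file. The new code either
    overwrites it in place (only if it fits) or is appended as a new section.
    """
    if use_old_text and new_text_size <= old_text_size:
        # New code fits into the space of the old .text, so nothing is appended.
        return old_text_size
    # Otherwise the old .text is kept and the new .text is added after it.
    return old_text_size + new_text_size


MB = 1024 * 1024
print(code_footprint(10 * MB, 10 * MB, use_old_text=False) // MB)  # appended
print(code_footprint(10 * MB, 10 * MB, use_old_text=True) // MB)   # reused
print(code_footprint(10 * MB, 11 * MB, use_old_text=True) // MB)   # does not fit
```

In this model, a new .text of about the same size as the old one roughly doubles the code footprint unless the old space can be reused, which matches the size increase reported in this issue.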
See also https://github.com/facebookincubator/BOLT/issues/93 and https://github.com/facebookincubator/BOLT/issues/242 for a related discussion that might be helpful for you.
Thanks! I have another question. If -lite is used as you said, will the distance between .text and .cold be too large and make the optimization less effective?
For the purposes of this discussion, let's suppose you're working with 4KB pages and your program is tens of MB in size. The distance from hot .text to cold .text in this case doesn't matter because hot and cold code will likely live in separate pages anyway. You could argue that you may start to lose performance if the program maps in cold pages all the time because of an inaccurate profile (a hot function was put in the cold area). In that case, these cold pages start to compete with hot pages for space in the iTLB. Reducing the distance won't help you here, as a 4KB page won't cover both the hot and cold parts of your program. You would mitigate this problem by fixing the profile and making sure hot functions are in the same region of memory, increasing the likelihood that they share the same page.
Also, BOLT usually doesn't optimize or care much about cold code anyway. It's not worth trying to optimize code that has no profile, as doing so will not have any real impact. That's why -lite mode will not even fully disassemble or build the CFG for cold functions.
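The page argument above can be checked with a small Python sketch. Everything in it is an assumption made for illustration: a 62KB hot region starting at address 0, a 256-byte hot function wrongly placed in the cold area, and a hypothetical `pages_touched` helper. It is not how BOLT measures anything.

```python
PAGE_SIZE = 4096  # bytes; assumes 4KB pages
MB = 1024 * 1024


def pages_touched(ranges):
    """Count distinct pages covered by a list of (address, size) ranges."""
    pages = set()
    for addr, size in ranges:
        pages.update(range(addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE + 1))
    return len(pages)


HOT_SIZE = 62 * 1024   # hypothetical hot region, starting at address 0
FUNC_SIZE = 256        # hypothetical hot function misplaced in the cold area

# Misplaced function 1MB after the hot region vs. 32MB after it.
near = pages_touched([(0, HOT_SIZE), (HOT_SIZE + 1 * MB, FUNC_SIZE)])
far = pages_touched([(0, HOT_SIZE), (HOT_SIZE + 32 * MB, FUNC_SIZE)])

# Profile fixed: the function is laid out inside the hot region instead.
fixed = pages_touched([(0, HOT_SIZE + FUNC_SIZE)])

print(near, far, fixed)
```

Moving the cold area closer does not change the number of pages the hot working set needs; only placing the function in the hot region removes the extra page. With realistic sizes the effect depends on how many hot functions end up misplaced, so treat the numbers as illustrative only.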
Thanks, it's been very helpful!
I found that the binaries doubled in size after BOLT optimization. The reason is that the existing .text is discarded and a new .text is generated at the end.
old :
after bolt :
question:
1. Why is the existing .text not reused? Reusing it would save more memory.
2. As I understand it, this is to simplify the processing of binaries?
cc @rafaelauler @maksfb @aaupov
thx!!