pgoodman closed this issue 9 years ago.
For those unaware: currently, the indirect edge template code is emitted immediately after the basic block. Here's the gist:
block
  |
in_edge --------.--> go_to_granary
  |             |          |
compare_target -'  <-------'
  |
exit_to_block
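To make the control flow in the diagram concrete, here is a small C++ model of it. It is a sketch under assumptions, not Granary's code: the names CompareTarget, Dispatch, and Translator are hypothetical, and it assumes that each newly instantiated compare_target is linked in front of the previous ones, with its mismatch path falling back to the older entries and finally to go_to_granary. A null head plays the role of the initial `jmp [rdi]` landing on the next instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical model of one instantiated compare_target template.
struct CompareTarget {
  uintptr_t native_target;           // application address this entry matches
  uintptr_t translated_block;        // where exit_to_block would go
  const CompareTarget *on_mismatch;  // older entry; nullptr means go_to_granary
};

// Hypothetical model of an IndirectEdge: `head` stands in for the slot
// that `jmp QWORD PTR [rdi]` reads.
struct IndirectEdge {
  const CompareTarget *head = nullptr;  // nullptr == fall into go_to_granary
  std::vector<std::unique_ptr<CompareTarget>> storage;
  int granary_entries = 0;              // how often we had to enter Granary
};

// Assumed callback that translates a native target into the code cache.
using Translator = std::function<uintptr_t(uintptr_t)>;

uintptr_t Dispatch(IndirectEdge &edge, uintptr_t target,
                   const Translator &translate) {
  // Walk the chain of instantiated templates (compare_target).
  for (const CompareTarget *t = edge.head; t != nullptr; t = t->on_mismatch) {
    if (t->native_target == target) return t->translated_block;
  }
  // Miss: go_to_granary translates the target, instantiates a new
  // template for it, and links it in front of the existing chain.
  ++edge.granary_entries;
  auto entry = std::make_unique<CompareTarget>(
      CompareTarget{target, translate(target), edge.head});
  edge.head = entry.get();
  edge.storage.push_back(std::move(entry));
  return edge.head->translated_block;
}
```

With this model, the first dispatch to a given target enters Granary once; later dispatches to the same target stay on the compare_target path, and every distinct target seen grows the chain by one template.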
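In the disassembly snippet below, I've labelled the various parts. The first six instructions are the native code (the tail of call_init in dl-init.c); everything from block: onward is its translation in the code cache. It should be clear that this is quite heavyweight, and that it leaves "garbage" in the code cache.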
0x7f256d87f143 <call_init+163 at dl-init.c:80>: pop rbx
0x7f256d87f144 <call_init+164 at dl-init.c:80>: pop rbp
0x7f256d87f145 <call_init+165 at dl-init.c:80>: pop r12
0x7f256d87f147 <call_init+167 at dl-init.c:80>: pop r13
0x7f256d87f149 <call_init+169 at dl-init.c:80>: pop r14
0x7f256d87f14b <call_init+171 at dl-init.c:80>: ret
block:
0x41d3106d: lea rsp,[rsp-0x88]
0x41d31075: mov rbx,QWORD PTR [rsp+0x88]
0x41d3107d: mov rbp,QWORD PTR [rsp+0x90]
0x41d31085: mov r12,QWORD PTR [rsp+0x98]
0x41d3108d: mov r13,QWORD PTR [rsp+0xa0]
0x41d31095: mov r14,QWORD PTR [rsp+0xa8]
in_edge:
0x41d3109d: mov QWORD PTR [rsp],r15
0x41d310a1: mov r15,QWORD PTR [rsp+0xb0]
0x41d310a9: push rcx
0x41d310aa: push rdi
0x41d310ab: movabs rdi,0x40c49008 <-- IndirectEdge *
0x41d310b5: jmp QWORD PTR [rdi] <-- initially goes to next instruction.
go_to_granary:
0x41d310b7: mov rcx,r15
0x41d310ba: jmp 0x44531030
--- start of template ---
0x41d310bf: ud2
compare_target:
<not emitted: move negated target address into rcx>
0x41d310c1: lea rcx,[rcx+r15*1]
0x41d310c5: jrcxz 0x41d310cc
0x41d310c7: jmp 0x41d310b7
0x41d310cc: pop rdi
0x41d310cd: pop rcx
0x41d310ce: mov r15,QWORD PTR [rsp]
0x41d310d2: lea rsp,[rsp+0xb8]
<at instantiation: fall-through exit_to_block>
0x41d310da: ud2
--- end of template ---
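Two details of the listing can be checked with a little arithmetic. First, compare_target never uses cmp: rcx is preloaded with the negated expected target, so after lea rcx,[rcx+r15*1] it holds actual minus expected (mod 2^64), and jrcxz is taken exactly when that is zero. LEA and JRCXZ neither write nor read EFLAGS, which is presumably why this form is used: the application's flags survive the check. Second, the stack offsets line up: the native stack pointer sits at rsp+0x88, the five emulated pops read +0x88 through +0xa8, the return address is at +0xb0, and the final lea adds 0xb8. The sketch below encodes both; the function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models `lea rcx,[rcx+r15*1]; jrcxz match`, where rcx was preloaded with
// the two's-complement negation of the expected target.
constexpr bool TargetMatches(uint64_t expected_target, uint64_t actual_target) {
  uint64_t rcx = uint64_t{0} - expected_target;  // negated target in rcx
  rcx = rcx + actual_target;                     // lea; wraps mod 2^64
  return rcx == 0;                               // jrcxz taken iff rcx == 0
}

// Stack bookkeeping from the listing (offsets in bytes).
constexpr uint64_t kEntryAdjust = 0x88;  // lea rsp,[rsp-0x88] at block entry
constexpr uint64_t kPoppedRegs = 5;      // rbx, rbp, r12, r13, r14
constexpr uint64_t kReturnAddrOffset = kEntryAdjust + kPoppedRegs * 8;
constexpr uint64_t kFinalAdjust = kReturnAddrOffset + 8;  // also drop the ret slot

static_assert(TargetMatches(0x401000, 0x401000), "equal targets match");
static_assert(!TargetMatches(0x401000, 0x402000), "different targets miss");
static_assert(kReturnAddrOffset == 0xb0, "matches mov r15,[rsp+0xb0]");
static_assert(kFinalAdjust == 0xb8, "matches lea rsp,[rsp+0xb8]");
```

The push rcx / push rdi pair in in_edge is balanced by the pop rdi / pop rcx before the final lea, and r15 is restored from [rsp], so only the 0xb8 adjustment remains to reach the state the native ret would have left.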
Eventually, the "edge cache" shouldn't exist. There should simply be a hierarchy of increasingly "cold" caches.
I think that when direct edges are eventually patched, we should also patch indirect edges, potentially switching them over to hash tables or some other lookup structure.
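One way to read "patch indirect edges to change to hash tables" is an edge that starts as a short compare chain and is promoted to a hash table once it has seen too many targets. The sketch below illustrates that idea only; the class AdaptiveIndirectEdge and the threshold kMaxInlineTargets are hypothetical, and the real patching would rewrite code in the cache rather than flip a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical threshold: beyond this many targets, a linear compare chain
// is assumed to cost more than a hash lookup.
constexpr std::size_t kMaxInlineTargets = 4;

class AdaptiveIndirectEdge {
 public:
  // Returns the translated address for `target`, or std::nullopt on a miss
  // (the caller would then enter Granary to translate it).
  std::optional<uintptr_t> Lookup(uintptr_t target) const {
    if (is_hashed_) {
      auto it = table_.find(target);
      if (it == table_.end()) return std::nullopt;
      return it->second;
    }
    for (const auto &[native, translated] : chain_) {
      if (native == target) return translated;
    }
    return std::nullopt;
  }

  // Records a newly translated target; expected to be called after a miss.
  void Add(uintptr_t target, uintptr_t translated) {
    assert(!Lookup(target).has_value());
    if (is_hashed_) {
      table_.emplace(target, translated);
      return;
    }
    chain_.emplace_back(target, translated);
    if (chain_.size() > kMaxInlineTargets) {
      // "Patch" the edge: migrate every chained target into a hash table.
      table_.insert(chain_.begin(), chain_.end());
      chain_.clear();
      is_hashed_ = true;
    }
  }

  bool is_hashed() const { return is_hashed_; }

 private:
  bool is_hashed_ = false;
  std::vector<std::pair<uintptr_t, uintptr_t>> chain_;
  std::unordered_map<uintptr_t, uintptr_t> table_;
};
```

Keeping the chain for the common case of one or two targets preserves the cheap inline compare, while megamorphic edges stop paying a cost that grows linearly with the number of targets.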
Another thing, related to other code regions: I think "slower" (colder) regions should be placed at higher addresses than hotter regions. That way, jumps to colder code are always forward jumps, and a forward conditional branch with no prediction history is statically predicted not taken, so the predictor follows the hot fall-through path first. I think this is right, but double-check the software optimization manual on this.
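The layout idea can be sketched as one region per temperature, with colder regions at higher addresses, plus the static rule the paragraph relies on. The base address, region size, and names below are hypothetical. The prediction rule is the classic one from Intel's optimization manual (backward conditional branches predicted taken, forward ones not taken); newer microarchitectures may not apply it, so treat it as an assumption to verify, as the comment above suggests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: one fixed-size region per temperature, with colder
// regions placed at higher addresses.
enum class Temperature : uint64_t { kHot = 0, kWarm = 1, kCold = 2 };

constexpr uint64_t kCacheBase = 0x41000000;   // assumed base address
constexpr uint64_t kRegionSize = 0x01000000;  // assumed 16 MiB per region

constexpr uint64_t RegionBase(Temperature t) {
  return kCacheBase + static_cast<uint64_t>(t) * kRegionSize;
}

// Classic static rule for a conditional branch without dynamic history:
// a target at or before the branch (backward) is predicted taken, a
// forward target is predicted not taken.
constexpr bool StaticallyPredictedTaken(uint64_t branch_addr,
                                        uint64_t target_addr) {
  return target_addr <= branch_addr;
}
```

Under this layout, any branch from the hot region into the cold region is a forward branch, so on a cold predictor it defaults to the fall-through (hot) path, while the return jump from cold code back into hot code is backward.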
The MangleIndirectCall function in 1_mangle.cc, and the corresponding need for a return address label, should be eliminated with this.
Will be done in the next commit to the master branch.
This would be nice, and could be the first step towards being able to split code, at the granularity of fragments, across different regions of the code cache.