Closed CcWeapon closed 2 years ago
Hello @CcWeapon ! This is good question, I think it is there for historical reasons. I was thinking about removing this, but the case is quite rare and I didn't found it in my code base yet. To remove this I think we will need to write a few tests just to be sure everything works fine. Right now I don't see reasons why we won't handle such cases properly.. Maybe @maksfb or @rafaelauler might tell something more about it..
If you reaching this line, it means your input binary has a weird function with a call to an address that is inside the same function of the call. It is not a recursive call because the call target is not the function address itself, but rather a block inside it. It is doing something like this:
func: ... instructions basicblock-a: .... instructions basic-block-b: call basicblock-a
We expect control transfer inside the same function to happen through jump/branch instructions. When we detect a call to a block inside the same function, we conservatively assume we do not understand the CFG for this function and mark it as non simple, which means we will not optimize this function or change anything there.
In X86, there are some reasons you might have a call like that. For example, to get current PC address via return address:
func: call label label: pop %rax ; RAX now has label address
Example of code exercising this case:
// This is reduced test case for BOLT containing an internal call based on // GetCoreDump (from google core dumper).
.text .globl getCallback .type getCallback, %function getCallback: .cfi_startproc pushq %rbp movq %rsp, %rbp pushq %r12 pushq %rbx subq $288, %rsp callq .Lnext_instr .Lnext_instr: popq %rax addq $17, %rax addq $288, %rsp popq %rbx popq %r12 popq %rbp retq .Lweird_callback: mov $0xDEADBEEF, %rax retq .cfi_endproc
For this code, we probably need to mark the function as ignored, and not as non-simple. Ignore means we will not move it.
I agree that the case is weird, but just wondering, but are there any real blocking things that prevents us from optimizing such things? Yes, there would be fall-through BB and we will create extra entry point, but actually it seems that everything should work fine. Or it us just an assumption that something weird asm-written might going on here and it is better not to touch such things?
You can find art_quick_osr_stub() in AOSP, it calls .Losr_entry in itself. If I don't set this function to simple, what error might happen?
ENTRY art_quick_osr_stub
SPILL_ALL_CALLEE_SAVE_GPRS @ Spill regs (9)
vpush {s16-s31} @ Spill fp-regs (16)
.cfi_adjust_cfa_offset 64
SAVE_SIZE=(9*4+16*4)
mov r11, sp @ Save the stack pointer
.cfi_def_cfa r11, SAVE_SIZE @ CFA = r11 + SAVE_SIZE
.cfi_remember_state
mov r10, r1 @ Save size of stack
ldr r9, [r11, #(SAVE_SIZE+4)] @ Move managed thread pointer into r9
REFRESH_MARKING_REGISTER
mov r6, r2 @ Save the pc to call
sub r7, sp, #12 @ Reserve space for stack pointer,
@ JValue* result, and ArtMethod* slot.
and r7, #0xFFFFFFF0 @ Align stack pointer
mov sp, r7 @ Update stack pointer
str r11, [sp, #4] @ Save old stack pointer
str r3, [sp, #8] @ Save JValue* result
mov ip, #0
str ip, [sp] @ Store null for ArtMethod* at bottom of frame
// r11 isn't properly spilled in the osr method, so we need use DWARF expression.
// NB: the CFI must be before the call since this is the address gdb will lookup.
// NB: gdb expects that cfa_expression returns the CFA value (not address to it).
.cfi_escape /* CFA = [sp + 4] + SAVE_SIZE */ \
0x0f, 6, /* DW_CFA_def_cfa_expression(len) */ \
0x92, 13, 4, /* DW_OP_bregx(reg,offset) */ \
0x06, /* DW_OP_deref */ \
0x23, SAVE_SIZE /* DW_OP_plus_uconst(val) */
bl .Losr_entry @ Call the method
ldr r10, [sp, #8] @ Restore JValue* result
ldr sp, [sp, #4] @ Restore saved stack pointer
.cfi_def_cfa sp, SAVE_SIZE @ CFA = sp + SAVE_SIZE
ldr r4, [sp, #SAVE_SIZE] @ load shorty
ldrb r4, [r4, #0] @ load return type
cmp r4, #68 @ Test if result type char == 'D'.
beq .Losr_fp_result
cmp r4, #70 @ Test if result type char == 'F'.
beq .Losr_fp_result
strd r0, [r10] @ Store r0/r1 into result pointer
b .Losr_exit
.Losr_fp_result:
vstr d0, [r10] @ Store s0-s1/d0 into result pointer
.Losr_exit:
vpop {s16-s31}
.cfi_adjust_cfa_offset -64
pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
.Losr_entry:
.cfi_restore_state
.cfi_def_cfa r11, SAVE_SIZE @ CFA = r11 + SAVE_SIZE
sub sp, sp, r10 @ Reserve space for callee stack
sub r10, r10, #4
str lr, [sp, r10] @ Store link register per the compiler ABI
mov r2, r10
mov r1, r0
mov r0, sp
bl memcpy @ memcpy (dest r0, src r1, bytes r2)
bx r6
END art_quick_osr_stub
My worry with the function I pasted is that we have an unidentified code reference that we will not update / will break if we move these functions:
callq .Lnext_instr
.Lnext_instr:
popq %rax
addq $17, %rax ; <-- 17 bytes of distance after .Lnext_instr, pointing to .Lweird_callback
addq $288, %rsp
popq %rbx
popq %r12
popq %rbp
retq
.Lweird_callback: ; <-- we can't move this block
mov $0xDEADBEEF, %rax
retq
.cfi_endproc
In the example above, the immediate $17 will not be updated by BOLT, and moving this code will break it. That's why we conservatively put this function into a block list.
thx~
in BinaryFunction.cpp::BinaryFunction::disassemble()
my device is aarch64, and bolt is set to false for the IsSimple of this func. What's the reason for bolt doing this?