Incorrect far jump on Linux x86_64.

kubo commented 5 years ago

The following code breaks the calling convention on Linux x86_64 when the callee is a variadic function. https://github.com/coolxv/cpp-stub/blob/540c48cdadc756637ccdd35b41ef1665c530da97/src/stub.h#L145-L150 It overwrites the RAX register. However, the low 8 bits of that register (AL) carry the number of vector registers used for floating-point arguments passed to a variadic function. (See here)
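To make the problem concrete, here is a minimal sketch of a variadic callee that takes floating-point arguments. The function name `sum_doubles` is hypothetical and not part of cpp-stub. On Linux x86_64 the caller sets AL to an upper bound on the number of vector registers used before calling such a function, and the callee's prologue may consult AL to decide whether to save the XMM argument registers. A stub that loads a jump target into RAX on the way in would replace that value.

```cpp
#include <cstdarg>

// Hypothetical example function (an assumption for illustration, not cpp-stub code).
// For a call such as sum_doubles(3, 1.0, 2.0, 3.5) the caller passes the doubles
// in xmm0..xmm2 and sets AL to the number of vector registers used, per the
// System V AMD64 ABI. If a patched entry point clobbers RAX first, the callee
// sees a wrong count.
double sum_doubles(int count, ...) {
    va_list args;
    va_start(args, count);
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, double);  // reads the next double argument
    }
    va_end(args);
    return total;
}
```

Called directly, `sum_doubles(3, 1.0, 2.0, 3.5)` returns 6.5. Whether a clobbered AL actually produces wrong results depends on the compiler's prologue and on the low byte of the jump target, so the failure may be intermittent.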

coolxv commented 5 years ago

System V Application Binary Interface, AMD64 Architecture Processor Supplement, Draft Version 0.99.4, section 3.2.3 "Parameter Passing".

How can it be fixed?

kubo commented 5 years ago

1. Use the r11 register. The code size is 13 bytes.
   0x49, 0xbb, address(8 bytes), 0x41, 0x53, 0xc3
   I guess that this works, but I have not confirmed it.
2. Use no registers. The code size is 14 bytes.
   0xff, 0x25, 0x00, 0x00, 0x00, 0x00, address(8 bytes)
   Other tools use this.
3. Use no registers. The code size is 6 bytes if the address is lower than 0x80000000.
   0x68, address(lower 4 bytes), 0xc3
   I guess that this works, but I have not confirmed it.
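The three byte sequences above can be sketched as small C++ encoders. This is only an illustration under stated assumptions: the helper names (`put_le64`, `make_push_r11_ret`, `make_jmp_rip_indirect`, `make_push_imm32_ret`) are hypothetical and do not exist in cpp-stub, and the code only builds the bytes without writing them into a function.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: store a 64-bit value in little-endian byte order.
inline void put_le64(std::uint8_t* dst, std::uint64_t value) {
    for (int i = 0; i < 8; ++i) {
        dst[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Option 1: movabs $target,%r11 ; push %r11 ; ret  (13 bytes)
inline std::array<std::uint8_t, 13> make_push_r11_ret(std::uint64_t target) {
    std::array<std::uint8_t, 13> code{};
    code[0] = 0x49;  // REX.W + REX.B prefix
    code[1] = 0xbb;  // mov imm64 into r11
    put_le64(&code[2], target);
    code[10] = 0x41; // REX.B prefix
    code[11] = 0x53; // push r11
    code[12] = 0xc3; // ret
    return code;
}

// Option 2: jmp *0(%rip) followed by the 8-byte target  (14 bytes)
inline std::array<std::uint8_t, 14> make_jmp_rip_indirect(std::uint64_t target) {
    std::array<std::uint8_t, 14> code{};  // disp32 at code[2..5] stays zero
    code[0] = 0xff;
    code[1] = 0x25;
    put_le64(&code[6], target);
    return code;
}

// Option 3: push $imm32 ; ret  (6 bytes). The immediate is sign-extended to
// 64 bits, so this only works for targets below 0x80000000.
// Returns false and leaves `code` untouched when the target is out of range.
inline bool make_push_imm32_ret(std::uint64_t target,
                                std::array<std::uint8_t, 6>& code) {
    if (target >= 0x80000000ULL) {
        return false;
    }
    code[0] = 0x68;  // push imm32
    for (int i = 0; i < 4; ++i) {
        code[1 + i] = static_cast<std::uint8_t>(target >> (8 * i));
    }
    code[5] = 0xc3;  // ret
    return true;
}
```

For the target 0x0102030405060708, the first two encoders produce the same bytes as the objdump output for push_r11_and_ret and jmp_absolute_address in this thread.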

I got the machine code as follows.

Write the assembly code:

    .text

// Use the rax register. The code size is 12.
push_rax_and_ret:
    movq $0x0102030405060708,%rax
    push %rax
    ret

// Use the r11 register. The code size is 13.
push_r11_and_ret:
    movq $0x0102030405060708,%r11
    push %r11
    ret

// Use no registers. The code size is 14.
jmp_absolute_address:
    jmp *1f(%rip) // jump to the address stored immediately after this instruction
1:
    .byte 0x08,0x07,0x06,0x05,0x04,0x03,0x02,0x01

// Use no registers. The code size is 6 if the address is lower than 0x80000000.
push_immediate_addr_and_ret:
    push $0x01020304
    ret

Assemble it and print the generated code on Linux x86_64. Note: the bytes from .byte 0x08,0x07,0x06,0x05,0x04,0x03,0x02,0x01 are disassembled as instructions below, but they should not be, because they are data, not code.

$ gcc -c test.s && objdump -d test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <push_rax_and_ret>:
   0:   48 b8 08 07 06 05 04    movabs $0x102030405060708,%rax
   7:   03 02 01 
   a:   50                      push   %rax
   b:   c3                      retq   

000000000000000c <push_r11_and_ret>:
   c:   49 bb 08 07 06 05 04    movabs $0x102030405060708,%r11
  13:   03 02 01 
  16:   41 53                   push   %r11
  18:   c3                      retq   

0000000000000019 <jmp_absolute_address>:
  19:   ff 25 00 00 00 00       jmpq   *0x0(%rip)        # 1f <jmp_absolute_address+0x6>
  1f:   08 07                   or     %al,(%rdi)
  21:   06                      (bad)  
  22:   05 04 03 02 01          add    $0x1020304,%eax

0000000000000027 <push_immediate_addr_and_ret>:
  27:   68 04 03 02 01          pushq  $0x1020304
  2c:   c3                      retq

Another idea is to combine a 32-bit relative jump (5 bytes) with a far jump (14 bytes).
This overwrites only the first 5 bytes of the hot-patched function even when the destination address is far away.

When jumping from 0x40000340 to 0x0102030405060708:

1. Pick an unused address near 0x40000340 by using /proc/self/maps on Linux or VirtualQuery() on Windows.
2. Allocate memory at the unused address by using mmap() on Linux or VirtualAlloc() on Windows.
3. Write a 32-bit relative jump instruction from 0x40000340 to the allocated memory.
4. Write a far jump instruction from the allocated memory to 0x0102030405060708.
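The relative-jump step can be sketched as follows. This is a hedged illustration, not code from cpp-stub: `make_rel32_jmp` is a hypothetical helper that encodes `jmp rel32` (opcode 0xe9) and reports whether the destination is within the signed 32-bit range, which is the condition the allocated memory has to satisfy. Finding and allocating that memory is not shown.

```cpp
#include <array>
#include <cstdint>
#include <limits>

// Hypothetical helper: encode a 5-byte `jmp rel32` placed at address `from`
// that jumps to address `to`. The displacement is measured from the end of
// the instruction (from + 5). Returns false and leaves `code` untouched if
// the displacement does not fit in a signed 32-bit value.
inline bool make_rel32_jmp(std::uint64_t from, std::uint64_t to,
                           std::array<std::uint8_t, 5>& code) {
    const std::int64_t disp = static_cast<std::int64_t>(to - (from + 5));
    if (disp < std::numeric_limits<std::int32_t>::min() ||
        disp > std::numeric_limits<std::int32_t>::max()) {
        return false;  // too far: a trampoline near `from` is needed
    }
    const std::uint32_t bits =
        static_cast<std::uint32_t>(static_cast<std::int32_t>(disp));
    code[0] = 0xe9;  // jmp rel32
    for (int i = 0; i < 4; ++i) {
        code[1 + i] = static_cast<std::uint8_t>(bits >> (8 * i));
    }
    return true;
}
```

With the addresses from the example, a direct jump from 0x40000340 to 0x0102030405060708 is rejected, while a jump from 0x40000340 to a trampoline at, say, 0x40010000 (an assumed address) encodes as e9 bb fc 00 00. The trampoline would then hold the 14-byte far jump to the real destination.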

coolxv commented 5 years ago

Fixed it:

// Use the r11 register. The code size is 13.
push_r11_and_ret:
    movq $0x0102030405060708,%r11
    push %r11
    ret

Incorrect far jump on Linux x86_64. #7