angr / archr

Target-centric program analysis.
BSD 2-Clause "Simplified" License

DataScoutAnalyzer: Recent Linux Kernels Do Not Support sendfile from /proc/<pid> Files #96

Closed: 2over12 closed this issue 3 years ago.

2over12 commented 3 years ago
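from archr.targets import LocalTarget
from archr.analyzers import DataScoutAnalyzer

# Build and start a local target for the binary under test
tgt = LocalTarget(["./test_bin"])

with tgt.build().start() as t:
    # Should print the recovered env, argv, auxv, and memory map
    print(DataScoutAnalyzer(t).fire())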

On recent Linux kernels (e.g. 5.11.0-27-generic), this code prints '([], [], b'', {})', failing to recover the env, argv, auxv, and memory map. The failure occurs because sendfile no longer accepts an in_fd that refers to a file under /proc/. The change is also discussed at https://github.com/Gallopsled/pwntools/issues/1871.
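To check the sendfile behavior on a given kernel outside of archr, here is a minimal host-side Python sketch. It assumes the kernel's refusal surfaces as EINVAL on the first sendfile call, and it falls back to a plain read/write loop (the same idea as the fix discussed below). The helper names are hypothetical and not part of archr.

```python
import errno
import os

CHUNK = 0x1000


def copy_with_sendfile(in_fd: int, out_fd: int) -> bool:
    """Copy in_fd to out_fd with sendfile(2).

    Returns False if the kernel rejects in_fd on the very first call with
    EINVAL (assumed to be how recent kernels refuse many /proc files).
    """
    offset = 0
    while True:
        try:
            sent = os.sendfile(out_fd, in_fd, offset, CHUNK)
        except OSError as exc:
            # Only an EINVAL before anything was copied means "unsupported";
            # any other failure is a real error.
            if exc.errno == errno.EINVAL and offset == 0:
                return False
            raise
        if sent == 0:
            # EOF. /proc files report st_size == 0, so loop until 0 instead.
            return True
        offset += sent


def copy_with_read_write(in_fd: int, out_fd: int) -> None:
    """Copy in_fd to out_fd through a userspace buffer, handling short writes."""
    while True:
        buf = os.read(in_fd, CHUNK)
        if not buf:
            return
        view = memoryview(buf)
        while view:
            view = view[os.write(out_fd, view):]


def dump_proc_file(path: str, out_fd: int) -> str:
    """Dump path to out_fd and report which method worked."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if copy_with_sendfile(fd, out_fd):
            return "sendfile"
        # sendfile with an explicit offset does not move fd's file position,
        # so the fallback still starts at byte 0.
        copy_with_read_write(fd, out_fd)
        return "read/write"
    finally:
        os.close(fd)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryFile() as out:
        method = dump_proc_file("/proc/self/environ", out.fileno())
        out.seek(0)
        print(method, len(out.read()), "bytes")
```

On an affected kernel this should report "read/write"; either way the dumped bytes match a direct read of the file.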

A smaller bug also exists on line 55 of the DataScout analyzer: "mov rdi, 1; mov rsi, rax; mov rdx, 0; mov r10, 0x1000000; mov rax, 40; syscall;". Because of the known keystone-engine bug that makes the assembler default to a radix of 16, "mov rax, 40" is assembled as "mov rax, 0x40", so the shellcode actually invokes syscall 64 (semget) rather than the intended sendfile (syscall 40 on amd64).
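The arithmetic behind the radix bug, plus one way to sidestep it, can be sketched in Python. `parse_imm` only models what an assembler with a given default radix would do (it does not call keystone), and `explicit_hex` is a hypothetical helper that rewrites bare decimal immediates as 0x-prefixed hex so the shellcode means the same thing under either radix.

```python
import re

# x86_64 Linux syscall numbers relevant to this bug
X86_64_SYSCALLS = {40: "sendfile", 64: "semget"}


def parse_imm(token: str, default_radix: int) -> int:
    """Model how an assembler with the given default radix reads an immediate."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, default_radix)


# A run of digits not glued to a letter/digit/underscore on either side,
# so register names like r10 and existing literals like 0x1000000 are skipped.
_BARE_NUMBER = re.compile(r"(?<!\w)(\d+)(?!\w)")


def explicit_hex(asm: str) -> str:
    """Rewrite bare decimal immediates as explicit hex (radix-independent)."""
    return _BARE_NUMBER.sub(lambda m: hex(int(m.group(1))), asm)


LINE_55 = ("mov rdi, 1; mov rsi, rax; mov rdx, 0; "
           "mov r10, 0x1000000; mov rax, 40; syscall;")

if __name__ == "__main__":
    print(X86_64_SYSCALLS[parse_imm("40", 16)])  # what keystone actually emits
    print(X86_64_SYSCALLS[parse_imm("40", 10)])  # what the author intended
    print(explicit_hex(LINE_55))
```

Writing every immediate with an explicit 0x prefix is a cheap convention that makes shellcode strings immune to this class of bug.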

My branch, https://github.com/2over12/archr/tree/fix_sendfile_bug, fixes the amd64 shellcode by using the stack as a buffer and copying the file to stdout with a read/write loop. I am not sure this is the best approach. The other solution I could think of was to insert breakpoint shellcode, find the target's pid, and then pull the file with something like the target's retrieve_contents call.
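A rough Python sketch of the read/write-loop idea: it generates amd64 Intel-syntax assembly text that pushes the path onto the stack, opens it, and copies it to stdout through a stack buffer. This is not the code from the branch above; the function names are hypothetical, the output has not been assembled here (it assumes keystone accepts newline-separated instructions and labels), and short writes to stdout are not retried. All immediates are written in hex to avoid the radix issue.

```python
def push_string_asm(data: bytes) -> list:
    """Instructions that leave data (NUL-terminated, padded to 8 bytes) at rsp."""
    data += b"\0"
    data += b"\0" * (-len(data) % 8)
    chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
    lines = []
    # The stack grows down: push the last chunk first so rsp ends at chunk 0.
    for chunk in reversed(chunks):
        lines.append(f"mov rax, {hex(int.from_bytes(chunk, 'little'))}")
        lines.append("push rax")
    return lines


def cat_file_shellcode(path: str, bufsize: int = 0x1000) -> str:
    """amd64 Linux shellcode text: open(path), then read/write it to stdout."""
    lines = push_string_asm(path.encode())
    lines += [
        "mov rdi, rsp",               # rdi = pointer to path on the stack
        "xor esi, esi",               # flags = O_RDONLY
        "mov eax, 0x2",               # SYS_open
        "syscall",
        "mov r12, rax",               # fd (a negative errno makes read fail)
        f"sub rsp, {hex(bufsize)}",   # stack buffer below the path
        "read_loop:",
        "mov rdi, r12",
        "mov rsi, rsp",
        f"mov edx, {hex(bufsize)}",
        "xor eax, eax",               # SYS_read
        "syscall",
        "test rax, rax",
        "jle done",                   # EOF (0) or error (< 0)
        "mov rdx, rax",               # write exactly the bytes just read
        "mov edi, 0x1",               # stdout
        "mov rsi, rsp",
        "mov eax, 0x1",               # SYS_write
        "syscall",
        "jmp read_loop",
        "done:",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(cat_file_shellcode("/proc/self/environ"))
```

Using the stack as the buffer avoids needing any writable memory address in advance, which is presumably why the branch took this route.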

Kyle-Kyle commented 3 years ago

This is terrible news. Since the issue is in the kernel, it is likely architecture-independent, which means all of the shellcodes need to be changed accordingly. Considering the effort and cleanliness, I prefer the other solution you proposed: use retrieve_contents to grab the information. I'll keep this open until we have a clean solution. And thank you for the information!

Kyle-Kyle commented 3 years ago

Hi, the needed change is made here. It will be merged soon. The implementation is basically a copy-paste of yours, with a minor fix at line 59 ;)

Kyle-Kyle commented 3 years ago

In the end I didn't use retrieve_contents, because getting the target's pid is non-trivial in the current archr.
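For comparison, the pid-based alternative is straightforward once a pid is actually available and the target runs locally. The sketch below does not use archr or retrieve_contents; a child process with a known pid stands in for the target, and `read_proc_file` / `parse_environ` are hypothetical helpers. It reads /proc/<pid>/environ with an ordinary read(), which is unaffected by the sendfile change.

```python
import subprocess
import sys


def read_proc_file(pid: int, name: str) -> bytes:
    """Read /proc/<pid>/<name> from the host with an ordinary read()."""
    with open(f"/proc/{pid}/{name}", "rb") as f:
        return f.read()


def parse_environ(raw: bytes) -> dict:
    """Split a NUL-separated environ dump into {name: value}."""
    env = {}
    for entry in raw.split(b"\0"):
        name, sep, value = entry.partition(b"=")
        if sep:  # skips the empty trailing entry
            env[name] = value
    return env


if __name__ == "__main__":
    # Stand-in target: Popen returns only after the child has exec'd.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        env={"DEMO_VAR": "hello"},
    )
    try:
        env = parse_environ(read_proc_file(child.pid, "environ"))
        print(env.get(b"DEMO_VAR"))
    finally:
        child.kill()
        child.wait()
```

The hard part in archr, as noted above, is obtaining the pid for an arbitrary target, not reading the file.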