crmulliner / ddi

ddi - Dynamic Dalvik Instrumentation Toolkit
http://www.mulliner.org/android/

hooking things in system_server crashes #8

Open jduck opened 9 years ago

jduck commented 9 years ago

According to @odexcide on Twitter, ddi always crashes when hooking things in system_server, even with a pass-through hook.

The conversation started here but moved to Collin's private email. The following is the part that precedes the move to private email.

<@odexcide> @jduck @collinrm Have u been able to hook system_server successfully w/ DDI? Always crashes even with a pass through hook for me.
<@collinrm> @odexcide @jduck what Android version?
<@odexcide> @collinrm @jduck 4.2.2 Galaxy Nexus
<@odexcide> @collinrm @jduck The crash doesn't have my lib in the back trace and looks like it is coming from JIT. The same hook in apps works fine.
<@collinrm> @odexcide @jduck hooking code in system_server works, 4.2.2 should also not be an issue. Does it crash when executing the hook? or earlier?
<@odexcide> @collinrm @jduck It will hook successfully but will crash later during exec. short after other times after a while. Same result in emulator.
<@jduck> @odexcide @collinrm maybe stuff is getting moved and pointers hard coded? I'm not familiar with the internals of ddi
<@collinrm> @odexcide @jduck does it ever execute or crash on the first try?
<@odexcide> @collinrm @jduck Executes sometimes but always crashes...getting some log and more info for you

tr0pper commented 9 years ago

I had the same issue with a simple hook on system_server on the Note II running 4.4.2. I'll redo the setup and post the logs and dumps.

odexcide commented 9 years ago

I can post the example I wrote, but it is simpler to just run the strmon example (known to work on applications) against system_server. Collin said he suspects it might be a concurrency issue.

The strmon hook will work several times but eventually crash system_server. Similarly, if you hook any method that is called frequently in system_server, a similar crash occurs. The fault address resolves to the same location each time, and the hook's shared library is not in the backtrace.

odexcide commented 9 years ago

debuggerd snippet below from strmon:

I/DEBUG   (  126): F/libc    ( 1786): Fatal signal 11 (SIGSEGV) at 0x00000006 (code=1), thread 1841 (WifiStateMachin)
I/DEBUG   (  126): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'google/takju/maguro:4.2.2/JDQ39/573038:user/release-keys'
I/DEBUG   (  126): Revision: '9'
I/DEBUG   (  126): pid: 1786, tid: 1841, name: WifiStateMachin  >>> system_server <<<
I/DEBUG   (  126): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000006
I/DEBUG   (  126):     r0 40d050e0  r1 40d07558  r2 ffd6d3a7  r3 00000101
I/DEBUG   (  126):     r4 56fc6e14  r5 5a854368  r6 587a7000  r7 571cdd00
I/DEBUG   (  126):     r8 00000000  r9 4083904a  sl 4083a8da  fp 00000000
I/DEBUG   (  126):     ip 5f53ad9c  sp 5f53ad48  lr 407d27e8  pc 407ef4e0  cpsr 00000030
I/DEBUG   (  126):     d0  2868746977737466  d1  542d4b53502d326c
I/DEBUG   (  126):     d2  0053005300450061  d3  00730073000a0067
I/DEBUG   (  126):     d4  004d003d00640069  d5  0052004f0054004f
I/DEBUG   (  126):     d6  002d0041004c004f  d7  0043003100300044
I/DEBUG   (  126):     d8  0000000000000000  d9  0000000000000000
I/DEBUG   (  126):     d10 0000000000000000  d11 0000000000000000
I/DEBUG   (  126):     d12 0000000000000000  d13 0000000000000000
I/DEBUG   (  126):     d14 0000000000000000  d15 0000000000000000
I/DEBUG   (  126):     d16 7fffffffffffffff  d17 7fffffffffffffff
I/DEBUG   (  126):     d18 0000000000000000  d19 0000000000000000
I/DEBUG   (  126):     d20 00b6802e00b3802d  d21 00bc802f00b9802f
I/DEBUG   (  126):     d22 0707070703030303  d23 0000002f0000002e
I/DEBUG   (  126):     d24 0000000000000000  d25 0000000000000000
I/DEBUG   (  126):     d26 0000002f0000002f  d27 0000002f0000002f
I/DEBUG   (  126):     d28 0001000000010000  d29 0001000000010000
I/DEBUG   (  126):     d30 0003000000030000  d31 0003000000030000
I/DEBUG   (  126):     scr 80000090
I/DEBUG   (  126): 
I/DEBUG   (  126): backtrace:
I/DEBUG   (  126):     #00  pc 000444e0  /system/lib/libdvm.so (dvmFindCatchBlock+63)
I/DEBUG   (  126):     #01  pc 000277e4  /system/lib/libdvm.so
I/DEBUG   (  126): 
I/DEBUG   (  126): stack:
I/DEBUG   (  126):          5f53ad08  56fc6dec  
I/DEBUG   (  126):          5f53ad0c  58612adb  /system/framework/core.odex
I/DEBUG   (  126):          5f53ad10  5f53ad70  
I/DEBUG   (  126):          5f53ad14  4080ac35  /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+276)
I/DEBUG   (  126):          5f53ad18  5a854368  
I/DEBUG   (  126):          5f53ad1c  571d2358  /dev/ashmem/dalvik-LinearAlloc (deleted)
I/DEBUG   (  126):          5f53ad20  41b96e70  /dev/ashmem/dalvik-heap (deleted)
I/DEBUG   (  126):          5f53ad24  5f53ad70  
I/DEBUG   (  126):          5f53ad28  5a854368  
I/DEBUG   (  126):          5f53adcc  00000000  
I/DEBUG   (  126):          5f53add0  00000000  
I/DEBUG   (  126):          5f53add4  00000000

scintill commented 9 years ago

I think the issue is that the un-patching in dalvik_prepare() and re-patching in dalvik_postcall() are not atomic. With several threads and lots of calls, eventually one of them gets a "mixed" view of the method definition, and accesses something it isn't meant to. A similar scenario is noted here in the Dalvik source.
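To make the "mixed view" idea concrete, here is a minimal single-threaded sketch of what a second thread could observe between the individual stores of an un-patch. `FakeMethod`, its fields, and the order of the stores are assumptions made for illustration; they are not the real Dalvik Method struct or the actual ddi code. The `observe()` call stands in for another thread being scheduled at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a method record.
 * NOT the real Dalvik Method struct. */
typedef struct {
    bool        is_native;   /* true while the hook is installed */
    const void *insns;       /* bytecode, expected only when !is_native */
    const void *native_func; /* native entry, expected only when is_native */
} FakeMethod;

/* A record is consistent when its fields agree with each other. */
static bool is_consistent(const FakeMethod *m) {
    if (m->is_native)
        return m->native_func != NULL && m->insns == NULL;
    return m->insns != NULL && m->native_func == NULL;
}

static int torn_views; /* number of inconsistent snapshots seen */

/* Stands in for another thread reading the record at this instant. */
static void observe(const FakeMethod *m) {
    if (!is_consistent(m))
        torn_views++;
}

/* Un-patch with separate plain stores, observing after each one. */
static void unpatch_stepwise(FakeMethod *m, const void *orig_insns) {
    assert(m != NULL);
    m->is_native = false;    observe(m); /* flag says bytecode, insns still NULL */
    m->insns = orig_insns;   observe(m); /* stale native_func still present */
    m->native_func = NULL;   observe(m); /* finally consistent again */
}

/* Starts from a hooked record and counts the torn intermediate states. */
int count_torn_views(void) {
    static const char bytecode[] = { 0x0e };
    static const char hook_entry = 0;
    FakeMethod m = { true, NULL, &hook_entry };
    torn_views = 0;
    unpatch_stepwise(&m, bytecode);
    return torn_views;
}
```

In this model two of the three intermediate states are inconsistent. With real threads, whether a reader ever lands in such a window depends on timing, which would fit a crash that only shows up on frequently called methods.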

I believe I've got it isolated in this sample (jni directory for the hook library and small Java app combined.) It reliably crashes instantly upon injection, but if I change it to only have one thread, it doesn't crash. It might not be doing what I think, though, so feel free to critique.

If I'm right about the cause, I can think of fixes to try: a) forgo the un-patching and invoke the original method through Dalvik directly instead of JNI (assuming Dalvik itself can handle the method being not actually installed), b) find something that can be atomically patched (worst case might be a pointer to the entire class definition), c) reorder the patching/unpatching to try to minimize the severity of being interrupted, and recover from errors if they happen, d) patch Dalvik, or re-use existing features ("synchronized"?), to obtain a mutex on the method definition before reading or writing it.
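Option (b) could look roughly like the following sketch, assuming the whole method definition can be reached through a single pointer that is swapped atomically. `MethodDef` and `MethodSlot` are hypothetical types, and nothing here claims that Dalvik offers such an indirection; the code only shows the C11 mechanics of publishing a complete, immutable definition in one step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical immutable method definition (not a Dalvik type). */
typedef struct {
    int         is_native;
    const void *insns;
    const void *native_func;
} MethodDef;

/* Hypothetical slot that readers dereference to find the definition. */
typedef struct {
    _Atomic(const MethodDef *) current;
} MethodSlot;

/* Writer: publish a fully built definition with one atomic store. */
static void slot_publish(MethodSlot *slot, const MethodDef *def) {
    assert(def != NULL);
    atomic_store_explicit(&slot->current, def, memory_order_release);
}

/* Reader: gets either the old or the new definition, never a mix. */
static const MethodDef *slot_load(MethodSlot *slot) {
    return atomic_load_explicit(&slot->current, memory_order_acquire);
}

static int is_whole(const MethodDef *d) {
    return d->is_native ? (d->native_func != NULL && d->insns == NULL)
                        : (d->insns != NULL && d->native_func == NULL);
}

/* Hook and unhook by swapping the pointer; returns 1 if every load
 * saw a complete definition. */
int atomic_swap_demo(void) {
    static const char bytecode[] = { 0x0e };
    static const char hook_entry = 0;
    static const MethodDef original = { 0, bytecode, NULL };
    static const MethodDef hooked   = { 1, NULL, &hook_entry };

    MethodSlot slot;
    atomic_init(&slot.current, &original);
    int ok = is_whole(slot_load(&slot));

    slot_publish(&slot, &hooked);               /* install hook */
    ok = ok && slot_load(&slot) == &hooked && is_whole(slot_load(&slot));

    slot_publish(&slot, &original);             /* remove hook */
    ok = ok && slot_load(&slot) == &original;
    return ok;
}
```

The design point is that the definitions themselves are never modified after publication, so a reader holding an old pointer still sees a coherent record. Deciding when an old definition can safely be freed is a separate problem that this sketch does not address.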

scintill commented 9 years ago

Here is a rough draft of a fix that works for me. It's mostly tested against my sample above, but at the end I ran strmon in system_server without crashing, at least not within the few minutes I let it run, which is much better than it is without that patch.

I found that the JNI jmethodID is internally just a pointer to the Dalvik Method struct. So, I make a copy of the original Method and store it in mid so it can be invoked by the JNI hook function. I have to change the access flag to keep the class's vtable from being looked at, because that results in the hook function being called again, which recurses until the stack runs out. I think this won't work with static methods, but I couldn't test even the existing behavior on that (access fault when I try to hook a static method, even in a single-threaded app with the original ddi code -- maybe I'm doing something wrong though.)
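The copy-and-flag idea can be sketched as below. `MethodRecord` is a hypothetical two-field stand-in, not the real Dalvik Method struct, and the choice of `ACC_PRIVATE` (the standard Java access flag value 0x0002) is an assumption for illustration; the comment above only says that "the access flag" is changed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ACC_PUBLIC  0x0001u /* standard Java access flag values */
#define ACC_PRIVATE 0x0002u

/* Hypothetical stand-in for a method record (not the Dalvik struct). */
typedef struct {
    uint32_t    access_flags;
    const void *insns;
} MethodRecord;

/* Snapshot the original record before hooking and mark the copy so a
 * caller would treat it as a direct (non-vtable) method. The copy is
 * only as complete as sizeof(MethodRecord) is accurate. */
static MethodRecord *copy_for_direct_call(const MethodRecord *orig) {
    assert(orig != NULL);
    MethodRecord *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    memcpy(copy, orig, sizeof *copy);
    copy->access_flags |= ACC_PRIVATE; /* assumed flag, see lead-in */
    return copy;
}

/* Returns 1 if the saved copy survives the live record being patched. */
int copy_demo(void) {
    static const char bytecode[] = { 0x0e };
    MethodRecord live = { ACC_PUBLIC, bytecode };

    MethodRecord *saved = copy_for_direct_call(&live);
    if (saved == NULL)
        return 0;

    live.insns = NULL; /* stands in for the hook overwriting the live record */

    int ok = saved->insns == bytecode
          && (saved->access_flags & ACC_PRIVATE) != 0
          && (live.access_flags & ACC_PRIVATE) == 0;
    free(saved); /* the copy is heap-allocated and must be released */
    return ok;
}
```

Because the live record is patched once and the original is invoked through the private copy, no un-patching and re-patching is needed around each call.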

I made dalvik_prepare() and dalvik_postcall() do nothing. They could be removed, and maybe postcall renamed to unhook. Maybe it's good to keep them, though, in case they are needed to properly hook and re-call certain types of methods.

Side note: I can't inject into system_server if it's been running a while; the injection reports that it worked, but the native side of the library doesn't appear to be called. I had to kill it and then inject into the replacement process. I assume this is another issue, maybe specific to system_server.

odexcide commented 9 years ago

The patch by @scintill seems to have addressed my issues and I tested it with various hooks in system_server. Unless anyone else has issues with the patch, I think it should be pulled in.

scintill commented 9 years ago

To be pulled, I think it should at least free what it mallocs, be tested on static methods, and properly delete/move the code in dalvik_prepare() and _postcall() instead of returning. Also, I'm kind of nervous about the Method struct growing, and causing access violations or incorrect behavior when an incomplete copy is used by Dalvik, so it would be nice to find a clean way to know what the correct sizeof(Method) is.

Has anyone hooked a static method? If it's not possible right now, then at least it's not a regression if my method doesn't work for that either.