Knight-ZXW / Sliver

An implementation of ByteDance's Sliver approach to capturing Java method stacks
Apache License 2.0

Crashes on some Android 12 devices #3

Open peachDaddy opened 1 year ago

peachDaddy commented 1 year ago

00 pc 0x00000000003aa704 /apex/com.android.art/lib64/libart.so (art::ThreadList::SuspendThreadByThreadId(unsigned int, art::SuspendReason, bool*)+476)

3KVagKS7J-eSYpicWVYcjA==/lib/arm64/libsliver.so (Java_com_knightboost_sliver_Sliver_nativeGetMethodStackTrace+204)

peachDaddy commented 1 year ago

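Judging from the crash stack, the problem seems to come from suspending/resuming the thread. My scenario is capturing the main thread's stack from a background thread.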
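Capturing another thread's Java stack generally means suspending the target, walking its frames, and resuming it; any path that skips the resume leaves the target (here, the main thread) stuck. A minimal sketch of tying the two calls together with an RAII guard, assuming hypothetical `SuspendFn`/`ResumeFn` callbacks that stand in for calls such as ART's `ThreadList::SuspendThreadByThreadId` and `ThreadList::Resume`. How Sliver actually binds and calls those functions is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-ins for functions resolved from libart.so at runtime.
// `suspend` returns the suspended thread, or nullptr on failure/timeout.
using SuspendFn = std::function<void*(uint32_t tid, bool* timed_out)>;
using ResumeFn = std::function<void(void* thread)>;

// Resumes the target in the destructor, so every exit path after a
// successful suspend (including early returns) also resumes it.
class ScopedThreadSuspension {
 public:
  ScopedThreadSuspension(const SuspendFn& suspend, ResumeFn resume, uint32_t tid)
      : resume_(std::move(resume)) {
    assert(suspend && resume_);  // both callbacks must be bound
    bool timed_out = false;
    thread_ = suspend(tid, &timed_out);
    timed_out_ = timed_out;
  }

  ~ScopedThreadSuspension() {
    if (thread_ != nullptr) resume_(thread_);
  }

  // Copying would resume the same thread twice.
  ScopedThreadSuspension(const ScopedThreadSuspension&) = delete;
  ScopedThreadSuspension& operator=(const ScopedThreadSuspension&) = delete;

  void* thread() const { return thread_; }  // nullptr if suspension failed
  bool timed_out() const { return timed_out_; }

 private:
  ResumeFn resume_;
  void* thread_ = nullptr;
  bool timed_out_ = false;
};
```

The stack walk belongs inside the guard's scope and should only run when `thread()` is non-null; a failed or timed-out suspend is then never paired with a resume.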

Knight-ZXW commented 1 year ago


Does it crash on the very first call? Could you post the full error log?

peachDaddy commented 1 year ago

The full log stack contains only this much: backtrace:

00 pc 0x00000000003aa704 /apex/com.android.art/lib64/libart.so (art::ThreadList::SuspendThreadByThreadId(unsigned int, art::SuspendReason, bool*)+476)

01 pc 0x0000000000019478 /data/app/~~mgvOBbIeDffbIyJbjDr8hg==/com.yu.smart-1z2mp98Sd2Vs3_Ah6-VAXQ==/lib/arm64/libsliver.so (Java_com_knightboost_sliver_Sliver_nativeGetMethodStackTrace+204)

02 pc 0x00000000000c78a4 /data/app/~~mgvOBbIeDffbIyJbjDr8hg==/com.yu.smart-1z2mp98Sd2Vs3_Ah6-VAXQ==/oat/arm64/base.odex (art_jni_trampoline+132)

All the crashing devices run Android 12: Samsung A51, A32x, and A32. Since these are production crashes, I haven't reproduced it locally.

Knight-ZXW commented 1 year ago


Is there any signal information for the crash? Without the signal I can't tell what kind of error this is.

peachDaddy commented 1 year ago

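That's the problem: there is no signal information at all. I went through the Android 12 ART runtime code and couldn't find the cause either. The stacks on Google Play are incomplete; this call stack is all they contain. The vast majority of crashing devices run Android 12, with a very small number on 12L and 13, and every crash is in this thread-suspension method.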

tx-lxy commented 4 months ago

@Knight-ZXW I'm running into this too. On an OPPO A93 with Android 12 it crashes every time, on the very first call:

2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x20010
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: x0 0000000000000000 x1 0000000000000001 x2 b400007d6708fd74 x3 0000000002010044
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: x4 000000000000002e x5 0000000000000000 x6 0000007ff8ef4b72 x7 7f7f7f7f7f7f7f7f
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: x8 0000000000020000 x9 b400007d67036ad0 x10 000000003b9aca00 x11 7f2a6227c9aca806
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: x12 0000000002010044 x13 0000036634a2b386 x14 0026050440513670 x15 0000000034155555
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: x16 0000000000000001 x17 0000036634a1a4d2 x18 0000007e061ea000 x19 b400007d67010800
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: x20 b400007d6708fd74 x21 0000000000000000 x22 0000007ce2f52870 x23 00000000000000e0
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: x24 b400007d67038ad0 x25 0000007ff8ef54d8 x26 0000000000000000 x27 b400007d6708fd60
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: x28 0000007d5f216000 x29 0000007ff8ef4ba0
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: lr 0000007d5ec31d08 sp 0000007ff8ef4ad0 pc 0000007d5ec31dac pst 0000000000001000
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: backtrace:
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: #00 pc 0000000000431dac /apex/com.android.art/lib64/libart.so (art::ThreadList::SuspendThreadByThreadId(unsigned int, art::SuspendReason, bool*)+452) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: #01 pc 0000000000023924 /data/app/~~V1hk2DDzeE6HGb-sxQnFwg==/com.knightboost.sliver.demo-0-f5za3StBteA4q1uQAVaA==/lib/arm64/libsliver.so (kbArt::ArtHelper::SuspendThreadByThreadId(unsigned int, kbArt::SuspendReason, bool*)+52) (BuildId: 8a6d06bcc61178998736d5c1493ff7cb9199928b)
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: #02 pc 0000000000024878 /data/app/~~V1hk2DDzeE6HGb-sxQnFwg==/com.knightboost.sliver.demo-0-f5za3StBteA4q1uQAVaA==/lib/arm64/libsliver.so (Java_com_knightboost_sliver_Sliver_nativeGetMethodStackTrace+152) (BuildId: 8a6d06bcc61178998736d5c1493ff7cb9199928b)
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: #03 pc 0000000000440554 /apex/com.android.art/lib64/libart.so (art_quick_generic_jni_trampoline+148) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: #04 pc 0000000000209a9c /apex/com.android.art/lib64/libart.so (nterp_helper+1948) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: #05 pc 000000000072a870 /data/app/~~V1hk2DDzeE6HGb-sxQnFwg==/com.knightboost.sliver.demo-0-f5za3StBteA4q1uQAVaA==/oat/arm64/base.vdex (com.knightboost.sliver.Sliver.getSackTrace+8)
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: #06 pc 0000000000209334 /apex/com.android.art/lib64/libart.so (nterp_helper+52) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.379 27173-27173/? A/DEBUG: #07 pc 000000000079e4d6 /data/app/~~V1hk2DDzeE6HGb-sxQnFwg==/com.knightboost.sliver.demo-0-f5za3StBteA4q1uQAVaA==/oat/arm64/base.vdex (com.knightboost.sliver.demo.MainActivity.onCreate$lambda-5+18)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #08 pc 0000000000209334 /apex/com.android.art/lib64/libart.so (nterp_helper+52) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #09 pc 000000000079e2a8 /data/app/~~V1hk2DDzeE6HGb-sxQnFwg==/com.knightboost.sliver.demo-0-f5za3StBteA4q1uQAVaA==/oat/arm64/base.vdex (com.knightboost.sliver.demo.MainActivity.$r8$lambda$tfMEUHu-WIFrCjTr5LEiqOu9QDI+0)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #10 pc 0000000000209334 /apex/com.android.art/lib64/libart.so (nterp_helper+52) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #11 pc 000000000079e190 /data/app/~~V1hk2DDzeE6HGb-sxQnFwg==/com.knightboost.sliver.demo-0-f5za3StBteA4q1uQAVaA==/oat/arm64/base.vdex (com.knightboost.sliver.demo.MainActivity$$ExternalSyntheticLambda2.onClick+0)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #12 pc 000000000020b074 /apex/com.android.art/lib64/libart.so (nterp_helper+7540) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #13 pc 000000000038d3d2 /system/framework/framework.jar (android.view.View.performClick+34)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #14 pc 000000000020a254 /apex/com.android.art/lib64/libart.so (nterp_helper+3924) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #15 pc 00000000002a8536 /data/app/~~V1hk2DDzeE6HGb-sxQnFwg==/com.knightboost.sliver.demo-0-f5za3StBteA4q1uQAVaA==/oat/arm64/base.vdex (com.google.android.material.button.MaterialButton.performClick+6)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #16 pc 000000000020a254 /apex/com.android.art/lib64/libart.so (nterp_helper+3924) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #17 pc 000000000038d49a /system/framework/framework.jar (android.view.View.performClickInternal+6)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #18 pc 000000000020a254 /apex/com.android.art/lib64/libart.so (nterp_helper+3924) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #19 pc 0000000000388874 /system/framework/framework.jar (android.view.View.access$3700+0)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #20 pc 0000000000209334 /apex/com.android.art/lib64/libart.so (nterp_helper+52) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #21 pc 000000000036425c /system/framework/framework.jar (android.view.View$PerformClick.run+16)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #22 pc 000000000020b074 /apex/com.android.art/lib64/libart.so (nterp_helper+7540) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #23 pc 0000000000443074 /system/framework/framework.jar (android.os.Handler.handleCallback+4)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #24 pc 0000000000209334 /apex/com.android.art/lib64/libart.so (nterp_helper+52) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #25 pc 0000000000442ee8 /system/framework/framework.jar (android.os.Handler.dispatchMessage+8)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #26 pc 000000000020a254 /apex/com.android.art/lib64/libart.so (nterp_helper+3924) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #27 pc 00000000004715d2 /system/framework/framework.jar (android.os.Looper.loopOnce+438)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #28 pc 0000000000209334 /apex/com.android.art/lib64/libart.so (nterp_helper+52) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #29 pc 0000000000471cfe /system/framework/framework.jar (android.os.Looper.loop+178)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #30 pc 0000000000209334 /apex/com.android.art/lib64/libart.so (nterp_helper+52) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #31 pc 00000000001b71d8 /system/framework/framework.jar (android.app.ActivityThread.main+252)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #32 pc 0000000000436e00 /apex/com.android.art/lib64/libart.so (art_quick_invoke_static_stub+576) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #33 pc 0000000000469534 /apex/com.android.art/lib64/libart.so (_jobject* art::InvokeMethod<(art::PointerSize)8>(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jobject*, _jobject*, unsigned long)+1960) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #34 pc 0000000000468d64 /apex/com.android.art/lib64/libart.so (art::Method_invoke(_JNIEnv*, _jobject*, _jobject*, _jobjectArray*) (.__uniq.165753521025965369065708152063621506277)+48) (BuildId: d307dc6adc4105b5e392ad710770385d)
2024-05-20 10:18:40.380 27173-27173/? A/DEBUG: #35 pc 000000000036c148 /data/misc/apexdata/com.android.art/dalvik-cache/arm64/boot.oat (art_jni_trampoline+120)

Knight-ZXW commented 4 months ago

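I'll test it later. Also, this project is a demo and I haven't used it in production yet, so please test it thoroughly yourself. There are known offset-compatibility issues that haven't been fixed yet.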

Knight-ZXW commented 4 months ago

SuspendThreadByThreadId doesn't crash in my tests.


tx-lxy commented 4 months ago

@Knight-ZXW I'm using "Capture other thread - lock-wait info test". Did you test the same path?

tx-lxy commented 4 months ago

@Knight-ZXW Is there any way to check whether the pointer obtained via the offset really is the threadList? Adding a check would make it safer.
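One cheap first check before trusting an offset-derived pointer is to confirm that its memory is readable at all, so that a wrong layout fails gracefully instead of with SIGSEGV. A minimal sketch using a common Linux trick (not something from Sliver): pass the address to write() on a pipe, which makes the kernel do the read and report EFAULT for an unmapped address. This is necessary but not sufficient: readable memory is not proof that the object is a ThreadList.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Returns true if `len` bytes starting at `addr` are readable by this process.
// write() makes the kernel copy from `addr`; an unmapped address makes the
// call fail with EFAULT instead of delivering SIGSEGV to the caller.
bool IsReadable(const void* addr, size_t len) {
  assert(len > 0 && len <= 256);  // small probes only, far below PIPE_BUF
  int fds[2];
  if (pipe(fds) != 0) return false;  // cannot probe; treat as unreadable
  ssize_t n;
  do {
    n = write(fds[1], addr, len);
  } while (n < 0 && errno == EINTR);
  close(fds[0]);
  close(fds[1]);
  return n == static_cast<ssize_t>(len);
}
```

Creating a pipe per call keeps the sketch simple; a production version would more likely keep one pipe open and drain it after each successful probe.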

Knight-ZXW commented 4 months ago


Do the other 2 buttons work? I tested all 3 and none of them crashed.

tx-lxy commented 4 months ago


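"Capture main thread call stack" and "Capture other thread - lock-wait info test" are the only two paths that crash, because they are the only two that call SuspendThreadByThreadId.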

Knight-ZXW commented 4 months ago


Have you found any other devices that crash? I used a cloud testing service, and on this phone it really doesn't crash.

tx-lxy commented 4 months ago

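@Knight-ZXW Huawei and Honor phones running Android 12 work fine; I tested several. For the OPPO A93 crash I reported, my preliminary finding is that the threadList instance obtained is wrong. OPPO has probably modified the system so that the data structure no longer matches Google's source. After switching the data structure to PartialRuntimeTiramisu (Android 13), it runs normally.

I'm not sure yet which other devices will hit this, so I'd like to add validation logic. Do you know a good way to check that the pointer obtained really is a threadList object?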
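The workaround described above amounts to picking a different struct layout for one vendor build. A minimal sketch of making that choice data-driven: try each candidate offset of the ThreadList field in order and accept the first pointer that passes a caller-supplied validator. The layout names and offsets are hypothetical placeholders, not the real values from ART or from Sliver's PartialRuntime* structs, and the "Runtime" in the usage is a plain buffer so the sketch runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// One candidate layout: where the ThreadList* field sits inside art::Runtime.
// Names and offsets are hypothetical placeholders.
struct RuntimeLayout {
  const char* name;
  size_t thread_list_offset;
};

// Reads a pointer-sized field `offset` bytes past `base`; memcpy avoids
// alignment and strict-aliasing problems.
void* ReadPointerAt(const void* base, size_t offset) {
  assert(base != nullptr);
  void* value = nullptr;
  std::memcpy(&value, static_cast<const char*>(base) + offset, sizeof(value));
  return value;
}

// Returns the first candidate pointer accepted by `is_valid`, or nullptr if no
// layout matches (the caller should then disable sampling rather than guess).
void* ResolveThreadList(const void* runtime,
                        const std::vector<RuntimeLayout>& candidates,
                        const std::function<bool(void*)>& is_valid,
                        const char** matched_layout) {
  for (const RuntimeLayout& layout : candidates) {
    void* p = ReadPointerAt(runtime, layout.thread_list_offset);
    if (p != nullptr && is_valid(p)) {
      if (matched_layout != nullptr) *matched_layout = layout.name;
      return p;
    }
  }
  return nullptr;
}
```

The validator must not dereference a candidate blindly; it could combine a readability probe with a comparison against a pointer obtained some other way.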

Knight-ZXW commented 4 months ago


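You'll have to research that yourself; I haven't looked into it. One idea: inline-hook an API that takes a ThreadList as a parameter (for example SuspendThreadByPeer), trigger a call to it somehow (for example, a normal Java-level Thread.getStackTrace eventually calls this API), and then, inside the proxy function, grab the threadList pointer and compare it with the one obtained via the offset.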
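The idea above works because SuspendThreadByPeer is a non-static member of art::ThreadList, so under the Itanium C++ ABI used on Android the ThreadList pointer arrives as the implicit `this` argument (register x0 on arm64). A minimal sketch of the capture-and-compare step, assuming a hooking library has already redirected the symbol to `ProxySuspendThreadByPeer` and handed back the original function. The hook is simulated with a plain function pointer, and the signature is simplified and hypothetical; the real parameters differ between Android versions.

```cpp
#include <atomic>
#include <cassert>

// Simplified, hypothetical signature: `self` is the implicit `this`
// (the art::ThreadList*) of ThreadList::SuspendThreadByPeer.
using SuspendByPeerFn = void* (*)(void* self, void* peer, int reason, bool* timed_out);

// Original function as returned by the (simulated) hooking library.
static SuspendByPeerFn g_original_suspend_by_peer = nullptr;
// ThreadList* observed inside the proxy; written once, read from other threads.
static std::atomic<void*> g_observed_thread_list{nullptr};

// Proxy installed in place of the real symbol: record `this`, then forward.
void* ProxySuspendThreadByPeer(void* self, void* peer, int reason, bool* timed_out) {
  void* expected = nullptr;
  g_observed_thread_list.compare_exchange_strong(expected, self);
  return g_original_suspend_by_peer(self, peer, reason, timed_out);
}

// True once the proxy has seen a call and the offset-derived pointer matches it.
bool ThreadListMatches(void* offset_derived) {
  assert(g_original_suspend_by_peer != nullptr);  // hook must be installed first
  void* observed = g_observed_thread_list.load();
  return observed != nullptr && observed == offset_derived;
}
```

The proxy always forwards to the original, so the triggering call (e.g. Thread.getStackTrace on another thread) behaves normally. Once a pointer has been observed, the hook can be removed.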
