KwaiAppTeam / KOOM

KOOM is an OOM killer for mobile platforms by Kwai.

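HeapAnalysisService never calls back after it starts; running shark directly, modeled on HeapAnalysisService, produced a result quickly, so the thread it executes on may be the problem #266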

Open CatJason opened 7 months ago

CatJason commented 7 months ago

Here is my execution log:

2024-01-23 20:56:57.533 18639-18991 D ----OOM Monitor Memory----
2024-01-23 20:56:57.534 18639-18991 D [java] max:536870912 used ratio:6%
2024-01-23 20:56:57.534 18639-18991 D [proc] VmSize:42290196kB VmRss:543180kB Threads:862
2024-01-23 20:56:57.534 18639-18991 D [meminfo] MemTotal:5797816kB MemFree:336308kB MemAvailable:2148836kB
2024-01-23 20:56:57.534 18639-18991 D avaliable ratio:37% CmaTotal:1024000kB ION_heap:0kB
2024-01-23 20:56:57.534 18639-18991 D [meet condition] overThresholdCount:3, threadCount: 862
2024-01-23 20:56:57.534 18639-18991 D over threshold dumpThreadIfNeed
2024-01-23 20:56:57.724 18639-18991 D threadNames = [m.x.x, Jit thread pool, Signal Catcher, ADB-JDWP Connec, HeapTaskDaemon, ReferenceQueueD, FinalizerDaemon, FinalizerWatchd, Binder:18639_1, Binder:18639_2, Binder:18639_3, AppEyeUiProbeTh, Profile Saver, Binder:18639_4, LogOperatorMana, LiteBase_Loggin, ThreadPoolServi, Lite_ThreadPool, https_event4x, liteav_low_prio, liteav_WatchDog, liteav_WatchDog, Lite_ThreadPool, HttpClient_6420, HttpClient_2577, OkHttp Connecti, WM.task-1, WM.task-2, queued-work-loo, Okio Watchdog, DefaultDispatch, DefaultDispatch, DefaultDispatch, flutter-worker-, flutter-worker-, 1.ui, 1.raster, 1.io, io.worker.1, io.worker.2, io.worker.3, io.worker.4, dart:io EventHa, mali-mem-purge, mali-utility-wo, mali-utility-wo, mali-utility-wo, mali-utility-wo, mali-utility-wo, mali-utility-wo, mali-utility-wo, mali-utility-wo, mali-cmar-backe, mali-hist-dump, work_thread, NetWorkSender, FileObserver, pool-4-thread-1, newsp0, newsp1, newsp2, pool-5-thread-1, CrashSDKNormalH, CrashSDKBkgdHan, m.max.x, m.max.x, pool-4-thread-2, efs-base, LaunchThreadPoo, ANR HANDLER THR, ZIDThreadPoolEx, single 1, ACCS0, push_client_thr, Thread-16, spdy-0, ent.File.Tracer, AWCN Scheduler1, ConnectivityThr, pool-7-thread-1, pool-9-thread-1, pool-6-thread-1, Thread-17, Thread-18, m.max.x, AMDC1, pool-8-thread-1, process reaper, AMDC2, dns-main, pool-10-thread-, glide-active-re, HandlerThread, RxSchedulerPurg, RxCachedWorkerP, RxCachedThreadS, RxCachedThreadS, RenderThread, Thread-21, RxCachedThreadS, magnifier pixel, TimeCheckThread, arch_disk_io_0, Thread-23, msg 1, Thread-24, RxCachedThreadS, OkHttp TaskRunn, Okio Watchdog, RxCachedThreadS, RxCachedThreadS, iaoheihe.cn/..., OkHttp TaskRunn, RxCachedThreadS, AsyncTask #2, pool-13-thread-, pool-13-thread-, LoopThread, pool-14-thread-, glide-disk-cach, glide-source-th, OkHttp Dispatch, glide-source-th, OkHttp Dispatch, RxComputationTh, glide-source-th, glide-source-th, Chrome_ProcessL, ThreadPoolServi, ThreadPoolForeg, ThreadPoolForeg, Chrome_IOThread, MemoryInfra, ThreadPoolForeg, ThreadPoolForeg, ThreadPoolForeg, AudioThread, ThreadPoolSingl, NetworkService, CookieMonsterCl, CookieMonsterBa, ThreadPoolForeg, VizWebView, CleanupReferenc, ccg_dispatch, ThreadPoolSingl, Chrome_InProcGp, Chrome_ChildIOT, AsyncTask #1 --, NetworkKitGRS, RequestManager, SL-NetWorkSende, FormalHASDK-bas, Binder:18639_5, JavaBridge, FormalHASDK-bas, arch_disk_io_1, arch_disk_io_2, arch_disk_io_3, Azx-1, Thread-36, Thread-37, Thread-38, Thread-39, Thread-40, Thread-41, Thread-42, Thread-43, Thread-44, Thread-45, Thread-46, Thread-47, Thread-48, Thread-49, Thread-50, Thread-51, Thread-52, Thread-53, Thread-54, Thread-55, Thread-56, Thread-57, Thread-58, Thread-59, Thread-60, Thread-61, Thread-62, Thread-63, Thread-64, Thread-65, Thread-66, Thread-67, Thread-68, Thread-69, Thread-70, Thread-71, Thread-72, Thread-73, Thread-74, Thread-75, Thread-76, Thread-77, Thread-78, Thread-79, Thread-80, Thread-81, Thread-82, Thread-83, Thread-84, Thread-85, Thread-86, Thread-87, Thread-88, Thread-89, Thread-90, Thread-91, Thread-92, Thread-93, Thread-94, Thread-95, Thread-96, Thread-97, Thread-98, Thread-99, Thread-100, Thread-101, Thread-102, Thread-103, Thread-104, Thread-105, Thread-106, Thread-107, Thread-108, Thread-109, Thread-110, Thread-111, Thread-112, Thread-113, Thread-114, Thread-115, Thread-116, Thread-117, Thread-118, Thread-119, Thread-120, Thread-121, Thread-122, Thread-123, Thread-124, Thread-125, Thread-126, Thread-127, Thread-128, Thread-129, Thread-130, Thread-131, Thread-132, Thread-133, Thread-134, Thread-135, Thread-136, Thread-137, Thread-138, Thread-139, Thread-140, Thread-141, Thread-142, Thread-143, Thread-144, Thread-145, Thread-146, Thread-147, Thread-148, Thread-149, Thread-150, Thread-151, Thread-152, Thread-153, Thread-154, Thread-155, Thread-156, Thread-157, Thread-158, Thread-159, Thread-160, Thread-161, Thread-162, Thread-163, Thread-164, Thread-165, Thread-166, Thread-167, Thread-168, Thread-169, Thread-170, Thread-171, Thread-172, Thre
2024-01-23 20:56:57.748 18639-18991 D OOMPreferenceManager.getFirstAnalysisTime():1706013855651
2024-01-23 20:56:57.748 18639-18991 D OOMPreferenceManager.getAnalysisTimes:2
2024-01-23 20:56:57.758 18639-20267 D mTrackReasons:[reason_thread_oom]
2024-01-23 20:56:57.758 18639-20267 D dumpAndAnalysis
2024-01-23 20:56:57.774 18639-20267 D hprof analysis dir:/storage/emulated/0/Android/data/com.x.x/files/oom/memory/hprof-aly
2024-01-23 20:56:57.776 18639-20267 D dump /storage/emulated/0/Android/data/com.x.x/files/oom/memory/hprof-aly/1.3.300_2024-01-23_20-56-57_759.hprof
2024-01-23 20:56:57.784 18639-20267 D before suspend and fork.
2024-01-23 20:57:01.849 18639-20267 D dump true, notify from pid 20268
2024-01-23 20:57:01.849 18639-20267 D end hprof dump
2024-01-23 20:57:02.850 18639-20267 D start hprof analysis
2024-01-23 20:57:02.864 18639-20267 D startAnalysisService
2024-01-23 20:57:02.930 18639-20267 D startAnalysisService get Pss:417915
2024-01-23 21:01:15.144 18639-18639 D background
2024-01-23 21:01:15.144 18639-18639 D stopLoop()

zefengsysu commented 6 months ago

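Isn't this really the same problem as #265? Internally, KOOM uses a reanalysis mechanism to raise the success rate: if analysis of a heap dump fails, the dump is analyzed once more on the next launch. In production we do see that a considerable share of cases are analyzed and reported successfully through reanalysis. I will think about how to optimize this, but as an overall strategy we still need a separate, non-high-priority process like HeapAnalysisService to run the analysis, so that it does not affect the main process.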
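To make the reanalysis idea above concrete, here is a minimal sketch in plain Java. It is not KOOM's actual code: the class `ReanalysisSketch`, the `Analyzer` interface, and the convention that a `.json` report next to a `.hprof` file marks a finished analysis are all assumptions made for illustration. On launch, the sketch looks for dumps that a previous run left without a report and tries each of them once more.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReanalysisSketch {

    /** Hypothetical stand-in for whatever actually analyzes a heap dump. */
    public interface Analyzer {
        /** Returns the report text, or throws if the analysis fails. */
        String analyze(Path hprof) throws Exception;
    }

    /** Assumed convention: "x.hprof" counts as analyzed once "x.json" exists beside it. */
    static Path reportFor(Path hprof) {
        String name = hprof.getFileName().toString();
        String base = name.substring(0, name.length() - ".hprof".length());
        return hprof.resolveSibling(base + ".json");
    }

    /** Lists dumps left behind by an earlier run that still have no report. */
    public static List<Path> findPendingDumps(Path dir) throws IOException {
        List<Path> pending = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return pending;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.hprof")) {
            for (Path hprof : stream) {
                if (!Files.exists(reportFor(hprof))) {
                    pending.add(hprof);
                }
            }
        }
        Collections.sort(pending);
        return pending;
    }

    /** Call once at startup: retries each pending dump a single time and returns the success count. */
    public static int reanalyzeOnLaunch(Path dir, Analyzer analyzer) throws IOException {
        int succeeded = 0;
        for (Path hprof : findPendingDumps(dir)) {
            try {
                String report = analyzer.analyze(hprof);
                Files.write(reportFor(hprof), report.getBytes(StandardCharsets.UTF_8));
                Files.delete(hprof); // the dump is large; drop it once the report is written
                succeeded++;
            } catch (Exception e) {
                // Sketch-only policy: after a failed retry, discard the dump so it is not retried forever.
                Files.deleteIfExists(hprof);
            }
        }
        return succeeded;
    }
}
```

In a real app the `Analyzer` would presumably hand the file to a separate low-priority process rather than run inline, in line with the point above about keeping analysis out of the main process; the sketch only shows the bookkeeping that decides which dumps get a second attempt.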