KwaiAppTeam / KOOM

KOOM is an OOM killer on mobile platform by Kwai.

native leak module: live_alloc_records_ grows without bound #245

Open Mr-JingShi opened 1 year ago

Mr-JingShi commented 1 year ago

Problem description

struct Test
{
    Test() {
        ptr = new int[100];
    }
    ~Test() {
        delete[] ptr;
    }
    int* ptr;
    int a[100];
};

Test* test = new Test;
// delete test;

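As the code above shows, new Test invokes operator new twice, so RegisterAlloc is called twice. GetLeakAllocs, however, only detects test and does not detect test->ptr. As a result, the record for test->ptr stays in live_alloc_records_ forever, and live_alloc_records_ grows without bound.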
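To make the stranded record concrete, here is a minimal toy model of the bookkeeping. It is not KOOM's actual implementation: `ToyMonitor`, its map, the `GetLeakAllocs` signature, and `StrandedRecordsAfterOneCheck` are hypothetical, and the sketch assumes the leak checker reports only the outer block.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy stand-in for the monitor's bookkeeping; NOT KOOM's real data structures.
struct ToyMonitor {
  std::unordered_map<std::uintptr_t, std::size_t> live_alloc_records;

  void RegisterAlloc(void* p, std::size_t size) {
    live_alloc_records[reinterpret_cast<std::uintptr_t>(p)] = size;
  }
  void UnregisterAlloc(void* p) {
    live_alloc_records.erase(reinterpret_cast<std::uintptr_t>(p));
  }
  // Assumption: only the blocks the checker reports get their records dropped.
  void GetLeakAllocs(const std::vector<void*>& reported) {
    for (void* p : reported) UnregisterAlloc(p);
  }
};

struct Test {
  Test() { ptr = new int[100]; }
  ~Test() { delete[] ptr; }
  int* ptr;
  int a[100];
};

// Returns how many records remain after one leak check of a "leaked" Test.
std::size_t StrandedRecordsAfterOneCheck() {
  ToyMonitor monitor;
  Test* test = new Test;
  monitor.RegisterAlloc(test, sizeof(Test));            // outer object
  monitor.RegisterAlloc(test->ptr, 100 * sizeof(int));  // inner buffer

  // Pretend the program lost `test`. The checker reports `test` only,
  // because test->ptr is still referenced from inside the leaked block.
  std::vector<void*> reported = {test};
  monitor.GetLeakAllocs(reported);
  std::size_t remaining = monitor.live_alloc_records.size();

  delete test;  // clean up so the sketch itself does not leak
  return remaining;
}
```

In this model `StrandedRecordsAfterOneCheck()` returns 1: the inner buffer's record is never removed, which is the growth described above.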

std::string shows the same behavior.

Taking https://github.com/KwaiAppTeam/KOOM/blob/a5430e2db995fb67435936bb2bddf1b42f690578/koom-demo/src/main/jni/native-leak-test.cpp#L69 as the test case, I added some logging to OnMonitor and RegisterAlloc:

I/KOOM: OnMonitor address:0x7e9485e460 size:24
I/KOOM: RegisterAlloc address:0x7e9485e460 size:24
I/NativeLeakTest: TestNewLeak 0x7e9485e460 size 16

But as soon as the string is made slightly longer, for example:

auto str_ptr = new std::string("test_leak_string -- test_leak_string");

the behavior reproduces. As the log below shows, there are clearly two allocations:

I/KOOM: OnMonitor address:0x7e86bc5980 size:24
I/KOOM: RegisterAlloc address:0x7e86bc5980 size:24
I/KOOM: OnMonitor address:0x7e86cab910 size:48
I/KOOM: RegisterAlloc address:0x7e86cab910 size:48
I/NativeLeakTest: TestNewLeak 0x7e86bc5980 size 36

The other STL containers are a concern as well (so far I have only tested std::string).
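The one-versus-two allocation pattern can be checked off-device by counting calls to a replaced global operator new. This is only a sketch: `g_new_calls`, `g_sink`, `CountNewCalls`, and the three helper functions are hypothetical names, and the short-string threshold depends on the standard library, so the literal "abc" is used here because it is short enough to be stored inline by the common implementations.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

static std::size_t g_new_calls = 0;       // bumped by every operator new call
static void* volatile g_sink = nullptr;   // lets pointers escape the optimizer

void* operator new(std::size_t size) {
  ++g_new_calls;
  if (void* p = std::malloc(size ? size : 1)) return p;
  throw std::bad_alloc();
}
void* operator new[](std::size_t size) { return ::operator new(size); }
void operator delete(void* p) noexcept { std::free(p); }
void operator delete[](void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
void operator delete[](void* p, std::size_t) noexcept { std::free(p); }

// Counts operator new calls made while `make` builds one heap object.
template <typename Make>
std::size_t CountNewCalls(Make make) {
  std::size_t before = g_new_calls;
  auto* obj = make();
  g_sink = obj;  // keep the compiler from eliding the allocation
  std::size_t calls = g_new_calls - before;
  delete obj;
  return calls;
}

std::size_t ShortStringAllocs() {
  return CountNewCalls([] { return new std::string("abc"); });
}

std::size_t LongStringAllocs() {
  return CountNewCalls(
      [] { return new std::string("test_leak_string -- test_leak_string"); });
}

std::size_t VectorAllocs() {
  return CountNewCalls([] { return new std::vector<int>(10); });
}
```

With libstdc++ or libc++ the short string should cost one allocation (the object itself), while the 36-character string and the non-empty vector should each cost two (the object plus its heap buffer). Each extra allocation is a block that a monitor would register separately.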

Supporting evidence

Printing unreachable_memory, the result returned by get_unreachable_fn_:

I/KOOM: unreachable_memory:  160 bytes in 4 unreachable allocations
      ABI: 'arm64'

      32 bytes unreachable at 7e86bc5980
       referencing 48 unreachable bytes in 1 allocation

The line "referencing 48 unreachable bytes in 1 allocation" describes exactly this behavior.

Possible solution

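The only approach I can think of so far is to modify the libmemunreachable source. Maintainers, could you take a look and see whether there is a better way to solve this?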

Mr-JingShi commented 1 year ago

Today I thought of a simple approach: apply UnregisterAlloc + free to each detected leak, which resolves the problem. Change https://github.com/KwaiAppTeam/KOOM/blob/a5430e2db995fb67435936bb2bddf1b42f690578/koom-native-leak/src/main/jni/src/leak_monitor.cpp#L209 to

freeMonitor(reinterpret_cast<void*>(CONFUSE(live->address)));

Testing with a single new std::string("test_leak_string -- test_leak_string"); as the sample:

I/KOOM: OnMonitor address:0x7e86c53020 size:24
I/KOOM: RegisterAlloc address:0x7e86c53020 size:24
I/KOOM: OnMonitor address:0x7e86f1ce20 size:48
I/KOOM: RegisterAlloc address:0x7e86f1ce20 size:48
I/NativeLeakTest: TestNewLeak 0x7e86c53020 size 36

There are two allocations in total. Running check leaks:

I/libmemunreachable: unreachable memory detection done
E/libmemunreachable: 32 bytes in 1 allocation unreachable out of 11620848 bytes in 30458 allocations
I/KOOM: unreachable_memory:  32 bytes in 1 unreachable allocation
      ABI: 'arm64'

      32 bytes unreachable at 7e86c53020
I/KOOM: GetLeakAllocs size:2
I/KOOM: UnregisterAlloc address:0x7e86c53020
I/KOOM: GetLeakAllocs live_allocs leave size:1
I/NativeLeakMonitor: LeakRecordMap size: 1

live_allocs went from 2 to 1. Running check leaks again:

I/libmemunreachable: unreachable memory detection done
E/libmemunreachable: 48 bytes in 1 allocation unreachable out of 12240328 bytes in 35631 allocations
I/KOOM: unreachable_memory:  48 bytes in 1 unreachable allocation
      ABI: 'arm64'

      48 bytes unreachable at 7e86f1ce20
I/KOOM: GetLeakAllocs size:1
I/KOOM: UnregisterAlloc address:0x7e86f1ce20
I/KOOM: GetLeakAllocs live_allocs leave size:0
I/NativeLeakMonitor: LeakRecordMap size: 1

live_allocs went from 1 to 0. Running check leaks once more:

I/libmemunreachable: unreachable memory detection done
E/libmemunreachable: 0 bytes in 0 allocations unreachable out of 12259432 bytes in 35829 allocations
I/KOOM: unreachable_memory:  0 bytes in 0 unreachable allocations
      ABI: 'arm64'
I/KOOM: GetLeakAllocs size:0
I/KOOM: GetLeakAllocs live_allocs leave size:0
I/NativeLeakMonitor: LeakRecordMap size: 0

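Once UnregisterAlloc + free has been applied to the leaked allocation, the next check leaks run detects the allocation it referenced (the "referencing ... unreachable" one). Applying UnregisterAlloc + free to that allocation in turn removes it from live_alloc_records_ as well.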
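The pass-by-pass behavior in the logs above can be pictured with a small simulation. This is a toy model under stated assumptions, not libmemunreachable's algorithm: `ToyHeap`, `ReportedRoots`, and `PassesUntilClean` are hypothetical, each leaked block is just an integer id, and a check is assumed to report only the leaked blocks that no other leaked block points to.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy heap: each leaked block id maps to the leaked blocks it points to.
using ToyHeap = std::map<int, std::vector<int>>;

// Leaked blocks that no other leaked block points to. Assumption: only
// these are reported; referenced blocks are merely summarized.
std::set<int> ReportedRoots(const ToyHeap& heap) {
  std::set<int> roots;
  for (const auto& entry : heap) roots.insert(entry.first);
  for (const auto& entry : heap) {
    for (int child : entry.second) roots.erase(child);
  }
  return roots;
}

// Repeats "check leaks, then unregister + free everything reported" until
// nothing is left, and returns how many checks reported at least one block.
std::size_t PassesUntilClean(ToyHeap heap) {
  std::size_t passes = 0;
  while (!heap.empty()) {
    std::set<int> roots = ReportedRoots(heap);
    if (roots.empty()) break;  // a pure cycle has no root in this toy model
    for (int id : roots) heap.erase(id);  // freeing a root exposes its children
    ++passes;
  }
  return passes;
}
```

For the std::string sample, with block 1 as the string object and block 2 as its buffer, `PassesUntilClean({{1, {2}}, {2, {}}})` gives 2, matching the two reporting checks in the logs. In this model a chain of n nested allocations needs n checks before its last record is gone.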

Question:

// Just remove leak allocation(never be free)
// live->address has been confused, we need to revert it first
UnregisterAlloc(CONFUSE(live->address));

What was the reasoning behind "Just remove leak allocation(never be free)" in the comment?

zqhGeek commented 10 months ago

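Freeing the memory directly does not seem like a good idea. The leak verdict is not necessarily accurate, and if a tool frees the user's memory, the user ends up with a dangling pointer or an outright crash on the next access. I would rather release the tool's own recorded stacks than free the user's memory.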

Mr-JingShi commented 9 months ago

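> Freeing the memory directly does not seem like a good idea. The leak verdict is not necessarily accurate, and if a tool frees the user's memory, the user ends up with a dangling pointer or an outright crash on the next access. I would rather release the tool's own recorded stacks than free the user's memory.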

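(1) Memory leak detection cannot guarantee 100% accuracy; leaks detected in production still need manual confirmation. (2) A detection tool should not touch user-space memory on its own initiative. Thanks 🙏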