Issue by derekbruening (closed 1 month ago).
Wow, the code has been clobbered:
(gdb) x/4i $pc
=> 0x710f345f <is_region_memset_to_char>: int $0xcd
0x710f3461 <is_region_memset_to_char+2>: int $0xcd
0x710f3463 <is_region_memset_to_char+4>: int $0xcd
0x710f3465 <is_region_memset_to_char+6>: int $0xcd
$ cat /proc/3644517/maps
71000000-7104e000 r--p 00000000 fd:01 21376683 <path>/dr/git/build_x64_dbg_tests/lib64/debug/libdynamorio.so
7104e000-71338000 r-xp 0004e000 fd:01 21376683 <path>/dr/git/build_x64_dbg_tests/lib64/debug/libdynamorio.so
71338000-71401000 r--p 00338000 fd:01 21376683 <path>/dr/git/build_x64_dbg_tests/lib64/debug/libdynamorio.so
71401000-71462000 r--p 00400000 fd:01 21376683 <path>/dr/git/build_x64_dbg_tests/lib64/debug/libdynamorio.so
71462000-71480000 rw-p 00461000 fd:01 21376683 <path>/dr/git/build_x64_dbg_tests/lib64/debug/libdynamorio.so
71480000-714b9000 rw-p 00000000 00:00 0
555555554000-5555555af000 r--p 00000000 fd:01 21373125 <path>/dr/git/build_x64_dbg_tests/clients/bin64/drmemtrace_launcher
5555555af000-5555557d2000 r-xp 0005b000 fd:01 21373125 <path>/dr/git/build_x64_dbg_tests/clients/bin64/drmemtrace_launcher
First, os_heap_reserve_in_region()'s check for no constraints fails because POINTER_MAX is page-aligned, so instead of taking the unconstrained path it tries to fit the region inside the whole address space.
Next, it walks the gaps, but each time it passes a hint the kernel allocates somewhere else, so each attempt is deliberately failed:
(gdb) p preferred
$12 = (void *) 0x555515553000
(gdb) p size
$13 = 1073745920
(gdb) p/x preferred+size
$14 = 0x555555554000
(gdb) n
(gdb) p p
$15 = (void *) 0x7fffb7800000
os_heap_reserve 1073745920 bytes at 0x00007fb438a00000 not preferred 0x0000000030fff000
os_heap_reserve 1073745920 bytes at 0x00007fb438a00000 not preferred 0x000055be37e2f000
os_heap_reserve 1073745920 bytes at 0x00007fb438a00000 not preferred 0x00007fb438a1b000
os_heap_reserve 1073745920 bytes at 0x00007fb438a00000 not preferred 0x00007ffdbe9da000
os_heap_reserve_in_region: reserved 1073745920 bytes @ 0x0000000000000000 in 0x0000000000000000-0xfffffffffffff000
vmm_heap_unit_init unable to allocate at preferred=0x000000003f4d2000 letting OS place sz=1024M addr=0x0000000000000000
SYSLOG_WARNING: Full size vmm heap allocation failed
SYSLOG_WARNING: Full size vmm heap allocation failed
<Full size vmm heap allocation failed>
os_heap_reserve 1006637056 bytes at 0x00007f3015800000 not preferred 0x0000000034fff000
os_heap_reserve 1006637056 bytes at 0x00007f3015800000 not preferred 0x0000565298ae0000
os_heap_reserve 1006637056 bytes at 0x00007f3015800000 not preferred 0x00007f301581b000
os_heap_reserve 1006637056 bytes at 0x00007f3015800000 not preferred 0x00007ffe3328f000
os_heap_reserve_in_region: reserved 1006637056 bytes @ 0x0000000000000000 in 0x0000000000000000-0xfffffffffffff000
vmm_heap_unit_init unable to allocate at preferred=0x000000003f4d2000 letting OS place sz=960M addr=0x0000000000000000
os_heap_reserve 943722496 bytes at 0x00007f3019400000 not preferred 0x0000000038bff000
os_heap_reserve 943722496 bytes at 0x00007f3019400000 not preferred 0x000056529c6e0000
os_heap_reserve 943722496 bytes at 0x00007f3019400000 not preferred 0x00007f301941b000
os_heap_reserve 943722496 bytes at 0x00007f3019400000 not preferred 0x00007ffe36e8f000
os_heap_reserve_in_region: reserved 943722496 bytes @ 0x0000000000000000 in 0x0000000000000000-0xfffffffffffff000
<...>
vmm_heap_unit_init unable to allocate at preferred=0x000000003f4d2000 letting OS place sz=2M addr=0x0000000000000000
os_heap_reserve: 1052672 bytes @ 0x0000000070eff000
os_heap_reserve_in_region: reserved 1052672 bytes @ 0x0000000070eff000 in 0x0000000000000000-0xfffffffffffff000
vmm_heap_unit_init unable to allocate at preferred=0x000000003f4d2000 letting OS place sz=1M addr=0x0000000070eff000
vmm_heap_unit_init vmcode reservation: [0x0000000070eff000,0x00000000b0eff000) total=262144 free=262144
<...>
vmm_heap_unit_init vmheap reservation: [0x00007f2e51800000,0x00007f3051800000) total=2097152 free=2097152
So it keeps shrinking the attempted size until it's really small: from 1024M down to 1M! But it looks like this size reduction is not propagated to the caller, which thinks this is a 1024M allocation and has num_blocks set for the full size. So we hand out memory we don't own and fill it with our debug patterns (like 0xcd), resulting in the crash.
After fixing the POINTER_MAX check so that it does not try for a specific region, we get past this point and end up with these reservations:
vmm_heap_unit_init vmcode reservation: [0x00007ff825c00000,0x00007ff865c00000) total=262144 free=262144
vmm_heap_unit_init vmheap reservation: [0x00007ff625a00000,0x00007ff825a00000) total=2097152 free=2097152
But we then fail with an assert that I think is related, again stemming from the kernel completely ignoring hints:
new dynamo vm area: 0x00007ff45e46d000-0x00007ff45e46d010 map_file
os_heap_commit: 4096 bytes @ 0x00007ff21dc60000
os_heap_commit: 4096 bytes @ 0x00007ff21dc61000
elf_loader_map_phdrs: module not loaded at preferred address
ASSERT FAILURE: <path>/clients/drcachesim/tracer/raw2trace.cpp:442: off_end || info.start - modvec_[info.containing_index].orig_seg_base + info.size <= modvec_[info.containing_index].total_map_size ()
Presumably that's the cause of the regression: the kernel now completely ignores preferred mmap base values, and apparently various code can't handle that, despite not passing MAP_FIXED.
So we have at least 3 bugs:
1. os_heap_reserve_in_region()'s no-constraint check fails on the page-aligned POINTER_MAX, so it walks the entire address space instead of letting the OS place the region.
2. The shrinking reservation size in vmm_heap_unit_init is not propagated to the caller, which hands out blocks for the full originally requested size.
3. Code such as elf_loader_map_phdrs and raw2trace can't handle the kernel ignoring a preferred mmap address even when MAP_FIXED is not passed.
There is a new flag MAP_FIXED_NOREPLACE that we could try here without risk of clobbering things.
This kernel change might be the culprit, as we try to allocate directly adjacent to existing mappings: https://github.com/torvalds/linux/commit/5def1e0f476da713141269d86815aba7c2817cb5
To recap the original report: this is a regression that just started happening on certain machines (maybe after an OS update?), where raw2trace crashes on a heap alloc after a heap warning; the crash is shown at the top. Still figuring it out.