DynamoRIO / drmemory

Memory Debugger for Windows, Linux, Mac, and Android

improve callstack walk perf further #711

Open derekbruening opened 9 years ago

derekbruening commented 9 years ago

From bruen...@google.com on December 07, 2011 22:14:44

this issue extends issue #460 to malloc interception for leak detection, where a callstack is gathered on every malloc. there is far less low-hanging fruit here b/c this path has already been profiled and optimized in the past.

I did a bunch of performance improvements on callstack walking for Dr. Heapstat (xref PR 473640), resulting in today's optimized in-module checks, lowest-frame checks, DRi#228, DRi#226, and packed_callstack_hash().
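The real packed_callstack_hash() is not shown in this thread. As a rough illustration only of why hashing a callstack helps when one is gathered on every malloc, the sketch below stores each distinct stack once and counts repeats. The function name, the `max_frames` limit, and the sample addresses are all hypothetical and are not taken from Dr. Memory.

```python
# Illustration only: NOT Dr. Memory's packed_callstack_hash().
# Each callstack is a sequence of return addresses; identical stacks
# share one table entry, so repeated mallocs from the same call site
# cost a lookup instead of a fresh copy of the stack.

def record_allocation(table, frames, max_frames=12):
    """Record one allocation; return True if its callstack was new."""
    key = tuple(frames[:max_frames])  # truncate deep stacks (hypothetical limit)
    is_new = key not in table
    table[key] = table.get(key, 0) + 1
    return is_new


stacks = {}
new_a = record_allocation(stacks, [0x401000, 0x401230, 0x402000])
new_b = record_allocation(stacks, [0x401000, 0x401230, 0x402000])  # same call site
new_c = record_allocation(stacks, [0x401000, 0x401500, 0x402000])  # different caller

unique_stacks = len(stacks)
total_allocs = sum(stacks.values())
print(f"allocations: {total_allocs}, unique stacks: {unique_stacks}")
```

The counts printed here play the same role as the "app mallocs" and "unique malloc stacks" statistics quoted in this issue.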

* TODO cfrac on Windows built /Ox /Oy-

now we avoid fp scans (xref issue #460 #s):

app mallocs: 10890330, frees: 10890127, large mallocs: 0
unique malloc stacks: 7050289
callstack fp scans: 0
callstack is_retaddr: 10890130, backdecode: 10890130, unreadable: 0

** INFO times for different modes

this is after issue #460 A through L

script:

    echo native
    for ((i=0; i<3; i++)); do
        /usr/bin/time ./cfrac.exe 41757646344123832613190542166099121 2>&1 | grep system
    done
    echo DR
    for ((i=0; i<3; i++)); do
        /usr/bin/time ~/dr/git/exports/bin32/drrun.exe -quiet ./cfrac.exe 41757646344123832613190542166099121 2>&1 | grep system
    done
    # $j is deliberately unquoted below so multi-option strings split into separate arguments
    for j in "" "-no_count_leaks" "-no_check_uninitialized" "-no_check_uninitialized -no_count_leaks" "-leaks_only" "-leaks_only -no_zero_stack" "-leaks_only -no_count_leaks" "-leaks_only -no_count_leaks -no_track_allocs"; do
        echo $j
        for ((i=0; i<3; i++)); do
            /usr/bin/time ~/drmemory/git/build_drmem_rel/bin/drmemory.exe $j -quiet -dr c:/src/dr/git/exports -batch -- ./cfrac.exe 41757646344123832613190542166099121 2>&1 | grep system
        done
    done

native
0.00user 0.01system 0:01.74elapsed 0%CPU (0avgtext+0avgdata 234752maxresident)k
0.00user 0.01system 0:01.69elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:01.72elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
DR
0.01user 0.00system 0:02.40elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:02.39elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.01user 0.00system 0:02.40elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
(drmemory defaults)
0.01user 0.00system 1:18.12elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 1:15.48elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 1:15.84elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
-no_count_leaks
0.00user 0.00system 0:57.40elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:57.03elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:57.42elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
-no_check_uninitialized
0.00user 0.00system 0:48.50elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 0:45.48elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:45.59elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
-no_check_uninitialized -no_count_leaks
0.00user 0.00system 0:27.33elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:27.06elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:27.78elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
-leaks_only
0.00user 0.00system 0:34.41elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:34.38elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
0.00user 0.00system 0:34.66elapsed 0%CPU (0avgtext+0avgdata 233984maxresident)k
-leaks_only -no_zero_stack
0.00user 0.00system 0:33.57elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:33.54elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 0:33.58elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
-leaks_only -no_count_leaks
0.00user 0.00system 0:19.81elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:17.76elapsed 0%CPU (0avgtext+0avgdata 234496maxresident)k
0.00user 0.00system 0:17.81elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
-leaks_only -no_count_leaks -no_track_allocs
0.00user 0.00system 0:03.22elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.00system 0:03.27elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k
0.00user 0.01system 0:03.33elapsed 0%CPU (0avgtext+0avgdata 234240maxresident)k

=> rough split: 1.7 1.7 app

prior to my issue #460 improvements, malloc interception was 30s instead of 15s

so this issue tries to shrink the 15s from callstack walking
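One way to see where a figure of about 15s could come from is to average the three elapsed times per mode and subtract. The sketch below does that with the elapsed values copied from the cfrac runs in this comment. Treating the gap between "-leaks_only -no_count_leaks" and the same options plus "-no_track_allocs" as the cost of malloc interception with callstack walking is an assumption about what that option turns off, not something the thread states outright.

```python
# Elapsed wall-clock strings copied from the /usr/bin/time output in this issue.
elapsed = {
    "native": ["0:01.74", "0:01.69", "0:01.72"],
    "DR": ["0:02.40", "0:02.39", "0:02.40"],
    "(drmemory defaults)": ["1:18.12", "1:15.48", "1:15.84"],
    "-no_count_leaks": ["0:57.40", "0:57.03", "0:57.42"],
    "-no_check_uninitialized": ["0:48.50", "0:45.48", "0:45.59"],
    "-no_check_uninitialized -no_count_leaks": ["0:27.33", "0:27.06", "0:27.78"],
    "-leaks_only": ["0:34.41", "0:34.38", "0:34.66"],
    "-leaks_only -no_zero_stack": ["0:33.57", "0:33.54", "0:33.58"],
    "-leaks_only -no_count_leaks": ["0:19.81", "0:17.76", "0:17.81"],
    "-leaks_only -no_count_leaks -no_track_allocs": ["0:03.22", "0:03.27", "0:03.33"],
}


def to_seconds(stamp):
    """Convert an 'M:SS.ss' elapsed string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def mean_seconds(mode):
    """Average the runs recorded for one mode."""
    runs = [to_seconds(s) for s in elapsed[mode]]
    return sum(runs) / len(runs)


means = {mode: mean_seconds(mode) for mode in elapsed}
for mode, secs in means.items():
    print(f"{mode:46s} {secs:7.2f}s  {secs / means['native']:5.1f}x native")

# Assumed reading: the difference is what allocation tracking adds.
alloc_tracking_cost = (
    means["-leaks_only -no_count_leaks"]
    - means["-leaks_only -no_count_leaks -no_track_allocs"]
)
print(f"allocation tracking cost: {alloc_tracking_cost:.2f}s")
```

With these inputs the native mean is about 1.7s and the difference is about 15.2s.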

one thing that shows up led to DRi#635: provide faster dr_try_setup() that doesn't allocate memory

Original issue: http://code.google.com/p/drmemory/issues/detail?id=711

derekbruening commented 9 years ago

From bruen...@google.com on December 08, 2011 08:30:16

xref issue #75

derekbruening commented 9 years ago

From bruen...@google.com on December 15, 2011 08:12:44

xref issue #703 : dynamically swap between scan-every-frame and shadow stack based on malloc freq
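A minimal sketch of what switching on malloc frequency could look like, assuming that walking frames at each malloc is cheaper when mallocs are rare and that a shadow stack pays off when they are frequent. The class name, the mode labels, the thresholds, and the per-interval counting are all hypothetical; issue #703 may take a different approach.

```python
class WalkModeSelector:
    """Pick a callstack strategy from the malloc count of the last interval.

    Two thresholds give hysteresis, so a rate hovering near one value
    does not cause constant switching. All numbers are placeholders.
    """

    SCAN = "scan-every-frame"
    SHADOW = "shadow-stack"

    def __init__(self, high=1000, low=200):
        if low >= high:
            raise ValueError("low threshold must be below high threshold")
        self.high = high
        self.low = low
        self.mode = self.SCAN

    def update(self, mallocs_in_interval):
        """Feed one interval's malloc count; return the mode to use next."""
        if self.mode == self.SCAN and mallocs_in_interval > self.high:
            self.mode = self.SHADOW
        elif self.mode == self.SHADOW and mallocs_in_interval < self.low:
            self.mode = self.SCAN
        return self.mode


selector = WalkModeSelector()
history = [selector.update(count) for count in (100, 1500, 500, 150)]
print(history)
```

The third interval (500) falls between the two thresholds, so the selector stays in whichever mode it was already in.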

derekbruening commented 9 years ago

From bruen...@google.com on December 20, 2011 07:59:16

shadow stack is issue #724

derekbruening commented 9 years ago

From bruen...@google.com on January 10, 2012 19:23:14

for ui_tests the scan dominates (tends to happen, not surprisingly, on apps that use a lot of memory):

on laptop:
% ./batch.sh
native
[----------] 1 test from NPAPITesterBase (2677 ms total)
[----------] 1 test from NPAPITesterBase (992 ms total)
[----------] 1 test from NPAPITesterBase (799 ms total)
DR
[----------] 1 test from NPAPITesterBase (8522 ms total)
[----------] 1 test from NPAPITesterBase (8390 ms total)
[----------] 1 test from NPAPITesterBase (8185 ms total)
DR -code_api -disable_traces -bb_single_restore_prefix -max_bb_instrs 256
[----------] 1 test from NPAPITesterBase (5167 ms total)
[----------] 1 test from NPAPITesterBase (5052 ms total)
[----------] 1 test from NPAPITesterBase (5823 ms total)
-no_check_uninitialized
[----------] 1 test from NPAPITesterBase (222740 ms total)
[----------] 1 test from NPAPITesterBase (48060 ms total)
[----------] 1 test from NPAPITesterBase (45426 ms total)
-no_check_uninitialized -no_leak_scan
[----------] 1 test from NPAPITesterBase (36710 ms total)
[----------] 1 test from NPAPITesterBase (37639 ms total)
[----------] 1 test from NPAPITesterBase (34794 ms total)
-no_check_uninitialized -no_count_leaks
[----------] 1 test from NPAPITesterBase (33162 ms total)
[----------] 1 test from NPAPITesterBase (33325 ms total)
[----------] 1 test from NPAPITesterBase (33522 ms total)

xref issue #151 (improve leak scan perf)
xref issue #568 (parallelize leak scan)