DynamoRIO / drmemory

Memory Debugger for Windows, Linux, Mac, and Android

full drmem MUCH slower than addronly in some cases: > 20x #681

Open derekbruening opened 9 years ago

derekbruening commented 9 years ago

From bruen...@google.com on November 10, 2011 11:29:03

This bears investigation: is it the encoding of all the instrumentation?

native:
[ OK ] Bzip2Test.Roundtrip (16 ms)
0.00user 0.00system 0:00.15elapsed 0%CPU (0avgtext+0avgdata 214784maxresident)k

plain DR:
[ OK ] Bzip2Test.Roundtrip (31 ms)
0.00user 0.01system 0:01.48elapsed 1%CPU (0avgtext+0avgdata 214784maxresident)k

full drmem:
[ OK ] Bzip2Test.Roundtrip (391 ms)
[----------] 1 test from Bzip2Test (1047 ms total)
0.00user 0.01system 3:22.78elapsed 0%CPU (0avgtext+0avgdata 214016maxresident)k

I have some data from CodeAnalyst but no good conclusions yet.

Also from issue #622:

% ~/drmemory/git/build_drmemory/bin/drmemory.exe -no_count_leaks -batch -callstack_style 0xf -dr c:/src/dr/git/exports -- ./base_unittests.exe --gtest_filter="ToolsSanityTest.AccessesToNewMemory"
[ RUN ] ToolsSanityTest.AccessesToNewMemory
[ OK ] ToolsSanityTest.AccessesToNewMemory (101 ms)
[----------] 1 test from ToolsSanityTest (510 ms total)
[==========] 1 test from 1 test case ran. (1563 ms total)
[ PASSED ] 1 test.

% ~/drmemory/git/build_drmemory/bin/drmemory.exe -no_check_uninitialized -no_count_leaks -batch -callstack_style 0xf -dr c:/src/dr/git/exports -- ./base_unittests.exe --gtest_filter="ToolsSanityTest.AccessesToNewMemory"
[ RUN ] ToolsSanityTest.AccessesToNewMemory
[ OK ] ToolsSanityTest.AccessesToNewMemory (11 ms)
[==========] 1 test from 1 test case ran. (88 ms total)
[ PASSED ] 1 test.

Original issue: http://code.google.com/p/drmemory/issues/detail?id=681

derekbruening commented 9 years ago

From timurrrr@google.com on November 10, 2011 23:32:50

re: Bzip2Test.Roundtrip -> you didn't put the addronly numbers here.

As I recall, Valgrind is also much slower than usual on computation-heavy algorithms like bzip.

derekbruening commented 9 years ago

From bruen...@google.com on November 11, 2011 07:10:26

-no_check_uninitialized -no_count_leaks:
[ OK ] Bzip2Test.Roundtrip (141 ms)
0.00user 0.00system 0:10.90elapsed 0%CPU (0avgtext+0avgdata 214272maxresident)k

The weird part is that drmem does great on SPEC CPU apps with code reuse, like bzip2 in SPEC2000: that's what I optimized for.
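For comparing the runs quoted so far, a minimal Python sketch that turns the `elapsed` fields into seconds and prints slowdown ratios may help. The elapsed values are copied from this thread; the helper names (`elapsed_to_seconds`, `slowdown`) are made up for illustration, and labeling the `-no_check_uninitialized -no_count_leaks` run as "addronly" is an assumption based on the question it answers.

```python
def elapsed_to_seconds(elapsed: str) -> float:
    """Convert a time-style elapsed field such as '3:22.78' to seconds."""
    seconds = 0.0
    # Each ':'-separated field is worth 60x the field to its right.
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Elapsed wall-clock figures for Bzip2Test.Roundtrip, copied from the thread.
# ASSUMPTION: "addronly" is the -no_check_uninitialized -no_count_leaks run.
runs = {
    "native": "0:00.15",
    "plain DR": "0:01.48",
    "addronly": "0:10.90",
    "full drmem": "3:22.78",
}

seconds = {name: elapsed_to_seconds(value) for name, value in runs.items()}


def slowdown(slow: str, fast: str) -> float:
    """Ratio of elapsed time between two named runs."""
    return seconds[slow] / seconds[fast]


for name in runs:
    print(f"{name:>10}: {seconds[name]:8.2f} s "
          f"({slowdown(name, 'native'):8.1f}x native)")
print(f"full drmem vs addronly: {slowdown('full drmem', 'addronly'):.1f}x")
```

Note that this uses whole-process elapsed time; the per-test times that gtest reports (391 ms vs 141 ms) give a different ratio, so it matters which figure a comparison is based on.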

derekbruening commented 9 years ago

From timurrrr@google.com on November 11, 2011 07:19:54

Maybe that's because this test is small (so instrumentation time can't be neglected), whilst SPEC bzips a lot of data? What happens if you increase the test data, e.g., to 100M?
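The hypothesis above can be sketched as a toy cost model: a one-time instrumentation cost plus a per-unit execution cost, compared against a purely per-unit native cost. Everything here is hypothetical (the function name `modeled_slowdown` and all three constants are invented for illustration, not measured from Dr. Memory); the point is only the shape of the curve.

```python
def modeled_slowdown(work_units: float,
                     one_time_cost: float,
                     tool_cost_per_unit: float,
                     native_cost_per_unit: float) -> float:
    """Slowdown under a toy model: fixed instrumentation cost + linear run cost."""
    tool_time = one_time_cost + tool_cost_per_unit * work_units
    native_time = native_cost_per_unit * work_units
    return tool_time / native_time


# HYPOTHETICAL parameters, chosen only to show the trend (arbitrary time units).
ONE_TIME_COST = 10.0          # cost of instrumenting the code once
TOOL_COST_PER_UNIT = 0.003    # cost per unit of work when running under the tool
NATIVE_COST_PER_UNIT = 0.001  # cost per unit of work when running natively

for work_units in (1e2, 1e4, 1e6, 1e8):
    ratio = modeled_slowdown(work_units, ONE_TIME_COST,
                             TOOL_COST_PER_UNIT, NATIVE_COST_PER_UNIT)
    print(f"{work_units:>12.0f} units -> {ratio:10.1f}x slowdown")
```

Under this model the slowdown falls toward the steady-state ratio `TOOL_COST_PER_UNIT / NATIVE_COST_PER_UNIT` as the workload grows, which is the behavior a larger test input would confirm or rule out.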