DynamoRIO / drmemory

Memory Debugger for Windows, Linux, Mac, and Android

wrap_malloc test fails on linux #1740

Open byron-hawkins opened 9 years ago

byron-hawkins commented 9 years ago

The wrap_malloc test expects 1 unique unaddressable access with 20 instances, but on my goobuntu with glibc 2.19 it finds 51 additional unique unaddressable accesses with 1 instance each. All of the additional accesses are over-reads by 4 bytes in a region that does not appear to have any redzone. (If it had been overwritten, wouldn't a corresponding error have been reported?)

Error #1: UNADDRESSABLE ACCESS beyond heap bounds: writing 0xf75e78fc-0xf75e7900 4 byte(s)
#0 replace_memset              [/usr/local/google/home/byronh/work/dr1/drmemory/drmemory/replace.c:196] (0x739f1059 <libdrmemorylib.so.1.8.16637+0x1f1059>) modid:2
#1 ld-linux.so.2!__GI__dl_allocate_tls_init [/build/buildd/eglibc-2.19/elf/dl-tls.c:436] (0xf77e977b <ld-linux.so.2+0x1177b>) modid:4
#2 ld-linux.so.2!dl_main       [/build/buildd/eglibc-2.19/elf/rtld.c:2230] (0xf77db44b <ld-linux.so.2+0x344b>) modid:4
#3 ld-linux.so.2!_dl_start     [/build/buildd/eglibc-2.19/elf/rtld.c:332] (0xf77dca0b <ld-linux.so.2+0x4a0b>) modid:4
#4 ld-linux.so.2!_start        [/build/buildd/eglibc-2.19/elf/rtld.c:875] (0xf77d90d7 <ld-linux.so.2+0x10d7>) modid:4
Note: @0:00:01.544 in thread 132767
Note: next higher malloc: 0xf75e7e10-0xf75e7e98
Note: prev lower malloc:  0xf75e7010-0xf75e7218
Note: instruction: mov    %eax -> (%esi)
    error end

Error #2: UNADDRESSABLE ACCESS beyond heap bounds: reading 0xf75e7940-0xf75e7944 4 byte(s)
#0 libc.so.6!__GI___ctype_init [/build/buildd/eglibc-2.19/ctype/ctype-info.c:31] (0xf69c9e51 <libc.so.6+0x27e51>) modid:5
#1 ld-linux.so.2!call_init.part.0 [/build/buildd/eglibc-2.19/elf/dl-init.c:64] (0xf77e6d37 <ld-linux.so.2+0xed37>) modid:4
#2 ld-linux.so.2!_dl_init_internal [/build/buildd/eglibc-2.19/elf/dl-init.c:36] (0xf77e6e64 <ld-linux.so.2+0xee64>) modid:4
#3 ld-linux.so.2!_dl_start_user [/build/buildd/eglibc-2.19/elf/rtld.c:875] (0xf77d910f <ld-linux.so.2+0x110f>) modid:4
Note: @0:00:01.602 in thread 132767
Note: next higher malloc: 0xf75e7e10-0xf75e7e98
Note: prev lower malloc:  0xf75e7010-0xf75e7218
Note: instruction: mov    %gs:0x00 -> %eax
    error end    

Error #7: UNADDRESSABLE ACCESS beyond heap bounds: reading 0xf75e794c-0xf75e7950 4 byte(s)
#0 libc.so.6!__new_exitfn [/build/buildd/eglibc-2.19/stdlib/cxa_atexit.c:79] (0xf69d5279 <libc.so.6+0x33279>) modid:5
#1 libc.so.6!__cxa_atexit_internal [/build/buildd/eglibc-2.19/stdlib/cxa_atexit.c:35] (0xf69d542d <libc.so.6+0x3342d>) modid:5
#2 libc.so.6!__libc_start_main [/build/buildd/eglibc-2.19/csu/libc-start.c:220] (0xf69bb9e8 <libc.so.6+0x199e8>) modid:5
#3 malloc!?      (0x08048701 <malloc+0x701>) modid:1
Note: @0:00:01.670 in thread 132767
Note: next higher malloc: 0xf75e7e10-0xf75e7e98
Note: prev lower malloc:  0xf75e7010-0xf75e7218
Note: instruction: cmp    %gs:0x0c $0x00000000
    error end    
derekbruening commented 9 years ago

Xref the several ld-linux default suppressions: e.g., #79, #1236

Can you install ld-linux and libc debug symbols so we have understandable callstacks?

byron-hawkins commented 9 years ago

Logs from x64 wrap_malloc at -verbose 3 are posted here:

http://www.hawkinssoftware.net/drmemory/i1740.logs.tar.gz

derekbruening commented 9 years ago

So why isn't the TLS being marked defined? Don't we see the mmap with early injection, or walk the maps regions and see it then for late injection?

zhaoqin commented 9 years ago

This is caused by Dr. Memory not intercepting __libc_memalign in ld-linux.so, and is exposed by early injection with -no_replace_malloc.

Because of early injection, we see all the instructions that are executed, including the TLS setup code in ld-linux.so. The loader itself has a set of simple memory allocation routines, including malloc, calloc, and __libc_memalign: calloc calls malloc, and malloc simply calls __libc_memalign to allocate memory.
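
To make the layering concrete, here is a minimal sketch of a loader-style allocator with the same call chain (calloc to malloc to an aligned allocator). It is a toy model and not the glibc source: every `toy_*` name, the fixed 4096-byte arena, and the bump-pointer strategy are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a loader-private allocator. Every toy_* name is hypothetical. */
static unsigned char toy_arena[4096]; /* stands in for the mmap'd page(s) */
static size_t toy_used;               /* bump offset into toy_arena */

/* Lowest layer: hand out `size` bytes aligned to `align` (a power of two). */
static void *toy_memalign(size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t base = (uintptr_t)toy_arena;
    uintptr_t start = (base + toy_used + align - 1) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(start - base);
    if (offset > sizeof(toy_arena) || size > sizeof(toy_arena) - offset)
        return NULL; /* arena exhausted; a real loader would map more memory */
    toy_used = offset + size;
    return (void *)start;
}

/* malloc is a thin wrapper that forwards to the aligned allocator. */
static void *toy_malloc(size_t size)
{
    return toy_memalign(sizeof(double), size);
}

/* calloc forwards to malloc and zeroes the result. */
static void *toy_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL; /* multiplication would overflow */
    void *p = toy_malloc(nmemb * size);
    if (p != NULL)
        memset(p, 0, nmemb * size);
    return p;
}
```

In this model a caller can reach `toy_memalign` either through `toy_calloc`/`toy_malloc` or directly, and both paths carve memory out of the same arena. A tool that only observes the outer entry points therefore never sees the chunks handed out by direct calls.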

The error happens as follows. First, init_tls allocates some memory via calloc, which ends up calling mmap, so [0xf74f0000, 0xf74f1000) is added as a heap region and [0xf74f0010, 0xf74f0218) is allocated for the app:

        # 0 mmap2 (0x00001000)
        # 1 fp=0xffc4c9f8 parent=0xffc4cb68 ld-linux.so.2!__libc_memalign+0x91
        # 2 fp=0xffc4ca28 parent=0x00000228 ld-linux.so.2!malloc
        # 3 fp=0xffc4ca48 parent=0xf7702000 ld-linux.so.2!calloc
        # 4 fp=0xffc4ca68 parent=0xf770255c ld-linux.so.2!init_tls
        # 5 fp=0xffc4ca98 parent=0xf770255c ld-linux.so.2!dl_main
        # 6 fp=0xffc4cbd8 parent=0xffc4cc48 ld-linux.so.2!_dl_start
        # 7 fp=0xffc4cc48 parent=0x00000000 ld-linux.so.2!_start    
...
 calloc-post 0xf74f0010-0xf74f0218 = 0x208 (really 0xf74f0000-0xf74f0228 0x228)

Then later in _dl_allocate_tls_storage, __libc_memalign is directly called to allocate memory:

  /* Allocate a correctly aligned chunk of memory.  */
  result = __libc_memalign (GL(dl_tls_static_align), size);

If we are using -replace_malloc, we allocate the memory for calloc from our own heap and treat memory allocated/mmapped by __libc_memalign as allocated, so no error is reported. If we are using -no_replace_malloc, the memory mapped by __libc_memalign is considered a heap region because __libc_memalign is first called from malloc. But the later direct call to __libc_memalign is ignored, so we think part of the memory is not allocated.

Error #1: UNADDRESSABLE ACCESS beyond heap bounds: writing 0xf74f08fc-0xf74f0900 4 byte(s)
# 0 replace_memset
# 1 ld-linux.so.2!__GI__dl_allocate_tls_init
# 2 ld-linux.so.2!dl_main
# 3 ld-linux.so.2!_dl_start
# 4 ld-linux.so.2!_start 
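
The -no_replace_malloc failure mode described in this comment can be modeled roughly as follows. This is a sketch under stated assumptions, not Dr. Memory's actual data structures or hooks: `toy_tracker_t` and all `toy_*` functions are hypothetical, the app allocator is reduced to a bump pointer, and the addresses and sizes (0xf74f0010, 0x208, align 64, size 3008) are borrowed from the logs in this thread purely as illustrative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHUNKS 8

/* Toy shadow state: the chunks the tool believes are allocated. */
typedef struct {
    uintptr_t start[MAX_CHUNKS];
    uintptr_t end[MAX_CHUNKS];   /* exclusive */
    size_t count;
    bool hook_memalign;          /* true: direct memalign calls are recorded too */
    uintptr_t next;              /* bump pointer of the simulated app allocator */
} toy_tracker_t;

/* Post-call hook: remember that [start, start+size) is a live allocation. */
static void toy_record(toy_tracker_t *t, uintptr_t start, size_t size)
{
    assert(t->count < MAX_CHUNKS);
    t->start[t->count] = start;
    t->end[t->count] = start + size;
    t->count++;
}

/* True if [addr, addr+len) lies inside one recorded chunk. */
static bool toy_addressable(const toy_tracker_t *t, uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < t->count; i++)
        if (addr >= t->start[i] && addr + len <= t->end[i])
            return true;
    return false;
}

/* Simulated app-side memalign. `nested` marks a call made from inside malloc,
 * which the malloc hook already accounts for. */
static uintptr_t toy_app_memalign(toy_tracker_t *t, size_t align, size_t size,
                                  bool nested)
{
    uintptr_t p = (t->next + align - 1) & ~(uintptr_t)(align - 1);
    t->next = p + size;
    if (!nested && t->hook_memalign)
        toy_record(t, p, size); /* only seen when memalign itself is hooked */
    return p;
}

/* Simulated app-side malloc: forwards to memalign, then the malloc hook fires. */
static uintptr_t toy_app_malloc(toy_tracker_t *t, size_t size)
{
    uintptr_t p = toy_app_memalign(t, 8, size, true);
    toy_record(t, p, size);
    return p;
}
```

With `hook_memalign` set to false, a chunk returned by `toy_app_malloc` is addressable according to `toy_addressable`, while a chunk returned by a direct `toy_app_memalign` call is not, even though both come from the same region. Setting `hook_memalign` to true makes both addressable, which is the effect intercepting __libc_memalign is meant to have.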
zhaoqin commented 9 years ago

So the correct solution is to intercept __libc_memalign (xref issue #94). Since the problem is limited to TLS initialization in ld-linux, we could just suppress it for now.

derekbruening commented 9 years ago

With aligned malloc support from #94, I put in replacement of __libc_memalign. I can't repro the extra unaddrs, but I can confirm that we now intercept this in ld.so:

intercepting __libc_memalign @0x49308060 type 7 in module ld-linux.so.2
new basic block @0x49308060 == ld-linux.so.2!__libc_memalign+0x0
whole-bb scratch: r1=%ecxspill#0 x0, r2=%eaxspill#1 x0
whole-bb scratch: r1=unused, r2=unused, efl=unused
set range 0xff82dd78-0xff82dd7c => 0x0
replace_memalign align=64 size=3008
        increased brk from 0x0804e000 to 0x0804f000
adjusting heap region from 0x0804c000-0x0804e000 to 0x0804c000-0x0804f000
adding heap region 0x0804c000-0x0804f000
        carving out new chunk @0x0804d4f8 => head=0x0804d500, res=0x0804d518
splitting off 0x0804d500-0x0804d5a0 from 0x0804d528
add_to_free_list: arena 0x0804c000 bucket 0 free front=0x0804d500 last=0x0804d500
set prev_size_shr of 0x0804d528 to 0x1
        replace_alloc_common arena=0x0804c000 flags=0x204 request=3008, align=64 alloc=3056 => 0x0804d540
@@@ unique callstack #15
# 0 replace_memalign                                  [/work/drmemory/git/src/common/alloc_replace.c:2755]
# 1 ld-linux.so.2!_dl_allocate_tls_storage
# 2 ld-linux.so.2!_dl_sysdep_start
set range 0x0804d540-0x0804e100 => 0x3
malloc 0x0804d540-0x0804e100
byron-hawkins commented 9 years ago

This still fails on my machine. Running with -verbose 2 shows that __libc_memalign is intercepted, but the extraneous errors are still reported:

sym lookup of ld-2.19!__libc_memalign in /lib/i386-linux-gnu/ld-2.19.so => 0 0x000155a0
symcache_symbol_add: ld-linux.so.2 "__libc_memalign" @ 0x155a0
symcache_symbol_add: ignoring dup entry __libc_memalign
intercepting __libc_memalign @0xf77345a0 type 7 in module ld-linux.so.2

Test error summary:

~~Dr.M~~ ERRORS FOUND:
~~Dr.M~~      56 unique,    93 total unaddressable access(es)
~~Dr.M~~       2 unique,     2 total uninitialized access(es)
~~Dr.M~~       1 unique,     1 total invalid heap argument(s)
~~Dr.M~~       0 unique,     0 total warning(s)
~~Dr.M~~       3 unique,     3 total,    155 byte(s) of leak(s)
~~Dr.M~~       1 unique,     1 total,     16 byte(s) of possible leak(s)
~~Dr.M~~ ERRORS IGNORED:
~~Dr.M~~       3 unique,     3 total,     51 byte(s) of still-reachable allocation(s)
derekbruening commented 9 years ago

This was erroneously closed: memalign was fully implemented only for -replace_malloc. Wrapping did not "just work" and so was disabled for memalign.

derekbruening commented 9 years ago

Un-assigning as I can't reproduce this. Having wrap intercept memalign & co. leads to a ton of false positives on my machine: it will take effort to figure out what's going on there.