DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform
Other
2.67k stars 563 forks source link

TestMemProtChg_FLAKY fails assertion is_readable_without_exception_try(pc, 1) #2554

Open fhahn opened 7 years ago

fhahn commented 7 years ago

This test fails consistently on the AArch64 nightly build machine. It hits the following assertion:

<../build_debug-internal-64/suite/tests/bin/security-common.TestMemProtChg_FLAKY (7354). Internal Error: DynamoRIO debug check failure: ../src/core/vmareas.c:8201 is_readable_without_exception_try(pc, 1)

http://dynamorio.org/CDash/testDetails.php?test=322307&build=26586 http://dynamorio.org/CDash/testDetails.php?test=321619&build=26542 http://dynamorio.org/CDash/testDetails.php?test=311514&build=26014

As @egrimley said in #2417, it seems like this test only fails on certain AArch64 implementations. On a system using a different AArch64 implementation to the nightly builder, the test consistently passes.

egrimley commented 7 years ago

For me on openSUSE Tumbleweed 20160916 with 4.7.3 kernel and 64K pages it passes without DEBUG and with DEBUG the consistent error is: Internal Error: DynamoRIO debug check failure: /../dynamorio/core/unix/memcache.c:456 dr_api_entry

egrimley commented 7 years ago

Here is the stack trace from the failure I see ("Internal Error: DynamoRIO debug check failure: /.../core/unix/memcache.c:456 dr_api_entry"):

#0  memcache_query_memory (pc=0x3fffffcf0b0 "\320\360\374\377\377\003", out_info=0x5255f8e8)
    at /.../dynamorio/core/unix/memcache.c:456
#1  0x000002aaaadcfdc8 in query_memory_ex (pc=0x3fffffcf0b0 "\320\360\374\377\377\003", 
    out_info=0x5255f8e8) at /.../dynamorio/core/unix/os.c:9204
#2  0x000002aaaadcfe74 in get_memory_info (pc=0x3fffffcf0b0 "\320\360\374\377\377\003", 
    base_pc=0x52580040, size=0x5255f938, prot=0x0)
    at /.../dynamorio/core/unix/os.c:9224
#3  0x000002aaaadcfc24 in get_stack_bounds (dcontext=0x52488660, base=0x5255f9a0, 
    top=0x5255f998) at /.../dynamorio/core/unix/os.c:9154
#4  0x000002aaaac6c838 in is_on_stack (dcontext=0x52488660, pc=0x430000 "\340\003", area=0x0)
    at /.../dynamorio/core/vmareas.c:3838
#5  0x000002aaaac72768 in check_thread_vm_area (dcontext=0x52488660, pc=0x430000 "\340\003", 
    tag=0x430000 "\340\003", vmlist=0x5255fe08, flags=0x5255fe00, stop=0x5255fe50, xfer=false)
    at /.../dynamorio/core/vmareas.c:7944
#6  0x000002aaaad8f440 in check_new_page_start (dcontext=0x52488660, bb=0x5255fdc8)
    at /.../dynamorio/core/arch/interp.c:711
#7  0x000002aaaad93c28 in build_bb_ilist (dcontext=0x52488660, bb=0x5255fdc8)
    at /.../dynamorio/core/arch/interp.c:3360
#8  0x000002aaaad9b220 in build_basic_block_fragment (dcontext=0x52488660, 
    start=0x430000 "\340\003", initial_flags=0, link=true, visible=true, for_trace=false, 
    unmangled_ilist=0x0) at /.../dynamorio/core/arch/interp.c:5187
#9  0x000002aaaab95b78 in dispatch (dcontext=0x52488660)
    at /.../dynamorio/core/dispatch.c:216

It seems to be taking the stack pointer and asking which memory region it is in, then going into a panic because the stack pointer is not in a memory region which it knows about. Seeing as the stack can be automatically extended by the kernel I don't see why this situation should be a such a surprise and so panic-worthy.

It also seems to be possible to get exactly the same failure on a different machine with 4K pages, where this test does not usually fail, by replacing PAGE_SIZE with 0x10000 at TestMemProtChg.c:124 (where buffer_stack is declared). I even get the same failure on X86_64 with this change. So perhaps I should remove the OpSys-AArch64 label from this issue.

Not sure how to fix this. I'm asking myself: how was this ever supposed to work? Derek, how does the vmareas cache usually cope with the stack growing?

derekbruening commented 7 years ago

Seeing as the stack can be automatically extended by the kernel...Derek, how does the vmareas cache usually cope with the stack growing?

Use of MAP_GROWSDOWN was removed from glibc a long time ago (2008?). A regular stack (from pthreads, e.g.) will never be auto-extended by the kernel.

That said, I'm not sure about other UNIXes, and there could be custom code b/c while people have tried to remove it from Linux, there is technically still Linux kernel support for MAP_GROWSDOWN (has inherent races so not supposed to use it AFAIK). The vmareas cache does not handle such things. There have been many problems with the cache and we have on more than one occasion wondered if its perf improvement was worth the trouble. Xref #2539.

egrimley commented 7 years ago

Test program:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define N 10000

void f(void);

int main()
{
    int fd = open("/proc/self/maps", O_RDONLY);
    char buf[N];
    ssize_t n = read(fd, buf, N);
    write(1, buf, n);
    f();
    return 0;
}

void f(void)
{
    char big[1000000];
    int fd = open("/proc/self/maps", O_RDONLY);
    char buf[N];
    ssize_t n = read(fd, buf, N);
    write(2, buf, n);
}

Result:

gcc -Wall -O0 st.c && ./a.out > out1 2> out2 && grep stack out[12]

out1:7ffc78244000-7ffc78265000 rw-p 00000000 00:00 0                          [stack]
out2:7ffc7816c000-7ffc78265000 rw-p 00000000 00:00 0                          [stack]

So the proces's original stack does get extended in this case.

egrimley commented 7 years ago

And here's an example of a "stack" created with mmap that grows automatically:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

#define N 10000

void dump_maps(int out)
{
    int fd = open("/proc/self/maps", O_RDONLY);
    char buf[N];
    ssize_t n = read(fd, buf, N);
    write(out, buf, n);
}

int main()
{
    char *x = mmap(0, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);
    int i;
    dump_maps(1);
    for (i = 4095; i > 0; i--)
        x[i] = 0;
    dump_maps(2);
    x[-1] = 0;
    dump_maps(3);
    return 0;
}

Result:

gcc -g -Wall -O0 st2.c && ./a.out > out1 2> out2 3> out3 && ( diff out[12] ; diff out[23] )

11c11
< 7efca5ce5000-7efca5ce5000 rw-p 00000000 00:00 0 
---
> 7efca5ce4000-7efca5ce5000 rw-p 00000000 00:00 0 
11c11
< 7efca5ce4000-7efca5ce5000 rw-p 00000000 00:00 0 
---
> 7efca5ce3000-7efca5ce5000 rw-p 00000000 00:00 0 

So I would guess the flag has gone from glibc but is still understood by the syscall.

I can believe that pthread stacks don't use it, but TestMemProtChg.c doesn't use pthreads.

egrimley commented 7 years ago

It's interesting how only the process's real original stack gets the label "[stack]" in /proc/self/maps. Adding MAP_STACK doesn't make any difference. The kernel source code for this seems to be in fs/proc/task_mmu.c:

        /*
         * We make no effort to guess what a given thread considers to be
         * its "stack".  It's not even well-defined for programs written
         * languages like Go.
         */
egrimley commented 6 years ago

As mentioned above, this problem can be reproduced on X86_64 so it's not AArch64-specific, even if it's people with AArch64 systems with 64K pages who experience it most often.