DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

support large pages #1680

Open derekbruening opened 9 years ago

derekbruening commented 9 years ago

Various places in our code assume 4KB pages on x86. We should parametrize the code to support large pages.

egrimley commented 8 years ago

This is required on AArch64, too. There's a system here with 64 KiB pages that DynamoRIO won't run on:

$ LD_SHOW_AUXV=1 /bin/true | grep PAGESZ
AT_PAGESZ:       65536

derekbruening commented 8 years ago

There is an additional complexity on modern architectures beyond simply replacing a static constant with a single value read at init time: multiple simultaneous page sizes.

How can DR and DR-based tools handle a mix of sizes? If most uses of page boundaries in tools are on pages they allocate themselves (guard pages and lazy-fault pages), then maybe it's not too hard, b/c those will match the at-boot page size. Only cases that act on app pages will have to dynamically query. These cases include:

It may not be possible to query the page size as non-root. The workaround may be to change the page prot and then re-scan the maps file to see how much changed (and ignore races).

egrimley commented 8 years ago

As I understand it, on Linux we have:

So I don't see much of a problem on Linux. But I don't know about other operating systems.

egrimley commented 8 years ago

I was able to run DynamoRIO on an AArch64 system with 64K pages with a dozen additional test failures compared to what I'm used to on AArch64. (Some tests might assume 4K pages.) I did it with these changes:

proc.h:
  # define PAGE_SIZE (64*1024)
heap.c:
  VMM_BLOCK_SIZE = 64*1024
optionsx.h:
  options->stack_size = MAX(options->stack_size, 128*1024);
  initial_heap_unit_size, 3*64*1024
  initial_global_heap_unit_size, 3*64*1024
  heap_commit_increment, 64*1024
  cache_commit_increment, 64*1024
  cache_bb_unit_init, 64*1024
  cache_trace_unit_init, 64*1024
  guard_pages, false

Most of that seems obvious in retrospect, and obviously I'm not claiming that the new values are in any way optimal. The guard_pages change is most interesting. According to the comment these are just for "heap units", but there may be different kinds of heap unit. Do we want guard pages when they have to be so big?

A proper patch would read the page size from the auxiliary vector or from /proc/self/auxv (on Linux). Where should that initialisation take place?
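As a rough illustration of the two Linux sources mentioned above, the following sketch reads AT_PAGESZ once through glibc's getauxval and once by scanning /proc/self/auxv directly. The helper names are hypothetical and not part of DynamoRIO, and the sketch uses ordinary libc calls that DynamoRIO's own early initialisation might not be able to rely on.

```c
#include <assert.h>
#include <elf.h>      /* AT_NULL, AT_PAGESZ */
#include <fcntl.h>
#include <stddef.h>
#include <sys/auxv.h> /* getauxval (glibc 2.16 or later) */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask libc for the AT_PAGESZ entry it saved at startup. */
size_t
page_size_from_getauxval(void)
{
    return (size_t)getauxval(AT_PAGESZ);
}

/* Hypothetical helper: scan /proc/self/auxv for AT_PAGESZ.
 * Each entry is a (type, value) pair of native words.
 * Returns 0 if the file cannot be read or the entry is missing.
 */
size_t
page_size_from_proc_auxv(void)
{
    unsigned long entry[2];
    size_t result = 0;
    int fd = open("/proc/self/auxv", O_RDONLY);
    if (fd < 0)
        return 0;
    while (read(fd, entry, sizeof(entry)) == (ssize_t)sizeof(entry)) {
        if (entry[0] == AT_NULL)
            break;
        if (entry[0] == AT_PAGESZ) {
            result = (size_t)entry[1];
            break;
        }
    }
    close(fd);
    return result;
}
```

The /proc variant also works after attaching to a process whose original auxiliary vector pointer is no longer at hand, at the cost of depending on /proc being mounted.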

egrimley commented 8 years ago

If the page size is to be stored in a global variable, for me it's hard to see how to be sure that the variable will always get initialised before it is used, unless all access to the variable goes via a (possibly inline) function that checks whether it has been initialised and initialises it when necessary. Probably most references to the page size aren't performance-critical so perhaps one could do something like:

#define PAGE_SIZE dynamo_get_page_size()

size_t
dynamo_get_page_size(void)
{
    static size_t pagesz = 0;
    if (pagesz == 0)
        pagesz = ...
    return pagesz;
}

Arguably it's a bit misleading to make PAGE_SIZE, which looks like, and sometimes is, a compile-time constant, into a function call. Users might not realise that it's something that one should hoist out of a loop, for example.
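To make the hoisting concern concrete, here is a minimal sketch of the lazily initialised getter together with a loop that reads the page size once into a local. It assumes sysconf(_SC_PAGESIZE) as the initialiser purely for illustration, and the names get_page_size and count_pages_touched are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Lazily initialised page size. Two threads racing here would both store
 * the same value, so the race is benign on typical hardware.
 * Assumption: sysconf is usable in this context.
 */
size_t
get_page_size(void)
{
    static size_t pagesz = 0;
    if (pagesz == 0)
        pagesz = (size_t)sysconf(_SC_PAGESIZE);
    return pagesz;
}

/* Writes one byte at the start of each page-sized stride of buf and returns
 * the number of strides visited. The page size is read once, outside the loop.
 */
size_t
count_pages_touched(volatile char *buf, size_t len)
{
    const size_t pagesz = get_page_size(); /* hoisted out of the loop */
    size_t count = 0;
    for (size_t off = 0; off < len; off += pagesz) {
        buf[off] = 0;
        count++;
    }
    return count;
}
```

Writing `off += PAGE_SIZE` with the macro from the comment above would instead call the getter on every iteration unless the compiler can prove the call is loop-invariant.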

Another question is how to initialise pagesz. In a normal environment with a C library, the usual way on Linux is: sysconf(_SC_PAGESIZE). That should work if getenv is working, and getenv is used in some of DynamoRIO's initialisation code. However, if you want to be able to discover the page size from an arbitrary context that might not be robust enough. Reading /proc is one possibility, but I've been wondering whether it might be better (more robust and portable) to deduce the page size just by calling mmap and munmap. I think you can discover whether n is a multiple of the page size by seeing if you can do:

    char *p = mmap(NULL, n * 2, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    bool result = munmap(p + n, n) == 0;
    munmap(p, n * 2);

The fiddling with the memory map might perhaps interact badly with something that another thread is doing, but this will nearly always be done (just once per process) before the app has started, and DynamoRIO already allocates memory for its own use, which has the same problem. So is this perhaps worth considering?
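A possible way to turn that test into a full probe is sketched below: try successive powers of two and return the first one that passes. It relies on munmap rejecting an address that is not page-aligned, which is how Linux behaves. The function names and the search bounds (1 KiB to 1 GiB) are assumptions made for the example.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns true if n is a multiple of the page size.
 * Returns false if it is not, or if the probe mapping could not be created.
 */
bool
is_multiple_of_page_size(size_t n)
{
    char *p = mmap(NULL, n * 2, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return false;
    /* p is page-aligned, so p + n is page-aligned only if n is a multiple. */
    bool result = munmap(p + n, n) == 0;
    munmap(p, n * 2); /* release whatever is still mapped */
    return result;
}

/* Returns the smallest power of two that passes the probe, or 0 if none does. */
size_t
probe_page_size(void)
{
    for (size_t n = 1024; n <= ((size_t)1 << 30); n *= 2) {
        if (is_multiple_of_page_size(n))
            return n;
    }
    return 0;
}
```

Because a failed mmap is reported as "not a multiple", a transient allocation failure could make the probe skip the true page size and return a larger multiple, so a real implementation would probably want to distinguish the two cases.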

derekbruening commented 8 years ago

> Transparent huge pages: the kernel uses a mixture of page sizes but makes this completely invisible to user space.

Yes, it does look like user mode can change the protection of a 4K region of a 2M transparent huge page, and the kernel will auto-magically split the huge page back up into small pages. The only negative consequence is performance, which sort of addresses the app page prot concern.
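The behaviour described above can be exercised with a small sketch like the following. It asks for a transparent huge page with madvise(MADV_HUGEPAGE), which is only a hint, and then changes the protection of a single base page inside the region. It does not verify that a huge page was actually used, the 2M size is an assumption that holds for x86-64, and the function name is hypothetical.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and madvise under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_SZ ((size_t)2 * 1024 * 1024) /* assumption: 2M huge pages */

/* Returns 0 if one base page inside a (possibly huge-page-backed) region
 * could be made read-only, or -1 on failure.
 */
int
protect_one_small_page_in_huge_region(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 2 * HUGE_SZ; /* over-allocate so a 2M-aligned window fits */
    char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (raw == MAP_FAILED)
        return -1;
    char *aligned =
        (char *)(((uintptr_t)raw + HUGE_SZ - 1) & ~(uintptr_t)(HUGE_SZ - 1));
#ifdef MADV_HUGEPAGE
    /* Hint only: this may fail if THP is not configured, which is fine here. */
    (void)madvise(aligned, HUGE_SZ, MADV_HUGEPAGE);
#endif
    memset(aligned, 1, HUGE_SZ); /* fault the whole window in */
    /* Change protection on the second base page only. */
    int rc = mprotect(aligned + pagesz, pagesz, PROT_READ);
    if (rc == 0)
        aligned[0] = 2; /* the first base page is still writable */
    munmap(raw, len);
    return rc;
}
```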

If the app was launched under DR, AT_PAGESZ seems the simplest and fastest way to go. Deducing the size from the reading of the maps file that we already do (e.g., what is the size of the vsyscall page, or vdso which is few enough pages to tell) seems like the best way for other scenarios.
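One way to deduce a size from the maps file is sketched below. It is a variant of the suggestion above rather than the same thing: instead of looking at the vdso or vsyscall entry specifically, it takes the largest power of two that divides every mapping boundary in /proc/self/maps. That value is always a multiple of the page size and equals it whenever some boundary is an odd multiple of the page size, which a small mapping such as the vdso makes likely. The function name is hypothetical, and the sketch uses stdio, which DynamoRIO's own maps reader would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Returns the largest power of two dividing every start and end address in
 * /proc/self/maps, or 0 if the file cannot be read. The result is a multiple
 * of the page size, and usually the page size itself.
 */
size_t
page_size_bound_from_maps(void)
{
    unsigned long start, end, bits = 0;
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return 0;
    while (fscanf(f, "%lx-%lx", &start, &end) == 2) {
        int c;
        bits |= start | end;
        /* Skip the rest of the line (permissions, offset, path). */
        while ((c = fgetc(f)) != EOF && c != '\n')
            ;
    }
    fclose(f);
    if (bits == 0)
        return 0;
    return (size_t)(bits & (~bits + 1)); /* lowest set bit */
}
```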

egrimley commented 8 years ago

With https://codereview.appspot.com/313800043/ it is possible to run DynamoRIO on an AArch64 system with 64K pages if you provide appropriate command-line options, such as:

bin64/drrun -vmm_block_size 64 -stack_size 128 -initial_heap_unit_size 192 -initial_global_heap_unit_size 192 -heap_commit_increment 64 -cache_commit_increment 65536 -no_guard_pages -- ls

I think the final step is to add code to round the default parameter values to multiples of the page size, where necessary. But it probably makes sense to first change the meaning of at least one parameter that currently includes the guard pages so that it doesn't include them as otherwise it's hard to have a sensible default value.
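The rounding step could look roughly like this. The struct and function names are hypothetical stand-ins for the real option storage, and the sketch ignores the guard-page accounting question raised above.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds value up to the next multiple of align, which must be a power of two. */
size_t
align_forward(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical subset of the size-valued options, in bytes. */
struct size_defaults {
    size_t stack_size;
    size_t heap_commit_increment;
    size_t cache_commit_increment;
};

/* Rounds each default up so that it is a whole number of pages. */
void
round_defaults_to_page_size(struct size_defaults *d, size_t pagesz)
{
    d->stack_size = align_forward(d->stack_size, pagesz);
    d->heap_commit_increment = align_forward(d->heap_commit_increment, pagesz);
    d->cache_commit_increment = align_forward(d->cache_commit_increment, pagesz);
}
```

With 4K pages, values that are already multiples of 4K are left unchanged, so the adjustment only takes effect on systems with larger pages.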

egrimley commented 8 years ago

With b94a6d0 all the tests pass (with the usual flakiness) on AArch64 with 4K and with 64K pages.

Things to do:

derekbruening commented 8 years ago

> Obtain page size on Mac using system call. (Again, no immediate practical benefit, as far as I know.)

This should be done as it is a very simple and direct one-line system call.

egrimley-arm commented 1 year ago

This may have regressed a bit since I last looked. Today I tested 352e4967ca4f7054150eae993529c523c55765f8 on a machine with 64K pages (getconf PAGESIZE) and got 64 failures from the test suite!

I don't know how many of those failures relate to 64K pages but the first one seems to: in unit_tests, EXPECT failed at [...]/options.c:2789, presumably because with adjust_defaults_for_page_size you get a whole load of other things preceding the expected -vmheap_size 16G.