DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

Make -vm_size 2G by default for 64-bit #3570

Open · derekbruening opened this issue 5 years ago

derekbruening commented 5 years ago

This is part of a series of 64-bit scalability improvements: xref the original reachability issue #774, the splitting of vmcode from vmheap in #1132, and W^X #3556, which wants this 2G default as a prerequisite.

The idea is that now that we have put all of our reachability-guaranteed code into its own region, we may as well reserve the maximum 2G for that region at init time, avoiding ever running out of space, spilling over into individual OS allocations, and running into complications like #2115.

The complication is that we need client libraries (including extension libraries, but not third-party dependencies loaded by our private loader that do not interact directly with DR) to be inside that same 2G region. So we need to coordinate between DR's VMM and the file mapping done by the private loader: the VMM wants to reserve 2G up front and then hand a piece of it to the loader to map a file into. While this is feasible on UNIX, where we can MAP_FIXED on top of an existing mmap or munmap just a piece of a prior large mmap (though that route has a race), on Windows it is not possible to map a file on top of an existing address-space reservation. Nor is it possible to un-reserve just a piece of a reservation: the entire reservation must be released.
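For reference, here is a minimal sketch of the UNIX-side mechanism described above (illustrative only, not DR's VMM or private-loader code; the library path, sizes, and offsets are made up):

```c
/* Minimal sketch (not DR's VMM code): reserve a large region up front,
 * then let the loader map a file into a piece of it with MAP_FIXED.
 * Real code must page-align the segment size, handle errors, and track
 * which pieces of the reservation have been handed out.
 */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stddef.h>

#define REGION_SIZE (2UL * 1024 * 1024 * 1024) /* the 2G reservation */

int
main(void)
{
    /* Reserve address space only: PROT_NONE, nothing committed. */
    void *region = mmap(NULL, REGION_SIZE, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (region == MAP_FAILED)
        return 1;

    /* The loader maps a library segment on top of a sub-range of the
     * reservation.  MAP_FIXED atomically replaces the existing mapping,
     * so no other thread can grab the range in between (unlike
     * munmap-then-mmap, which has exactly that race).
     */
    int fd = open("/path/to/libclient.so", O_RDONLY); /* hypothetical path */
    if (fd < 0)
        return 1;
    size_t seg_size = 4096; /* stand-in for the real, page-aligned segment size */
    void *seg = mmap((char *)region + 0x100000, seg_size, PROT_READ,
                     MAP_PRIVATE | MAP_FIXED, fd, 0);
    close(fd);
    return seg == MAP_FAILED ? 1 : 0;
}
```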

Here are some possible solutions for Windows:

I'm going with F as the short-term solution: i.e., implement the 2G vmcode size by default only for Linux, probably using MAP_FIXED, and leave bringing in Windows as future work, even if that future work changes how Linux works as well.

derekbruening commented 5 years ago

A complication is that -vm_size 2G makes it impossible for a statically linked client to be reachable from the code cache, unless we really want to put the app executable itself inside the region, which would cause numerous issues with checks for DR vs app addresses: a 2G region by itself consumes the full ±2G reach of a 32-bit rip-rel displacement, so there is no room left to also guarantee reach to the executable. In the docs we don't really guarantee that a static client is reachable, so maybe we should just explicitly state that it is not reachable (and auto-disable -reachable_client for a static client, assuming we can detect it)?
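For reference, the reachability constraint in question is the signed 32-bit displacement of rip-relative addressing; a minimal sketch of the kind of check involved (illustrative only, not DR's actual code or macro names):

```c
/* Illustrative only (not DR's actual code or macro names): can a signed
 * 32-bit rip-relative displacement, measured from the end of the
 * instruction (pc_next), reach the given target?
 */
#include <stdbool.h>
#include <stdint.h>

bool
rel32_reachable(uintptr_t pc_next, uintptr_t target)
{
    intptr_t disp = (intptr_t)target - (intptr_t)pc_next;
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With a full 2G region, code-cache addresses near one end of the region already use up this budget, so a target in the app executable lying beyond the other end can fail this check.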

derekbruening commented 5 years ago

A related complication is that it is impossible to satisfy -vm_base_near_app, which means we would have to mangle all of the app's rip-rel instructions. We never measured the perf difference of -vm_base_near_app, so we have no historical data on how bad this would be.

We could still place the region near the app and try to keep some of the rip-rel references reachable, but we prefer not to place it after the app, which interferes with the brk (at least for non-PIE binaries), and if we place it before the app, our first-used low addresses are not going to reach the app's .data.
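To make the mangling cost concrete, here is an illustrative sketch (not DR's actual emitted code) of what happens to a rip-rel load whose target is out of reach from the code cache:

```c
/* Illustrative sketch of the mangling cost (not DR's actual emitted code). */
static long some_global;

long
read_global(void)
{
    /* On x86-64 this typically compiles to a single rip-relative load:
     *     mov  rax, QWORD PTR [rip+disp32]   ; some_global
     * If the code-cache copy of the instruction is within +/-2G of
     * some_global, DR can simply re-encode the displacement.  If not, the
     * instruction must be mangled: materialize the absolute address into a
     * scratch register (spilling one if necessary) and load through it,
     * along the lines of
     *     mov  rcx, <64-bit absolute address of some_global>
     *     mov  rax, QWORD PTR [rcx]
     * i.e., extra instructions plus register pressure on every such access.
     */
    return some_global;
}
```

-vm_base_near_app avoids this by keeping the cache within reach so the original single-instruction form can be re-encoded as-is, which is the difference measured below.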

derekbruening commented 5 years ago

I added some better statistics and did some measurements. On some apps, such as SPEC2006, there are very few rip-rel instructions in the app itself (under 100); libc has more (~600). Those are static counts.

On some larger proprietary apps, there are more: 20K static, accounting for ~2.5% of dynamic memory references.

On a synthetic benchmark where I made fully 50% of dynamic memory references rip-rel (about half loads and half stores; a sketch of the idea is below), I measured substantial overhead differences with respect to mangling:

- Plain DR: 24% slowdown from being far away and having to mangle all of the rip-rels.

- Memtrace with no I/O: still a 10% slowdown! This is surprising, since the memtrace instrumentation by itself is already a 13x slowdown on SPEC with no I/O.

- Memtrace with I/O: still a 3% slowdown! This one is even more surprising and should be examined further.
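As noted above, a rip-rel-heavy kernel can be sketched in plain C along these lines (illustrative only, not the exact benchmark measured here; exact instruction mixes depend on the compiler and options):

```c
/* Sketch of a rip-rel-heavy kernel (not the exact benchmark measured above).
 * Direct accesses to file-scope globals typically compile to rip-relative
 * addressing on x86-64, while accesses through a pointer parameter use a
 * register base, so alternating them makes roughly half of the dynamic
 * memory references rip-rel, split between loads and stores.  volatile
 * keeps the compiler from hoisting the accesses out of the loop.
 */
#include <stdio.h>

#define ITERS 100000000UL

static volatile long g_load;  /* read via rip-rel */
static volatile long g_store; /* written via rip-rel */

long
kernel(volatile long *non_riprel, unsigned long iters)
{
    long sum = 0;
    for (unsigned long i = 0; i < iters; i++) {
        sum += g_load;        /* rip-rel load */
        g_store = sum;        /* rip-rel store */
        sum += non_riprel[0]; /* register-based load */
        non_riprel[1] = sum;  /* register-based store */
    }
    return sum;
}

int
main(void)
{
    long buf[2] = { 1, 0 };
    g_load = 1;
    printf("%ld\n", kernel(buf, ITERS));
    return 0;
}
```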

More analysis of the actual overhead on real apps with significant rip-rel percentages would be ideal, but based on these preliminary results my conclusion is that -vm_base_near_app is important. My proposal is to back off the 2G default and make it 1G. With the 512M ASLR, that will allow -vm_base_near_app to work for an app binary <512M (roughly: a 1G region adjacent to the app, plus up to 512M of ASLR, plus a <512M binary still fits within the ±2G reach of a 32-bit rip-rel displacement).

I plan to keep the functionality of loading the client inside the VMM, and to accept the loss of reachability guarantees for static clients.

For W^X #3556, the plan is to give the user a choice: either simply fail if the 1G limit is reached (unlikely, given that vmcode is now split from vmheap), or give up -vm_base_near_app and reserve the full 2G up front. This seems a reasonable compromise.

I did raise the default vmheap size to 2G, as there is little downside to doing so.

derekbruening commented 5 years ago

Summarizing why this issue is still open: primarily for Windows support for loading client libraries inside the VMM, which is required for large vmcode sizes such as 2G.
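For anyone picking up the Windows side, a minimal sketch of the constraint described earlier (illustrative only, not DR's code; assumes the classic VirtualAlloc/VirtualFree/MapViewOfFileEx APIs):

```c
/* Illustrative sketch of the Windows limitation (not DR's code): a file
 * view cannot be mapped on top of an existing reservation, and only an
 * entire reservation can be released, not a piece of it.
 */
#include <windows.h>

int
main(void)
{
    /* Reserve a large region of address space without committing it. */
    SIZE_T region_size = (SIZE_T)1 << 31; /* 2G */
    void *region = VirtualAlloc(NULL, region_size, MEM_RESERVE, PAGE_NOACCESS);
    if (region == NULL)
        return 1;

    /* 1) MapViewOfFileEx at a fixed address inside this reservation fails:
     *    the requested range must be completely free, so there is no
     *    MAP_FIXED-style replacement of an existing reservation.
     *    (Call elided; the point is the constraint, not the mapping.)
     *
     * 2) Releasing just a piece of the reservation also fails: MEM_RELEASE
     *    requires the exact base returned by VirtualAlloc and a size of 0.
     */
    char *piece = (char *)region + (1 << 20);
    if (!VirtualFree(piece, 1 << 20, MEM_RELEASE)) {
        /* Expected to fail: the entire reservation must be released at once. */
    }

    /* The fallback is to release everything and re-reserve around the
     * desired hole, which opens a race with other threads' allocations. */
    VirtualFree(region, 0, MEM_RELEASE);
    return 0;
}
```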