cloudius-systems / osv

OSv, a new operating system for the cloud.

ASLR, W^X, etc. #651

Open copumpkin opened 9 years ago

copumpkin commented 9 years ago

Intuitively, it seems like these measures still apply to something like OSv. Grepping the source tree revealed no nontrivial matches for 'ASLR', so I'm wondering if:

  1. I'm wrong and the protections don't make sense in this paradigm
  2. They do make sense, but there are significant difficulties to implementing them properly so it hasn't been done yet
  3. They do make sense, should be easy, but nobody's done it yet
  4. They do make sense, and are already implemented in OSv; I'm just grepping for the wrong thing!

Any pointers?

nyh commented 9 years ago

I agree with your observations, and I think the answer is 3 - they do make sense, should be easy (or at least, not exceedingly difficult), but nobody's done it yet.

ASLR (address-space layout randomization) does make sense even in OSv: One might argue that OSv's single-application philosophy means that if someone breaks into the application he cannot do any sort of "privilege escalation" and break into other applications, because there are none on this VM. But nevertheless, it is always better, if there is an exploit which can break into an application, that the result of this break will be an application crash (denial of service) rather than execution of the attacker's code. This is why things like ASLR do make sense even in OSv.

OSv is already very much "prepared" for ASLR: most importantly, all executables are position-independent and so can be moved around. But more work is needed for a full ASLR implementation:

  1. We need to randomize the base address of loading PIEs and shared objects. Right now it's quite predictable.
  2. We need to randomize the locations of stacks, of results of mmap, and similar things. These are already "somewhat" random, because they depend on the order in which threads run, etc., but probably not random enough.
  3. We also need to randomize the location of the kernel (which in OSv, also includes the C and C++ libraries). See also issue #190 which discusses moving the kernel to a location determined at run-time - if we can do that, randomizing the location is another small step.
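
As a rough illustration of step 1, here is a minimal sketch of choosing a randomized, page-aligned load base. It is not OSv code: `pick_base`, `random_load_base`, and the region bounds are hypothetical names and values chosen for the example, and `std::random_device` merely stands in for whatever entropy source a kernel would actually have available.

```cpp
#include <cstdint>
#include <random>

// Hypothetical constants for illustration only; they are not OSv's real layout.
constexpr std::uint64_t page_size   = 4096;
constexpr std::uint64_t region_low  = 0x100000000000ULL;  // lowest allowed load base
constexpr std::uint64_t region_high = 0x200000000000ULL;  // one past the highest allowed base

// Map raw entropy to a page-aligned base in [low, high).
// The modulo introduces a slight bias, which is acceptable for a sketch.
constexpr std::uint64_t pick_base(std::uint64_t entropy,
                                  std::uint64_t low = region_low,
                                  std::uint64_t high = region_high)
{
    return low + (entropy % ((high - low) / page_size)) * page_size;
}

// Draw 64 bits from the platform's entropy source and turn them into a base.
// The quality of std::random_device is implementation-defined.
inline std::uint64_t random_load_base()
{
    std::random_device rd;
    std::uint64_t hi = rd();
    std::uint64_t lo = rd();
    return pick_base((hi << 32) | (lo & 0xffffffffULL));
}
```

Keeping the entropy-to-address mapping separate from the entropy source makes the mapping easy to test and lets the source be swapped out later.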

See https://en.wikipedia.org/wiki/Address_space_layout_randomization#Linux for some more notes about how Linux did ASLR.

About W^X, I agree it's also useful to have. Our ELF object loading code already supports non-writable text pages, and also pages which are writable only while the object is being loaded and are made unwritable once relocations are done. This is the so-called "relro" feature; OSv supports it, including the "full relro" variant described in http://tk-blog.blogspot.co.il/2009/02/relro-not-so-well-known-memory.html. But I guess we could also offer a stricter "W^X" feature, where every page marked writable is automatically marked non-executable (the NX bit on x86_64), including (for example) stacks. I don't know if all applications can run this way, but it wouldn't hurt to allow it for applications which can. By the way, OSv itself modifies its own code to enable tracepoints, which might break W^X in the kernel. But since this is the only place where it does this (beyond the usual relro pages, which are written only at load time), I think we could start with simply not supporting tracepoints with W^X.
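
To make the stricter policy concrete, here is a small sketch of a W^X rule applied to permission flags. The `perm_*` bits, `violates_wx`, and `enforce_wx` are hypothetical names for this example; OSv's actual `mmu::perm_*` values and mapping code may look quite different.

```cpp
// Hypothetical permission bits for illustration; not OSv's mmu::perm_* definitions.
enum perm : unsigned {
    perm_read  = 1u << 0,
    perm_write = 1u << 1,
    perm_exec  = 1u << 2,
};

// True when a mapping would be writable and executable at the same time.
constexpr bool violates_wx(unsigned p)
{
    return (p & perm_write) != 0 && (p & perm_exec) != 0;
}

// Strict policy from the paragraph above: a writable page is never executable,
// so the exec bit is dropped whenever the write bit is set.
constexpr unsigned enforce_wx(unsigned p)
{
    return violates_wx(p) ? (p & ~static_cast<unsigned>(perm_exec)) : p;
}
```

A mapping routine could pass every requested permission through `enforce_wx` (or reject requests for which `violates_wx` is true) when the W^X option is enabled.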

copumpkin commented 9 years ago

> About W^X, I agree it's also useful to have. Our ELF object loading code already supports non-writable text pages, and even pages which are temporarily writable during the object's load and after doing relocations, made unwritable - this is the so-called "relro" feature, which OSv supports and even supports the "full relro" variant described in http://tk-blog.blogspot.co.il/2009/02/relro-not-so-well-known-memory.html. But I guess we could also offer a more strict "W^X" feature - where every page marked writable is automatically marked non-executable (the NX bit on x86_64), including (for example) stacks.

Is protecting the ELF loading process alone sufficient? I don't know the OSv code nearly well enough, but is the kernel itself loaded via the ELF loader? If so, that's probably fine, as long as newly allocated memory also enforces the separation. If the kernel is loaded in a special manner, I'd be concerned about the safely W^X-loaded program file somehow modifying (or convincing the kernel to modify) kernel memory for nefarious purposes during an exploit.

Re: the tracepoints, making the options mutually exclusive seems fine.

nyh commented 9 years ago

One small note about the state of non-executable stacks in OSv:

OSv allocates stacks in two ways:

  1. Threads created by pthread_create - including all the application's threads - have their stacks allocated in pthread::allocate_stack(), using mmap (actually, our internal mmu::map_anon() is called directly), and the permission mmu::perm_rw - in other words, these stacks are not executable, so this is good news.
  2. OSv's internal threads (not created with the pthread API) default to having a small stack (65K by default - see sched::thread::init_stack()), which is allocated with malloc(). Our malloc() returns memory from the linear map, which (see linear_page_mapper) is configured with mmu::perm_rwx, i.e., is executable by default. We cannot "mprotect()" this stack because the linear map usually uses 2 MB pages ("huge pages") for efficiency, and the 65 KB stack is just part of such a page.
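
The granularity problem in the second case can be shown with a little arithmetic. The sketch below assumes the "65K" above means 65536 bytes and uses made-up addresses; `huge_page_base` and `collateral_bytes` are hypothetical helpers, not OSv functions.

```cpp
#include <cstdint>

constexpr std::uint64_t huge_page_size = 2ULL * 1024 * 1024;  // 2 MB huge page
constexpr std::uint64_t stack_size     = 64 * 1024;           // assumed: "65K" = 65536 bytes

// Start of the huge page that contains addr.
constexpr std::uint64_t huge_page_base(std::uint64_t addr)
{
    return addr & ~(huge_page_size - 1);
}

// Number of bytes outside the stack whose permissions would also change if
// every huge page touching [stack_addr, stack_addr + size) were re-protected.
constexpr std::uint64_t collateral_bytes(std::uint64_t stack_addr, std::uint64_t size)
{
    return (huge_page_base(stack_addr + size - 1) + huge_page_size
            - huge_page_base(stack_addr)) - size;
}
```

For a stack that sits entirely inside one huge page, changing that page's permissions would also affect nearly 2 MB of unrelated malloc() memory, which is why the stack cannot simply be made non-executable in place.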

n03l commented 9 years ago

Out of curiosity, how many are the internal threads?

nyh commented 8 years ago

On Fri, Nov 13, 2015 at 1:02 AM, n03l wrote:

> Out of curiosity, how many are the internal threads?

OSv uses threads liberally in the kernel, because they have very little overhead besides the memory use of the stack. In particular, sleeping threads do not slow down scheduling at all.

To list the running threads, you can use gdb's "osv info threads" or scripts/top.py.

As an example, I ran "make image=rogue; scripts/run.py -c1" and got 116 threads, only one of which is the application thread. Many of these threads, perhaps too many (see issue #247), belong to ZFS. But the vast majority are idle, and their only overhead is the memory their stacks take.

nyh commented 1 year ago

Interesting discussion on ASLR and W^X support (and non-support) on OSv and other unikernels: https://x41-dsec.de/news/missing-or-weak-mitigations-in-various-unikernels/. The table suggests that OSv already supports W^X, but this is contradicted by the text, which says that their "test scripts" verified that it isn't implemented on OSv.

wkozaczuk commented 1 year ago

I did come across this article some time ago as well. There is actually a bit of work to accomplish your 3 original parts. Kernel randomization (KASLR) in particular would not be that easy, given that some of it would require tweaking assembly for both x64 and aarch64, and probably writing some code to patch addresses in the kernel ELF, etc.

So looking at your original 3 steps, I think the 1st one is the easiest one: Randomize the base address of loading PIEs and shared objects.

The 2nd one is nice to have, but given that it is somewhat random already, maybe it is of lower priority: Randomize the locations of stacks, results of mmap, and similar things. (what are other things?)

The 3rd is the most complicated one: Randomize the location of the kernel.

It would also be nice to implement W^X for the kernel ELF. Right now I believe it runs with all permissions on, right? You mention tracepoints that modify some portions of kernel text. But I think there is also some memcpy-related code (look at arch/x64/string.cc) that picks the best implementation depending on cpuid, which is similar. I wonder if we can change the kernel code memory protections to X during boot, after we enable all tracepoints but before we load the apps.
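
The "patch during boot, then lock down" idea can be sketched in user space with POSIX mmap() and mprotect(). This is only an analogy, not OSv kernel code: `patch_then_seal` is a hypothetical name, an anonymous page stands in for kernel text, and the page is sealed read-only here, whereas real code pages would presumably end up as PROT_READ | PROT_EXEC.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if a page could be patched while writable and then sealed.
inline bool patch_then_seal()
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return false;
    }
    const std::size_t len = static_cast<std::size_t>(page);

    // Phase 1: the region starts out writable, as during early boot.
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return false;
    }
    assert(reinterpret_cast<std::uintptr_t>(mem) % len == 0);  // mmap returns page-aligned memory

    // Phase 2: apply all boot-time patches (a stand-in for tracepoint or memcpy selection).
    std::memset(mem, 0x90, len);

    // Phase 3: drop write permission before any application code would run.
    bool sealed = mprotect(mem, len, PROT_READ) == 0;

    // The patched contents remain readable after sealing.
    bool intact = sealed && static_cast<unsigned char*>(mem)[0] == 0x90;

    munmap(mem, len);
    return intact;
}
```

Under this ordering, anything that needs to modify the region later (such as enabling a tracepoint at run time) would have to be rejected or would need to temporarily lift the protection.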

I wonder if we should create 4 new (or more) finer granularity issues and kill this one.