Solo5 / solo5

A sandboxed execution environment for unikernels
ISC License
911 stars 142 forks source link

hvt: Enforce that guest executable code is not also writable (W^X) #303

Open mato opened 5 years ago

mato commented 5 years ago

As part of general hardening, we should enforce both in the ELF loader and at the guest-physical to host-virtual translation layer that any executable code in the guest is not also writable (W^X).

Summary of status for hvt back-ends:

There are several sub-issues to this:

  1. hvt_elf_load() currently warns if a PT_LOAD PHDR requests both PF_X and PF_W. This should be changed to a fatal error instead.
  2. hvt_elf_load() should enforce that PT_LOAD PHDRs cannot overlap, otherwise an attacker could construct a malicious ELF defeating the logic in the previous point. While overlapping PT_LOAD PHDRs are not explicitly mentioned in the ELF specification, there is no reason to support this behaviour. This check can be further simplified by enforcing that PT_LOAD PHDRs must be sorted by p_vaddr in the ELF file, which is within the specification.
  3. hvt_elf_load() currently applies PF_x flags from a PT_LOAD PHDR by calling mprotect() on the (host-side) guest memory region. This results in the correct behaviour (enforcement) only on Linux/KVM which updates the guest-physical to host-virtual translation (i.e. EPT on x86_64) as expected. A solution will need to be found to get the same behaviour on FreeBSD/vmm and OpenBSD/vmm.
  4. A test case needs to be added, verifying that .text as seen by the guest is not writable.
ricarkol commented 5 years ago

The warning for "if a PT_LOAD PHDR requests both PF_X and PF_W." was because of IncludeOS which creates a single XWR segment (at least when using the solo5 platform). Don't know if that is still happening. In any case, I think the IncludeOS guys are interested in W^X code, so this is worth fixing on that side.

@alfred-bratterud ^

adamsteen commented 5 years ago

@mato OpenBSD will not allow an mprotect call with both write and execute enabled, W^X has been enforced at OS level since September 2016. I initially hit this in porting efforts.

See OpenBSD Innovations W^X

mato commented 5 years ago

@mato OpenBSD will not allow an mprotect call with both write and execute enabled, W^X has been enforced at OS level since September 2016. I initially hit this in porting efforts.

@adamsteen I know that. What I don't know is, does this subsequent mprotect() for a PHDR with PF_X | PF_R set (i.e. .text)

https://github.com/mato/solo5/blob/enforce-nox/tenders/hvt/hvt_elf.c#L215

called on the guest memory range initially set up at

https://github.com/mato/solo5/blob/enforce-nox/tenders/hvt/hvt_openbsd.c#L117

have the intended effect on the underlying EPT mapping actually used by vmm to back that memory, i.e. disallowing PROT_WRITE. I'm guessing it does as otherwise the OpenBSD port would probably not work at all since the initial mapping does not include PROT_EXEC, but given that the FreeBSD vmm has a separate interface for this (VM_MMAP_MEMSEG) I'm not 100% sure.

To confirm, could you build the branch at https://github.com/mato/solo5/tree/enforce-nox, manually run the test_nox case, and post the output? While you're at it, can you confirm that all the other new tests there pass?

adamsteen commented 5 years ago

@mato output from test_nox run manually.

doas ./solo5-hvt test_nox.hvt
            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Memory map: 512 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x100000 - 0x104fff)
Solo5:     rodata @ (0x105000 - 0x105fff)
Solo5:       data @ (0x106000 - 0x10afff)
Solo5:       heap >= 0x10b000 < stack < 0x20000000

**** Solo5 standalone test_nox ****

Solo5: solo5_exit(1) called

All the other tests succeed when run with doas tests/run-tests.sh

mato commented 5 years ago

@adamsteen:

Thanks. So it looks like vmm is not updating EPT to match the prot from mprotect(). You know the OpenBSD vmm folks, could you please ping them and and ask how to get this behaviour (or equivalent)? Pointing them to https://github.com/Solo5/solo5/issues/303#issuecomment-446503933 or this issue in general should be enough for them to understand what we need.

I've been experimenting on FreeBSD vmm and have yet to figure out how to get the intended behaviour with VM_MMAP_MEMSEG. It looks like there is no straightforward solution, will have to ask some of the FreeBSD vmm folks as well (cc @hannesm, who/where? freebsd-virtualization@? or, if you have a direct contact please let me know privately).

All the other tests succeed when run with doas tests/run-tests.sh

That's odd, I would have expected test_ssp to fail, since #294 does not enable it on OpenBSD toolchains. What does that say when run manually?

mato commented 5 years ago

Edit: Updated description with summary of status across back-ends and more information.

adamsteen commented 5 years ago

@mato yeah test_ssp fails with an abort.

            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Memory map: 512 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x100000 - 0x104fff)
Solo5:     rodata @ (0x105000 - 0x105fff)
Solo5:       data @ (0x106000 - 0x10afff)
Solo5:       heap >= 0x10b000 < stack < 0x20000000

**** Solo5 standalone test_ssp ****

1234567890123456789012345678901234567890
Solo5: trap: type=#UD ec=0x0 rip=0x10000a rsp=0x1fffffe0 rflags=0x10006
Solo5: ABORT: cpu_x86_64.c:171: Fatal trap
mlarkin2015 commented 5 years ago

Regarding the OpenBSD vmm(4) part, I'll comment that there are actually 2 separate views of the VM's memory. One is maintained by the userland part (vmd(8) in base OpenBSD or whatever you're using in Solo5 in userland to setup and launch the VM). The userland view is what receives mmap/mprotect updates. The maps involved are shared internally by a UVM API called uvm_share which handles mapping the same pages into the EPT view as well as the userland part.

Digging around in the depths of UVM isn't for the faint of heart; what I proposed on the OpenBSD misc@ mailing list to @adamsteen was to implement a new API that does an mprotect on the EPT side via an ioctl to vmm(4). Would that work? We could make the API flexible (and likely make it just look like mprotect to some degree). It would be two calls though, one "regular" mprotect and one "ept-mprotect", so there would be a short time when the maps might be out of sync.

Thoughts?

adamsteen commented 5 years ago

@mato a note on OpenBSD and test_ssp, OpenBSD switched to using lld (LLVM/Clang 6.0.0) as of 2018-10-22 so its in the same boat as FreeBSD with regard to -mstack-protector-guard=global

mato commented 5 years ago

@mlarkin2015:

whatever you're using in Solo5 in userland to setup and launch the VM

Just FYI, we don't use vmd(8), the userland part is the "solo5-hvt" tender which @adamsteen ported to use the OpenBSD vmm APIs directly. However, we (well, I, when reviewing) just assumed that the userland mprotect() was sufficient to update the EPT mappings.

what I proposed on the OpenBSD misc@ mailing list to @adamsteen was to implement a new API that does an mprotect on the EPT side via an ioctl to vmm(4). Would that work? We could make the API flexible (and likely make it just look like mprotect to some degree).

That would work for us. Having it as a separate ioctl means we can test for its presence at run time, and print a warning but still run on older systems without this feature until some time after an OpenBSD with it has been released. I'll also try and coordinate something similar/figure out the situation for our FreeBSD/vmm backend.

Regarding flexibility, a useful feature to expose in an "ept-mprotect" would be the x86 ability to have execute-only EPT mappings, if possible. This would allow us at some future point to enforce that guest code is execute-only, which is interesting from the PoV of improving defences against ROP attacks.

It would be two calls though, one "regular" mprotect and one "ept-mprotect", so there would be a short time when the maps might be out of sync.

This is not an issue for us, Solo5 imposes a "static" memory layout so we only need to set up the mappings once at initialisation time and never touch them again.

mato commented 5 years ago

@adamsteen Moving the SSP discussion to #293 which I've re-opened.

mato commented 5 years ago

@hannesm I've started a thread on freebsd-virtualization@ about this, archives here: https://marc.info/?l=freebsd-virtualization&m=154470008622609&w=2

adamsteen commented 5 years ago

With Mike Larkin's description here , i should be able to implement this over the break, i will report back when its done

mato commented 5 years ago

@adamsteen Thanks, keep me up to date on how it goes. For the purposes of testing, you can use the test_nox at https://github.com/mato/solo5/tree/enforce-nox, and temporarily hack in the call to your vmm ioctl directly in hvt_elf_load().

In the mean time, I will address the other sub-points in this issue, and also work out some scaffolding to be able to call a hvt_mprotect() or similar from the ELF loader.

adamsteen commented 5 years ago

@mato I will continue to investigate this issue #303 and #293 as i have time, life is very busy at the moment, but i should figure it all out in the end.

hannesm commented 5 years ago

@mato looking back into this, your enforce-nox branch (including the test) disappeared from mato/solo5 -- would be great to have the test in solo5 master imho (and maybe mark as failing on FreeBSD/OpenBSD atm)

adamsteen commented 5 years ago

This would be greatly appreciated, I have a patch half done in the works.

hannesm commented 5 years ago

looking back here, I'm wondering why hvt_init allocates (&mmaps) a single memory segment (with RWX protection)? an alternative I can think of (after talking wth some FreeBSD folks) would be to have for each ELF segment one vmm memory segment with the corresponding protection bits (the VM_MMAP_MEMSEG ioctl / int vm_mmap_memseg get prot as argument). This would require some coordination (a function provided by the platform-specific code) used in hvt_elf_load.

mato commented 5 years ago

@hannesm Sorry about the dissapearance of that branch. Take a look at enforce-nox-v2 which I just pushed, this only includes the tests but does not hook them up yet.

Interestingly, things are a bit more subtle on KVM than I thought, on this branch I added test_wnox that complements test_xnow in the other direction, i.e. tests that .data is not executable, and it turns out that KVM on my system is not enforcing this. So some more investigation is needed here.

looking back here, I'm wondering why hvt_init allocates (&mmaps) a single memory segment (with RWX protection)? an alternative I can think of (after talking wth some FreeBSD folks) would be to have for each ELF segment one vmm memory segment with the corresponding protection bits (the VM_MMAP_MEMSEG ioctl / int vm_mmap_memseg get prot as argument).

The reason there's a single segment is that when I originally wrote the FreeBSD implementation I assumed things work the same way as under KVM, i.e. MEMSEG is the equivalent of a KVM memslot, and host-side mprotect() would be used to enforce the additional protections. At some later point I did try using multiple MEMSEGs but could not get it to work -- I've forgotten the details now, but it seemed that MEMSEGs were not intended for "fine-grained" protection boundaries such as this.

Regarding coordination with the ELF loader, my rough plan was to do something like this:

typedef void (*hvt_guest_mprotect_fn_t)(void *addr, size_t size, int prot);
void hvt_elf_load( const char *file, uint8_t *mem, size_t mem_size,
    hvt_guest_mprotect_fn_t guest_mprotect_fn,
    hvt_gpa_t *p_entry, hvt_gpa_t *p_end);

... and then have the OS-specific backend implement a hvt_guest_mprotect(...), pass it to hvt_elf_load() when called and replace/augment the mprotect() call in the loader with a call to the backend-specific function passed in by the caller.

@adamsteen Would the above work for you?

adamsteen commented 5 years ago

@mato Yeah that should work for me!

mato commented 5 years ago

@hannesm I've merged the W^X tests (and enabled some combinations) in #363.

Regarding the KVM behaviour, it turns out that KVM does not support marking pages as NX from the host via mprotect(), so W^X is also only "partially" possible there. (Source: mailing list thread).

adamsteen commented 4 years ago

@mato with this commit, OpenBSD can support this correctly and completely.

This will be available in the next release of OpenBSD 6.7, which has just gone into beta, so in about a month.

ps OpenBSD from 6.6 has a really nice update feature, sysupgrade(8), making updating releases as simple as doas sysupgrade

adamsteen commented 4 years ago

@mato Looks like we can remove the OpenBSD label from this one now, after #447 being merged.

mato commented 4 years ago

Done. (Note to self: Update the status on this issue to make it clear where we're at with the various hvt/host combinations)