OpenVPN / ovpn-dco

OpenVPN Data Channel Offload in the linux kernel
106 stars 26 forks source link

Implement in-tree build for DCO with proto fix #49

Open sempervictus opened 1 year ago

sempervictus commented 1 year ago

The initialization hack for ovpn_tcp_prot doesn't work when the tcp_prot struct is constified (by Grsec/PaX, for instance) and is generally unsafe as it performs runtime function hooking/hijack to achieve its goal.

The correct way to set up the structure requires access to symbols not exported by the kernel's headers, which makes out-of-tree compilation with the appropriate initializer impossible. In-tree builds can also benefit from LTO and other toolchain optimizations, as well as become part of monolithic kernels which do not use modules such as long-running network devices.

Notes: Built and run using GCC 13 for Grsecurity/PaX (stable8) with all features except PRIVKSTACK and KERNSEAL enabled - x86_64 only. Built using LLVM17 with LTO & kCFI on GrapheneOS' linux-hardened patchset with their default features enabled - x86_64 only.

sempervictus commented 1 year ago

This is intended, among other things, to help harden VyOS who have recently adopted the DCO code and actually do have long-running use cases in which having malleability in critical kernel structures is "not great."

ordex commented 1 year ago

Hey thank you for working on this. This said, do you think this change truly belongs to the ovpn-dco repository? To me it looks like this is more of an integration patch, which should be hosted by whoever decides to put together ovpn-dco and a specific kernel source.

However, I get the point of tcp_prot not being a splendid approach (and I'd like to find a better solution), but I am not sure your approach would be accepted upstream either. What do you think?

sempervictus commented 1 year ago

Thanks for pinging back @ordex. I've done a number of these patches for other out of tree efforts, and generally speaking the motivations are all the same:

  1. Give the compiler full visibility into the code it is building so it can optimize, strip, and trim away things that are not needed (attack surface reduction/optimization)
  2. Provide the visibility of part 1 to the CFI implementation to appropriately mark (and group if applicable) call/return targets. kCFI is forward-only, but someday...
  3. Ensure clean ABI between the various components - mismatched ABI in ring0 is a rough and opaque attack surface/defensive zone.

Far as the struct fix goes - that seems to me to be the correct way of doing it, the original code is hijacking a separate struct at runtime via the __init madness which is a clever way to work around the problem of restricted exports, but probably no way to do things in production. We can make that patch optional for the in-tree process if you'd like, but i figure that if consumers will build in-tree then they might as well benefit from the correct struct definition and exports from said tree.

Lastly, in terms of putting this work in consumer projects - IMO, thats asking for fragmentation and maintenance burden. Keeping it here (until its actually upstreamed, if thats the plan) allows us to track drift and fix it in one place.