Files in /etc/netns/<namespace-name> are not actually used

JeWe37 commented 2 years ago

Well, this is sort of a critical bug that leads to failures of DNS resolution (luckily, due to the isolated nature of namespaces, it does not actually leak anything).

So basically, it turns out that the "nsswitch.conf" and "resolv.conf" files under /etc/netns/<namespace-name> are only really used by iproute2's ip-netns and aren't directly connected to the namespace itself. Instead, ip-netns manually creates a mount namespace and mounts the files there, see here. That means that when a unit only joins the network namespace of an existing systemd service, these files are not honored.

Now, it would be possible to emulate this behavior by creating a mount namespace, as ip-netns does, in which the correct mounts are applied. However, the issue is that JoinsNamespaceOf explicitly does not support mount namespaces.

So basically that leaves only one option: drop the systemd "JoinsNamespaceOf" entirely and go back to using plain ip netns exec. Given that DNS resolution breaks horribly without this, I would consider it necessary. This also allows getting rid of some of the less pretty hacks with remounting the network namespace.

chrisbouchard commented 2 years ago

Maybe we can do something with NetworkNamespacePath= (added in systemd 242, I recently learned) plus a bind mount for /etc/resolv.conf via BindPaths=.

[Service]
NetworkNamespacePath=/run/netns/vpn
BindPaths=/etc/netns/vpn/resolv.conf:/etc/resolv.conf

That at least avoids needing ip netns exec, which I'd like to avoid if possible because it makes it hard to modify provided service definitions via drop-in configuration. As far as I know, the only way to add a prefix would be to replace the entire ExecStart= line, plus any ExecStartPre= or ExecStartPost= lines. We'd basically be providing an alternate service definition at that point.

(This does require knowing which files in /etc/netns/vpn need to be mounted, which is definitely not as nice as what ip netns exec does. But I think it's reasonable to assume that the file list is relatively stable, even if the files' contents are not.)
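
For comparison, here is roughly what the ip netns exec route would look like as a drop-in. This is only a sketch: `example.service` and `/usr/bin/example-daemon --foreground` are hypothetical stand-ins for whatever the packaged unit actually runs, and the path to `ip` varies by distribution.

`example.service.d/10-netns.conf`

```ini
[Service]
# An empty assignment clears the ExecStart= inherited from the packaged unit.
ExecStart=
# The whole command line has to be repeated just to add the prefix
# (hypothetical daemon path and arguments).
ExecStart=/usr/bin/ip netns exec vpn /usr/bin/example-daemon --foreground
```

If the packaged unit later changes its command line, this drop-in silently keeps the old one, which is the maintenance problem described above.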

chrisbouchard commented 2 years ago

All that said, I've been setting up a new server, and I'm considering dropping this configuration for something built around wg-netns. It even provides a templated systemd service to build namespaces — sort of what I was trying to describe in this comment.

I think this would simplify a lot of things. For instance, we could potentially implement onion routing using something like the following systemd units (assuming wg-netns configurations for vpn1, vpn2, and vpn3 exist):

`vpn.target`

[Unit]
Description=Network Setup for VPN
Requires=wg-netns@vpn1.service wg-netns@vpn2.service wg-netns@vpn3.service

`wg-netns@vpn2.service.d/00-binds-to-netns-vpn1.conf`

[Unit]
After=wg-netns@vpn1.service
BindsTo=wg-netns@vpn1.service

[Service]
NetworkNamespacePath=/run/netns/vpn1
BindPaths=/etc/netns/vpn1/resolv.conf:/etc/resolv.conf

`wg-netns@vpn3.service.d/00-binds-to-netns-vpn2.conf`

[Unit]
After=wg-netns@vpn2.service
BindsTo=wg-netns@vpn2.service

[Service]
NetworkNamespacePath=/run/netns/vpn2
BindPaths=/etc/netns/vpn2/resolv.conf:/etc/resolv.conf

Then systemd would be responsible for managing the order of things. It would start wg-netns in the previously created network namespace, which would create the new namespace. Then other services could require vpn.target.

The other nice thing about this is that the individual namespace and interface definitions are order-independent. You can rearrange, add, or remove namespaces just by updating the systemd dependencies.
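
A service that wants to use the chain might then look something like the following drop-in. This is a sketch under assumptions: `example.service` is a hypothetical consumer, and vpn3 is assumed to be the namespace it should join, since that one sits at the end of the chain.

`example.service.d/10-vpn.conf`

```ini
[Unit]
# Pull in the whole chain and wait for it before starting.
Requires=vpn.target
After=vpn.target

[Service]
# Join the last namespace in the chain (assumed to be vpn3).
NetworkNamespacePath=/run/netns/vpn3
BindPaths=/etc/netns/vpn3/resolv.conf:/etc/resolv.conf
```

Since target units order themselves after their Requires= dependencies by default, ordering after vpn.target should be enough to wait for all three wg-netns instances.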

JeWe37 commented 2 years ago

I definitely think NetworkNamespacePath= is a more proper way of using network namespaces than manually remounting them.

The only issue is that the drop-ins would also be required to maintain the list of bind mounts, which somewhat gets us back to where we were. That aside, the behavior of ip netns exec is to bind mount every file inside /etc/netns/<name> individually over the file of the same name in /etc, which cannot be perfectly emulated with the bind mount directive. At that point one could also require a private mount namespace and simply use a script to mount the required files, which would also allow perfectly emulating the behavior of ip netns exec.

On the other hand, we most likely don't need the general case anyway. nsswitch.conf and resolv.conf should be all we need.
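
Limited to those two files, the drop-in could be sketched like this. `example.service` is again a hypothetical unit name, and making the mounts read-only via BindReadOnlyPaths= is my assumption; the examples earlier in the thread use BindPaths=.

`example.service.d/10-netns.conf`

```ini
[Service]
NetworkNamespacePath=/run/netns/vpn
# One line per file, each mounted over its counterpart in /etc.
BindReadOnlyPaths=/etc/netns/vpn/resolv.conf:/etc/resolv.conf
# The leading "-" makes systemd skip this mount if the source file is missing.
BindReadOnlyPaths=-/etc/netns/vpn/nsswitch.conf:/etc/nsswitch.conf
```

The optional "-" prefix is useful for nsswitch.conf in particular, since not every namespace configuration ships one.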

About the idea with wg-netns: if that works the way you describe, it would definitely be much neater in general. Internally it uses ip netns exec, so you wouldn't even need the bind mounts there, only on the units that join the namespace (it's missing nsswitch.conf though, and I suspect that might lead to issues on systems with systemd-resolved). I'd prefer keeping the veth tunnel and routing via iptables if required rather than socat hacks, but that's optional anyway.

For now I'm running my current setup pretty happily, I might take a look at this again in a few weeks though.

chrisbouchard commented 2 years ago

I'm glad you've got something that's working for you. :+1:

existentialtype commented 1 year ago

After trying a few different approaches I found a setup I like, which as it turns out uses exactly the settings you discussed in this thread: wg-netns, NetworkNamespacePath, and a bind mount. I wrote up my setup at https://github.com/existentialtype/deluge-namespaced-wireguard in case you find it helpful. I'm also using systemd-socket-proxyd instead of either a veth tunnel or a socat forwarder.

JeWe37 commented 1 year ago

I've been working on something similar (an integrated solution for deploying all this into a Flatcar VM via Ansible), just using rootless podman to deploy the services, which plays much more nicely with custom namespaces than docker does. I wasn't familiar with systemd-socket-proxyd; I've been NATting stuff that needs to go into the namespace with veth tunnels and iptables. This is probably a simpler option, even if it requires an additional process.

existentialtype commented 1 year ago

Exactly. I think systemd-socket-proxyd is a good balance between simplicity and runtime processes. The veth tunnel has the cleanest runtime with no extra processes, but is more complicated to set up than the other two approaches. On the other side, socat is probably the easiest to set up, but when connections are open it runs at least two processes, one in the root namespace and one in the protected namespace. And since it uses a forking model it can spawn even more processes if there are multiple connections. systemd-socket-proxyd is in the middle. It runs at most one process, in the protected namespace, since systemd itself takes care of the socket in the root namespace. And since it uses an event-driven worker model, that single process can handle up to hundreds of connections. And with socket activation and idle termination it only runs the process while there are active connections. But it does require two systemd unit files to set up.
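
For anyone reading along, the two unit files could be sketched roughly as below. The unit names, the port 8112, and `example.service` are hypothetical placeholders rather than what my write-up uses verbatim, the binary path differs between distributions, and `--exit-idle-time` needs a reasonably recent systemd.

`proxy-to-example.socket`

```ini
[Socket]
# systemd itself listens here, in the root namespace.
ListenStream=127.0.0.1:8112

[Install]
WantedBy=sockets.target
```

`proxy-to-example.service`

```ini
[Unit]
# example.service is the hypothetical service running inside the namespace.
Requires=example.service proxy-to-example.socket
After=example.service proxy-to-example.socket

[Service]
# The proxy process runs inside the protected namespace and forwards
# connections to the service listening on its loopback interface.
NetworkNamespacePath=/run/netns/vpn
ExecStart=/usr/lib/systemd/systemd-socket-proxyd --exit-idle-time=5min 127.0.0.1:8112
```

The socket unit is the only piece that needs to be enabled; the proxy service is started on the first incoming connection and exits again once it has been idle for the configured time.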

chrisbouchard / namespaced-wireguard-vpn