Per-container outbound firewall rules

tangentsoft commented 11 months ago

In the Docker model of containers, you declare allowed inbound ports with EXPOSE rules in the Dockerfile, and Podman inherits this in its Containerfile format. I believe Netavark is what implements this at the bottom of the stack, which is why I am asking here for an extension of the feature.

My core concern is that a lot of containers have more power inside them than we ideally want; fewer follow the microservice ideal than we would like. To take one big example, anything based on Alpine has a shell, a capable package manager, and a stripped-down wget command built into Busybox, which lets it pull in almost any external code and run it inside the container, if allowed.

Rather than go to all of the effort of rebuilding these risky containers to rid them of the Busybox and APK stuff, making them more microservice-like, I'd like to be able to say, "This container can only make outbound TCP connections on port 12345." Problem solved.

When the desired behavior is instead to block all outbound networking, we already have that via the --internal flag. The tricky case that I think needs addressing is when a given container has a legitimate need to connect outbound, but only within carefully scoped rules expressible in the firewall-cmd language. To take the Alpine example once more: if the container is a mail server, outbound connections to ports 80 and 443 to pull in additional APK packages and such are clearly bogus, but it does need to connect out on ports 25, 465, 587, etc.

But now there's a new problem, which is why I am here writing this. How do I express this in firewall-cmd language out on the host when I don't know the container's source IP or MAC, and I don't want the rules to affect the host itself? These per-container identifiers keep changing on each launch! If I block 80 and 443 on the host, I can't pull new container images, OS updates, etc.

What I think I want — and feel free to tell me what I actually want 😛 — is a way to say, "When this container comes up, apply these firewall rules to it, intelligently filling in the bits that change on each container instantiation." When the container stops, drop those automatic firewalld rules.
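A minimal sketch of that start/stop behavior, done outside Netavark: after the container starts, ask Podman for its current address, install one nftables table keyed on that address, and delete the table when the container stops. This assumes a rootful container on a bridge network (so its traffic crosses the host's forward hook), IPv4 only, and the `nft` CLI on the host. The table naming scheme and the helpers `table_name`, `container_ipv4`, `nft_script`, `on_start`, and `on_stop` are hypothetical, not part of Podman, Netavark, or firewalld.

```python
import re
import subprocess


def table_name(container: str) -> str:
    """Hypothetical naming scheme: one nftables table per container."""
    # Keep only characters that are safe in an nft identifier.
    return "ctr_" + re.sub(r"[^A-Za-z0-9_]", "_", container)


def container_ipv4(container: str) -> str:
    """Ask Podman for the container's current address (it changes per launch)."""
    fmt = "{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}"
    out = subprocess.run(
        ["podman", "inspect", "--format", fmt, container],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    if not out:
        raise RuntimeError(f"{container} has no bridge-network address")
    return out[0]


def nft_script(container: str, ip: str, tcp_ports: list[int]) -> str:
    """Render an nft script limiting NEW outbound TCP from `ip` to `tcp_ports`."""
    ports = ", ".join(str(p) for p in sorted(set(tcp_ports)))
    return "\n".join([
        f"table inet {table_name(container)} {{",
        "  chain fwd {",
        "    type filter hook forward priority 0; policy accept;",
        # Packets on already-allowed connections (including replies on
        # inbound connections to the container's own services) keep flowing.
        f"    ip saddr {ip} ct state established,related accept",
        # New outbound TCP connections are allowed only to these ports...
        f"    ip saddr {ip} tcp dport {{ {ports} }} accept",
        # ...and anything else the container tries to forward is dropped.
        f"    ip saddr {ip} drop",
        "  }",
        "}",
    ]) + "\n"


def on_start(container: str, tcp_ports: list[int]) -> None:
    on_stop(container)  # start clean if a stale table was left behind
    script = nft_script(container, container_ipv4(container), tcp_ports)
    subprocess.run(["nft", "-f", "-"], input=script, text=True, check=True)


def on_stop(container: str) -> None:
    # Deleting the table removes all of this container's rules at once.
    subprocess.run(["nft", "delete", "table", "inet", table_name(container)],
                   check=False, capture_output=True)
```

In this sketch, `on_start("mail", [25, 465, 587])` would run after `podman start mail`, and `on_stop("mail")` after it stops, for example from a wrapper script. Because a drop verdict in any nftables base chain is final, this table can only tighten what the existing ruleset allows. UDP is not allowlisted, so DNS queries the container sends directly to outside resolvers would be dropped too.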

Luap99 commented 10 months ago

I would consider this out of scope for netavark (at least at the moment). The complexity of integrating some form of firewall syntax is quite high, especially given that we would need to support it for firewalld, iptables, and nftables.

It is also not clear where we should add the rules: in the host netns or in the container netns. Both approaches would have their own problems. Note that you can write your own plugin to do this: https://github.com/containers/netavark/blob/main/plugin-API.md

karuboniru commented 10 months ago

If you run Podman containers in a systemd unit, settings such as SocketBindAllow= and SocketBindDeny= should work when you specify --cgroups=split, because systemd implements this simple firewall as a BPF filter in the cgroup hierarchy. This can serve as a workaround, and is potentially stronger, because it also applies to macvlan/ipvlan networks, which the host firewall usually doesn't handle.
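As a sketch of this suggestion, the helper below renders a systemd drop-in for a unit that runs `podman run --cgroups=split ...`. The helper name, the unit it would go into, and the example rules are illustrative assumptions. Going by the systemd.resource-control man page, SocketBindAllow=/SocketBindDeny= match the address a process passes to bind(), so they limit which local ports the payload can bind rather than which remote ports it can connect to; IPAddressAllow=/IPAddressDeny= filter traffic by IP address prefix, and neither directive takes a remote port.

```python
from typing import Optional


def render_dropin(bind_allow: list[str],
                  ip_allow: Optional[list[str]] = None) -> str:
    """Render a [Service] drop-in that uses systemd's cgroup BPF filters.

    Rule strings use systemd's own syntax, e.g. "tcp:25" or "ipv4:tcp:587"
    for SocketBindAllow=, and "192.0.2.0/24" for IPAddressAllow=.
    """
    lines = ["[Service]"]
    # bind() calls matching an allow rule succeed; all others are denied
    # (SocketBind* needs systemd 249 or newer).
    lines += [f"SocketBindAllow={rule}" for rule in bind_allow]
    lines.append("SocketBindDeny=any")
    if ip_allow is not None:
        # Traffic to or from addresses outside the listed prefixes is dropped.
        lines += [f"IPAddressAllow={prefix}" for prefix in ip_allow]
        lines.append("IPAddressDeny=any")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical mail-server unit: may only bind SMTP/submission ports.
    print(render_dropin(["tcp:25", "tcp:465", "tcp:587"]), end="")
```

The output would be saved under a hypothetical path such as `/etc/systemd/system/mailserver.service.d/sockets.conf`, followed by `systemctl daemon-reload`.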

tangentsoft commented 10 months ago

@karuboniru: Thanks, that should be helpful once all my EL8 boxes age out, since they'll be shipping systemd 239 until the end of time, and that feature wasn't added until 249.

Does this work for user services, or does it require that the service is started by root, thus pushing me into the rootful Podman mode?

karuboniru commented 10 months ago

I believe it requires that the unit is in the system instance (i.e. started as root). But even starting a container as root doesn't mean the container must run in rootful mode.

In my own setup, I set

```toml
# /etc/containers/containers.conf
[containers]
userns = "auto"
```

to tell Podman to assign a user namespace by default even when a container is started as root. I also assigned subordinate IDs to root in /etc/subuid and /etc/subgid (you can use any name here, e.g. container, but the name must match the one used in root-auto-userns-user):

```
root:2147483647:2147483648
```

and changed storage.conf so that Podman assigns UID maps from the range specified in /etc/subuid and /etc/subgid:

```toml
[storage.options]
root-auto-userns-user = "root"
```

The benefit is that because the containers are started as root, I get the rootful network stack, with things like routable bridges and macvlan.

And the container payload still runs in a rootless environment:

```
$ sudo podman run -it --rm --network none alpine:latest cat /proc/self/uid_map
         0 2147692255        405
```
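The numbers in this example can be checked by hand: each /etc/subuid (or /etc/subgid) entry reads name:start:count, and each line of /proc/self/uid_map reads inside-ID, outside-ID, length. With root delegated 2147483648 IDs starting at 2147483647, the range ends at 2147483647 + 2147483648 - 1 = 4294967294, and the container's 405 IDs starting at host UID 2147692255 fall inside it, so container root is an unprivileged host UID. The sketch below does the same check; the helper names are our own, not part of Podman.

```python
from typing import NamedTuple


class SubIdRange(NamedTuple):
    """One /etc/subuid or /etc/subgid entry: name:start:count."""
    name: str
    start: int
    count: int

    @property
    def last(self) -> int:
        return self.start + self.count - 1


class UidMapLine(NamedTuple):
    """One /proc/<pid>/uid_map line: inside-ID outside-ID length."""
    inside: int
    outside: int
    length: int


def parse_subid(line: str) -> SubIdRange:
    name, start, count = line.strip().split(":")
    return SubIdRange(name, int(start), int(count))


def parse_uid_map(line: str) -> UidMapLine:
    inside, outside, length = (int(f) for f in line.split())
    return UidMapLine(inside, outside, length)


def mapping_within(rng: SubIdRange, m: UidMapLine) -> bool:
    """True if every host UID used by the mapping lies in the delegated range."""
    return rng.start <= m.outside and m.outside + m.length - 1 <= rng.last


if __name__ == "__main__":
    root = parse_subid("root:2147483647:2147483648")
    m = parse_uid_map("         0 2147692255        405")
    print(root.last)                 # 4294967294
    print(mapping_within(root, m))   # True
```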

tangentsoft commented 5 months ago

With the announcement of Podman 5 and its use of pasta for rootless containers, I believe I will have what I need once I can get off Podman 4. Specifically, it looks like pasta -T does what I want.

I'm closing this because the wish is substantially fulfilled. If anything remains to be done on the Podman side (my big question is how to ask it to pass -T to pasta), it isn't on-topic here in the netavark repo.

Luap99 commented 5 months ago

I don't think that does what you think it does; in particular, I don't know of any pasta option that blocks outbound connections.

The -T option is used to forward ports from the container namespace to the host. It doesn't affect any outbound IP connections, AFAIK.

If you want to pass pasta CLI options, use something like --network pasta:-T,80, as documented for the --network option: https://docs.podman.io/en/latest/markdown/podman-create.1.html#network-mode-net
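To make the `--network pasta:...` notation concrete: going by the Podman documentation linked above, the text after `pasta:` is a comma-separated list of arguments handed to pasta, so `pasta:-T,80` should reach pasta as `-T 80`, which forwards TCP port 80 from the container's namespace to the host rather than restricting outbound traffic. The helpers below are hypothetical and only build and split that string; they are not Podman's actual parser.

```python
import shlex


def pasta_network_arg(*pasta_args: str) -> str:
    """Join pasta arguments into a `--network pasta:...` value."""
    for arg in pasta_args:
        if "," in arg:
            # Commas separate arguments in this notation, so an argument that
            # contains one could not survive the round trip.
            raise ValueError(f"argument contains a comma: {arg!r}")
    return "pasta:" + ",".join(pasta_args)


def split_pasta_network_arg(value: str) -> list[str]:
    """Recover the argument list we expect pasta to receive from the value."""
    prefix = "pasta:"
    if not value.startswith(prefix):
        raise ValueError(f"not a pasta network value: {value!r}")
    rest = value[len(prefix):]
    return rest.split(",") if rest else []


if __name__ == "__main__":
    net = pasta_network_arg("-T", "80")
    print(shlex.join(["podman", "run", "--rm", "--network", net, "alpine"]))
    # podman run --rm --network pasta:-T,80 alpine
```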

containers/netavark issue #875: Per-container outbound firewall rules