Closed tangentsoft closed 5 months ago
I would consider this out of scope (at least at the moment) for netavark. The complexity of integrating some form of firewall syntax is quite high, given that we would need to support it for firewalld, iptables, and nft.
It is also not clear where we should add the rules: in the host netns or in the container netns. Both ways would have their own problems. Note that you can write your own plugin to do this: https://github.com/containers/netavark/blob/main/plugin-API.md
If you run Podman containers in a systemd unit, settings like SocketBindAllow= and SocketBindDeny= should work when specifying --cgroups=split, because systemd implements this simple firewall as a BPF filter in the cgroup hierarchy. This can be a workaround, and potentially a stronger one, as it also applies to macvlan/ipvlan networks, which the host firewall usually doesn't handle.
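A minimal sketch of such a unit follows, written out by a small shell script. The unit name, image, port, and the UNIT_DIR variable are hypothetical placeholders, and Delegate=yes is an assumption about what --cgroups=split needs; SocketBindAllow= and SocketBindDeny= are the systemd directives named above, which act on bind() calls made inside the unit's cgroup.

```shell
# Sketch only: writes an example unit file to a scratch directory.
# UNIT_DIR, the unit name, the image and the port are hypothetical.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
UNIT_FILE="$UNIT_DIR/mail-example.service"

cat > "$UNIT_FILE" <<'EOF'
[Unit]
Description=Example container with a socket-bind allow-list

[Service]
# Assumption: delegation lets Podman create sub-cgroups below this unit.
Delegate=yes
# The allow list is checked first, so only TCP port 25 may be bound;
# every other bind() is denied (requires systemd 249 or later).
SocketBindAllow=tcp:25
SocketBindDeny=any
ExecStart=/usr/bin/podman run --rm --cgroups=split --name mail-example registry.example.com/mail:latest

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $UNIT_FILE"
```

To try it, the file would be copied into /etc/systemd/system/ and started with systemctl as root, since the directives are expected to need the system instance.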
@karuboniru: Thanks, that should be helpful once all my EL8 boxes age out, since they'll be shipping systemd 239 until the end of time, and that feature wasn't added until 249.
Does this work for user services, or does it require that the service is started by root, thus pushing me into the rootful Podman mode?
I believe it requires that the unit is in the system instance (i.e. started by root). But even starting the container as root doesn't mean the container must run in rootful mode.
In my own setup, I set
# /etc/containers/containers.conf
[containers]
userns = "auto"
to tell Podman to assign a user namespace by default even when started as root. I also assigned a subuid range for root in /etc/sub{u,g}id (you can use any name here, e.g. container, but it must match the name used in root-auto-userns-user):
root:2147483647:2147483648
and changed storage.conf to tell Podman to assign the uidmap from the range specified in the sub{u,g}id files:
[storage.options]
root-auto-userns-user = "root"
The benefit is that, because the containers are started as root, I get the rootful network stack, such as routable bridges and macvlan, while the container payload still runs in a rootless environment:
$ sudo podman run -it --rm --network none alpine:latest cat /proc/self/uid_map
0 2147692255 405
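The three columns of a uid_map line are the first UID inside the namespace, the first UID it maps to in the parent namespace, and the length of the range. As a rough sketch, a check that container root is not host root can look at the second column; uid_map_is_remapped is a hypothetical helper name, not a Podman command.

```shell
# Hypothetical helper: reads uid_map lines on stdin and succeeds only if
# none of them starts its outside range at UID 0 (i.e. maps onto host root).
uid_map_is_remapped() {
    awk 'NF == 3 && $2 == 0 { bad = 1 } END { exit bad + 0 }'
}

# Feed it the line from the output above.
if printf '0 2147692255 405\n' | uid_map_is_remapped; then
    echo "container root is not host root"
fi
```

The same function could be fed from a live container by piping in the output of the podman run command shown above.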
Since the announcement of Podman 5 and its use of pasta in rootless containers, I believe I will have what I need once I can get off Podman 4. Specifically, it looks like pasta -T does what I want.
I'm closing this because the wish is substantially fulfilled. If there are any remaining bits to do on the Podman side (how do you ask it to pass -T to pasta is my big question), it isn't on-topic here in the netavark repo.
I don't think that does what you think it does; in particular, I don't know of any pasta option that blocks outbound traffic.
The -T option is used to forward ports from the container namespace to the host. It doesn't affect any outbound IP connections AFAIK.
If you want to pass pasta CLI options, use something like --network pasta:-T,80 as documented for the --network option.
https://docs.podman.io/en/latest/markdown/podman-create.1.html#network-mode-net
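For illustration, the comma-separated form that --network expects can be assembled from an ordinary argument list. This is only a sketch: pasta_network_arg is a hypothetical helper name, not part of Podman or pasta, and the image in the usage comment is a placeholder.

```shell
# Hypothetical helper: joins pasta CLI options with commas and prefixes
# them with "pasta:", the form described for --network above.
# The body runs in a subshell so the IFS change does not leak out.
pasta_network_arg() (
    IFS=,
    printf 'pasta:%s\n' "$*"
)

pasta_network_arg -T 80
# Possible use (not run here):
#   podman run --rm --network "$(pasta_network_arg -T 80)" alpine:latest true
```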
In the Docker model of containers, you declare allowed inbound ports with EXPOSE rules in the Dockerfile, and Podman inherits this in its Containerfile. I believe this is implemented by Netavark at bottom, which is why I am asking for an extension of the feature here.
My core concern is that a lot of containers have more power inside them than we ideally want; fewer follow the microservice ideal than we would like. For one huge example, anything based on Alpine will have a shell, a capable package manager, and a stripped-down wget command built into Busybox, allowing it to pull in almost any external code and run it inside the container, if allowed.
Rather than go to all of the effort of rebuilding these risky containers to rid them of the Busybox and APK stuff, making them more microservice-like, I'd like to be able to say, "This container can only make outbound TCP connections on port 12345." Problem solved.
When the desired behavior is instead to block all outbound networking, we have that already via the --internal flag. The tricky bit that I think needs addressing is when a given container has legitimate need to connect outbound, but only within carefully-scoped rules expressible in the firewall-cmd language. To take the Alpine example once more, outbound connections to port 80 and 443 to pull in additional APK packages and such may be clearly bogus, but if the container is a mail server, it does need to connect out on ports 25, 465, 587, etc.
But now there's a new problem, which is why I am here writing this. How do I express this in firewall-cmd language out on the host when I don't know the container's source IP or MAC, and I don't want the rules to affect the host itself? These per-container identifiers keep changing on each launch! If I block 80 and 443 on the host, I can't pull new container images, OS updates, etc.
What I think I want (and feel free to tell me what I actually want 😛) is a way to say, "When this container comes up, apply these firewall rules to it, intelligently filling in the bits that change on each container instantiation." When the container stops, drop those automatic firewalld rules.