docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Defining networks in docker - A legend to behold #221

Open joeyhub opened 6 years ago

joeyhub commented 6 years ago

I would like to have fine-grained network filtering in docker, meaning that by default everything is blocked and you then allow exactly what you want. From my perspective this is how docker should have worked by default: containment is containment, and should start with access to nothing, then poke specific holes, similar to volumes and other facilities. However, networking in docker is not set up like this. Instead it's all a bit random.

Using my own wrappers for docker and docker-compose, I have the following options for a container (a rough translation of a few of these into iptables follows the list):

service:
    // Usually this will only allow access from the host. However, if the host is a router for the bridge, then external access will be granted as well. It is not expected for docker to be responsible for filtering specific external access in my use case.
    allow.netfilter from=external protocol=tcp port=?
    allow.netfilter from=external protocol=udp port=?
    // This is for publishing the ports. It's largely the same as above but for when a static port is needed (such as for providing DNS from a container to the host). It's very rarely needed, as once dyn-dns is exposed for the containers, that is subsequently used. I believe docker can update the hosts file on the host, but that's not so easy to make available externally, to filter, or to give a specific zone/TLD to.
    allow.netfilter protocol=tcp port=? publish=?
    allow.netfilter protocol=udp port=? publish=?
    // Gives specific network access to another container.
    allow.netfilter to=service.name protocol=udp port=?
    allow.netfilter to=service.name protocol=tcp port=?
    // This allows the container specific external access to an external endpoint.
    allow.netfilter to=external.ipv4.address protocol=tcp port=?
    allow.netfilter to=external.ipv4.address protocol=udp port=?
    // This allows use of a specific protocol to the outside world where the IP address is dynamic. Other systems may be used to provide tighter restrictions such as DNS filtering.
    allow.netfilter to=external protocol=tcp port=?
    allow.netfilter to=external protocol=udp port=?
    // Allow A name lookups on a domain and all of its subdomains.
    allow.dns zone=?
    // Allow A name lookups on a domain.
    allow.dns domain=?
    // Reverse publishes (or exposes) a port internally to the bridge, essentially port forwarding used for services running tunnels or masquerades. A problem with this one: since the tunnel won't go through the external chain, it needs to be applied inside the container if you also want the inner container's restrictions to apply to its traffic to the tunnel.
    allow.netfilter to=external.ipv4.address protocol=tcp port=? publish=?
    allow.netfilter to=external.ipv4.address protocol=udp port=? publish=?
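
To make that concrete, here is roughly what my wrapper expands a few of these rules into on the host. A hand-written sketch: the subnet 172.30.0.0/24, the container addresses and the ports are made up for illustration.

    # rebuild the user chain from the generated rules
    iptables -F DOCKER-USER
    # let return traffic through first
    iptables -A DOCKER-USER -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # allow.netfilter to=service.name protocol=tcp port=5432
    # (app at 172.30.0.10 may reach db at 172.30.0.20 on 5432/tcp)
    iptables -A DOCKER-USER -s 172.30.0.10 -d 172.30.0.20 -p tcp --dport 5432 -j ACCEPT
    # allow.netfilter to=external protocol=tcp port=443
    # (app may reach any non-bridge address on 443/tcp)
    iptables -A DOCKER-USER -s 172.30.0.10 ! -d 172.30.0.0/24 -p tcp --dport 443 -j ACCEPT
    # default-deny anything else touching the bridge subnet, fall through otherwise
    iptables -A DOCKER-USER -s 172.30.0.0/24 -j DROP
    iptables -A DOCKER-USER -d 172.30.0.0/24 -j DROP
    iptables -A DOCKER-USER -j RETURN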

I might have other restrictions in place through reverse proxies, known hosts, etc. You can imagine all kinds of other variations but this is my current use case.

I currently achieve network containment by generating my own docker-compose files. I put everything on a bridge network with a fixed subnet. I don't make it internal, as the last time I tried this it broke a lot of things (I never had time to find out why).
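
The generated network definition looks something like this (a minimal sketch; the network name and subnet are placeholders):

    cat > docker-compose.yml <<'EOF'
    networks:
      mynet:
        driver: bridge
        # internal: true would cut external access entirely, but it broke things for me
        ipam:
          config:
            - subnet: 172.30.0.0/24
    EOF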

I give every service a static IP address automatically. I also generate iptables rules automatically, filling the DOCKER-USER chain, which works for everything except published ports; those work appropriately using docker's own implementation (it manages to bypass the forward chain altogether).
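
The per-service half of that looks roughly like this (again a sketch; the image names are placeholders, and the addresses match the iptables example above):

    cat >> docker-compose.yml <<'EOF'
    services:
      app:
        image: myorg/app              # placeholder image
        networks:
          mynet:
            ipv4_address: 172.30.0.10
      db:
        image: myorg/db               # placeholder image
        networks:
          mynet:
            ipv4_address: 172.30.0.20
    EOF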

I can't find any other way to get fine-grained network firewalling for docker. We have things such as link, expose, publish, external_links, extra_hosts, etc. However, the underlying functionality of these isn't clear or exposed by the name. It also occasionally shifts, stops being supported by wrappers such as docker-compose, and so on. Custom network drivers don't appear to be trivial to implement, and most of the ones out there are specifically for multi-host virtual networks.

The only way for me to avoid relying on static IPs is to create a service in a container that connects to and monitors the docker socket. At least in this scenario it shouldn't be too difficult for docker to internalise when running on Linux. However, it still requires modification of the host's iptables. It's a shame that you can't have a container act as an intermediary to manage all of the interfaces and iptables for a group of containers. If that were possible, it would raise questions about where the syslog goes for that container when using -j LOG.
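
A sketch of what that socket-watching service boils down to (docker events and docker inspect are real commands; update_rules is a hypothetical stand-in for re-emitting the DOCKER-USER rules from earlier):

    # react to container starts and refresh the rules from the live addresses
    docker events --filter 'type=container' --filter 'event=start' \
        --format '{{.Actor.Attributes.name}}' |
    while read -r name; do
        ip=$(docker inspect -f '{{(index .NetworkSettings.Networks "mynet").IPAddress}}' "$name")
        update_rules "$name" "$ip"  # hypothetical: regenerate this service's iptables rules
    done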

DNS filtering is another example of something docker could manage, as it already appears to do some magic for intercepting DNS requests (although things creep toward more complexity if it also wants to auto-firewall external things by the addresses resolved; rewriting one host for another would break things like SSL as well). At present, if I need another docker container for filtering DNS, then things become complex. I can't set the DNS server by hostname, it has to be a static IP (the same applies if I want a container for routing). Perhaps links might solve this, though, if they keep the hosts file up to date.
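
The workaround I mean looks something like this (a sketch; the resolver image is a placeholder for whatever filtering DNS you run, and the pinned address is illustrative):

    cat > dns-compose.yml <<'EOF'
    services:
      dns:
        image: my/filtering-dnsmasq   # placeholder resolver image
        networks:
          mynet:
            ipv4_address: 172.30.0.2
      app:
        image: myorg/app              # placeholder image
        dns:
          - 172.30.0.2                # must be a literal IP; a service name won't work here
        networks:
          - mynet
    networks:
      mynet:
        external: true
    EOF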

I've looked into other methods, such as setting the iptables inside each container, but this isn't elegant. To be unobtrusive, you need to create a bootstrap compose that runs a helper image against each container, with NET_ADMIN, that can set up the iptables and routing (if you want to route everything via one container, for example). I'm not sure whether all of that would persist between container runs, and it further escalates the question of what should happen with -j LOG. It's particularly unkind for dynamic addressing as well, unless your network setup routes everything via a management container (although that container needs to be static).
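
For reference, that bootstrap boils down to joining each container's network namespace with a one-shot helper (the flags are real; the helper image and the rule are illustrative):

    # install rules inside the target container's own netns;
    # my/iptables-helper is a placeholder image with iptables installed
    docker run --rm --net=container:app --cap-add=NET_ADMIN \
        my/iptables-helper iptables -A OUTPUT -p tcp ! --dport 443 -j DROP
    # the rules live in that container's netns, so they vanish when it is recreated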

Hidden in this is a notion not well exposed by docker. You have the resident process that you run in your container (or more than one process), but you might instead only have a container that sets up the kernel to do things with the namespace. As far as I am aware, won't a container set up as a router still work even with no process running, since it's only using kernel facilities? Will a docker bridge network allow an internal router, and if not, won't things like a SNAT/DNAT forward still work on a stopped container (since the MAC/IP will be consistent with standard container-to-container behaviour)? There's also no separation of container setup exposed to the user. It's not possible to have an entrypoint, or more of a bootstrap, outside of a container that is able to set up internal namespaced kernel resources (particularly networking in this case) according to external configuration.
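
As a sketch of the router case (the image is a placeholder with iptables available; --sysctl, --cap-add and --ip are real flags):

    docker run -d --name router \
        --cap-add=NET_ADMIN --sysctl net.ipv4.ip_forward=1 \
        --network mynet --ip 172.30.0.3 \
        my/router sh -c 'iptables -t nat -A POSTROUTING -s 172.30.0.0/24 -j MASQUERADE && tail -f /dev/null'
    # note the resident tail: as far as I can tell docker tears the netns down once
    # PID 1 exits, so the NAT state does not survive the container stopping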

cpuguy83 commented 6 years ago

Docker networks, as in docker network create foo && docker run --network=foo, provide this isolation. Things that share a network can communicate; things that do not cannot. When you create a container that is not attached to a specific network, it is attached to the default "bridge" network.
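
For example (busybox used just for brevity):

    docker network create foo
    docker network create bar
    docker run -d --name a --network foo busybox sleep 3600
    docker run --rm --network foo busybox ping -c 1 a   # same network: resolves and replies
    docker run --rm --network bar busybox ping -c 1 a   # different network: cannot resolve or reach it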

There is a daemon option to disable inter-container communication (--icc=false). This option applies only to the default bridge network. It prevents containers on the network from communicating except when explicitly linked with --link, and even then they can only communicate over ports which are exposed (e.g. EXPOSE in a Dockerfile, -p, --expose, etc).
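
Concretely (the images here are placeholders):

    # daemon flag...
    dockerd --icc=false
    # ...or the equivalent /etc/docker/daemon.json entry:
    #   { "icc": false }

    # with icc off, only linked containers can talk, and only on exposed ports
    docker run -d --name db --expose 5432 my/db
    docker run -d --name app --link db:db my/app   # app can reach db:5432, nothing else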

The default behavior of allowing communication on the bridge network is just for historical reasons and can't really be changed.

Beyond that, I think these types of filters could easily work in the proposal from #32801.

joeyhub commented 6 years ago

What I've put here is basically what I want to achieve with networking, which is fine-grained control (precise whitelisting). I've also described how I currently do this, plus some other ideas and intentions as food for thought.

#32801 is something a bit different, although it's relevant to some of the concepts, such as using a container as a router. That ticket is about setting very high-level roles. It is relevant if you want to use a container as a router or something like that, but I don't see it unblocking this. It offers top-level roles that aggregate permissions from a broad range of subsystems (capabilities, syscalls, containment, etc). I don't believe it would be appropriate for it to worry about very low-level bespoke security entitlements; there's already some complexity there as it is. For example, NET_ADMIN only gives a container management rights over its own namespaced network stack from what I can tell (which is good), otherwise you would want network=host. It's related, but I don't think it really solves a great deal, and I haven't so far seen a great problem with top-level permissions that it would solve. There is one thing I mentioned about running all your containers once first with a bootstrap, but that just needs the NET_ADMIN capability, and using entitlements wouldn't really change anything about that except the vocabulary.
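
To illustrate what I mean about the scoping (busybox's ip applet is enough here):

    docker run --rm busybox ip addr add 10.99.0.5/32 dev lo
    # fails: the default capability set lacks NET_ADMIN
    docker run --rm --cap-add=NET_ADMIN busybox ip addr add 10.99.0.5/32 dev lo
    # succeeds, but only inside this container's namespace; the host's interfaces are untouched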

Some of the issues with containers having special roles in relation to managing docker or a group of containers aren't purely permissions issues. It's more that docker sometimes doesn't let you expose the information you need for this, or for linkage at a higher level. That, or docker doesn't give you features to begin with (either at all, or simple versions that don't force people to pull in concerns they don't have) that it could and probably should provide, leading you to try to almost build plugins for docker as docker containers. I am a bit confused by this, since IPAM used to offer features that looked like they might help, but they have at least disappeared from the latest docker-compose. Those are also problematic where you need to set things at a per-network level, meaning you end up with multiple networks where you only want one.

Docker should certainly be capable of managing the firewall at the iptables and DNS level, for example. It's already doing those two things to some extent; it simply doesn't provide any way to directly give it specifics to apply for filtering. I suppose it's quite easy until you ask what about Windows, Mac, etc. Regardless, a lot of the epic saga I am describing is working around features that I think could be core docker. With those in place I would only have a few complexities left remaining.

There is a crossroads as to what docker should do itself and what it should enable containers to do. #32801 tends to apply more to the latter, but it doesn't solve the problem of passing linkage information, such as that service X is a gateway, nor does it easily solve the dynamic IP issue.

I'm in the camp, though, that I really shouldn't have to dig into docker internals to accomplish what you're supposed to be able to accomplish with a high-level container system: containment with exact access rights (allow, deny), delivered in a largely declarative and unobtrusive manner.