Jip-Hop / jailmaker

Persistent Linux 'jails' on TrueNAS SCALE to install software (docker-compose, portainer, podman, etc.) with full access to all files via bind mounts thanks to systemd-nspawn!
GNU Lesser General Public License v3.0
313 stars, 31 forks

Jail capabilities for docker are dangerous for host networking #119

Closed: templehasfallen closed this issue 1 month ago

templehasfallen commented 1 month ago

I noticed a severe issue with the current "docker_compatible" flag.

From what I can see, --capability=all is passed to systemd-nspawn, which is reckless on many levels and poses various problems and security risks.

This does not happen when docker_compatible=0, because the jail then lacks CAP_NET_ADMIN and cannot modify the host firewall.
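One way to confirm which side of this line a jail is on is to check, from a shell inside the jail, whether its processes hold CAP_NET_ADMIN. The sketch below is a hypothetical helper, not part of jailmaker: it reads the effective capability mask (the CapEff line) from /proc/self/status and tests bit 12, which is CAP_NET_ADMIN in the Linux capability numbering.

```python
CAP_NET_ADMIN = 12  # bit number of CAP_NET_ADMIN in <linux/capability.h>


def has_capability(status_text, cap_bit):
    """Check one capability bit in the CapEff line of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # CapEff is a hexadecimal bitmask of the effective capabilities.
            mask = int(line.split()[1], 16)
            return bool((mask >> cap_bit) & 1)
    raise ValueError("no CapEff line found")


def jail_has_net_admin():
    """Hypothetical helper: run inside the jail to see if CAP_NET_ADMIN is held."""
    with open("/proc/self/status") as f:
        return has_capability(f.read(), CAP_NET_ADMIN)
```

Calling jail_has_net_admin() as root inside the jail reports whether that jail could touch the host firewall when it also shares the host network namespace.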

As an example, my whole host iptables was wrecked by the hands of the jail. The combination of CAP_NET_ADMIN and host networking is very dangerous and should at least trigger a warning, if not be disallowed outright. The result was a complete loss of connectivity from all clients to the TrueNAS server, which can happen in many scenarios, such as:

To be completely clear: I ended up with a bunch of iptables rules on the host that had been added inside a jail.

Steps to reproduce:

  1. Create a jail
  2. Make the jail docker compatible
  3. Apply any kind of firewall rule inside the jail, e.g. iptables -A INPUT -s 1.2.3.4 -j DROP
  4. View the rule on the host: iptables -L | grep 1.2.3.4
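Step 4 can also be scripted. This is a minimal sketch, not part of jailmaker: find_leaked_rules (a hypothetical name) scans the output of iptables -S, which prints rules in the same -A/-s syntax they were added with (a single address such as 1.2.3.4 is listed as 1.2.3.4/32). The output must be captured on the TrueNAS host as root, not inside the jail.

```python
import subprocess


def find_leaked_rules(rules_text, address):
    """Return the lines of `iptables -S` output whose source (-s) matches address."""
    wanted = {address, address + "/32"}
    matches = []
    for line in rules_text.splitlines():
        tokens = line.split()
        # Look for a "-s <addr>" pair anywhere in the rule.
        for flag, value in zip(tokens, tokens[1:]):
            if flag == "-s" and value in wanted:
                matches.append(line)
                break
    return matches


def host_input_rules():
    """Hypothetical helper: run on the host as root to dump the INPUT chain."""
    return subprocess.run(
        ["iptables", "-S", "INPUT"], capture_output=True, text=True, check=True
    ).stdout
```

If find_leaked_rules(host_input_rules(), "1.2.3.4") returns a non-empty list after step 3, the rule added inside the jail has landed in the host firewall.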

Resolution Proposals:

  1. Grant the jail only the capabilities Docker needs, and warn users who enable host networking that this can be destructive to the host's networking
  2. Do not allow host networking when using docker_compatible; insist on macvlan or bridge networking
  3. Apply No. 2 and drop all extra capabilities from the jail, as they should not be required
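Proposals 1 and 2 boil down to spotting one combination in the arguments passed to systemd-nspawn: no private-network option (so the jail shares the host network namespace) plus CAP_NET_ADMIN or all capabilities. A sketch of such a check follows; check_nspawn_args and the other names are hypothetical, and the list of private-network options is an assumption that may not be exhaustive.

```python
import shlex

# systemd-nspawn options that give the jail its own network namespace.
# Assumed, possibly incomplete list; with none of them the host network is shared.
PRIVATE_NET_OPTIONS = (
    "--private-network",
    "-n",  # short form of --network-veth
    "--network-veth",
    "--network-bridge=",
    "--network-macvlan=",
    "--network-ipvlan=",
    "--network-zone=",
    "--network-interface=",
    "--network-namespace-path=",
)


def uses_host_networking(args):
    """True if none of the private-network options appear in args."""
    return not any(
        arg == opt or (opt.endswith("=") and arg.startswith(opt))
        for arg in args
        for opt in PRIVATE_NET_OPTIONS
    )


def granted_capabilities(args):
    """Collect capability names granted via --capability= (upper-cased)."""
    caps = set()
    for arg in args:
        if arg.startswith("--capability="):
            value = arg.split("=", 1)[1]
            caps.update(c.strip().upper() for c in value.split(",") if c.strip())
    return caps


def check_nspawn_args(arg_string):
    """Return warnings for risky combinations in a systemd-nspawn argument string."""
    args = shlex.split(arg_string)
    caps = granted_capabilities(args)
    warnings = []
    if uses_host_networking(args) and ({"ALL", "CAP_NET_ADMIN"} & caps):
        warnings.append(
            "host networking + CAP_NET_ADMIN: the jail can rewrite the host firewall"
        )
    return warnings
```

A create step could print these warnings (proposal 1) or refuse to continue when the list is non-empty (proposal 2).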

Docker is able to be installed and run without any capabilities if using macvlan or bridge interfaces and setting --setenv=SYSTEMD_SECCOMP=0

Jip-Hop commented 1 month ago

Sorry to hear you ran into issues while using jailmaker. Thanks for reporting though!

You're right, this area needs improvement. I plan to remove the docker_compatible option in the future. The new docker config template already shows how to set up a jail for docker usage without --capability=all.

To provide some context, jailmaker evolved from my workaround to run docker on the TrueNAS host directly (when the docker binaries were still included on the base system). This is equivalent to running docker inside a jail with host networking and --capability=all.

I never ran into the issues you describe, even when I was running docker inside a jail with docker_compatible=0 and using host networking. Nevertheless there's the potential to wreck the host from inside the jail. That's why I've added the security statement and I suppose that's why iX Systems put this warning in the Sandboxes docs:

> There is significant risk that using Jailmaker causes conflicts with the built-in Apps framework within SCALE. Do not mix the two features unless you are capable of self-supporting and resolving any issues caused by using this solution.

By the way, instead of disabling seccomp completely:

> Docker is able to be installed and run without any capabilities if using macvlan or bridge interfaces and setting --setenv=SYSTEMD_SECCOMP=0

You could also add: --system-call-filter='add_key keyctl bpf'
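For comparison, here is a sketch of a Docker jail argument list that grants no extra capabilities, uses a bridge instead of host networking, and picks between the two seccomp approaches mentioned above. docker_jail_args and the bridge name br1 are hypothetical, and the list covers only the options discussed in this thread, not everything a working Docker jail may need (the docker config template is the reference for that).

```python
import shlex


def docker_jail_args(bridge, disable_seccomp=False):
    """Hypothetical helper: systemd-nspawn args for a Docker jail on a bridge."""
    # Bridge networking gives the jail its own network namespace.
    args = [f"--network-bridge={bridge}"]
    if disable_seccomp:
        # Approach from the issue report: turn seccomp off entirely.
        args.append("--setenv=SYSTEMD_SECCOMP=0")
    else:
        # Narrower approach: only allow the extra system calls named above.
        args.append("--system-call-filter=add_key keyctl bpf")
    # Deliberately no --capability=all.
    return args


print(shlex.join(docker_jail_args("br1")))
# --network-bridge=br1 '--system-call-filter=add_key keyctl bpf'
```

shlex.join quotes the filter value so the space-separated syscall list survives as a single argument, equivalent to the --system-call-filter='add_key keyctl bpf' form above.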

Out of curiosity, what exactly did you do which caused:

> my whole host iptables was wrecked by the hands of the jail.

This may be a good example scenario to put in a warning.

Were you not aware of the fact that the jail is using host networking by default?

templehasfallen commented 1 month ago

> Sorry to hear you ran into issues while using jailmaker. Thanks for reporting though!

> You're right, this area needs improvement. I plan to remove the docker_compatible option in the future. The new docker config template already shows how to set up a jail for docker usage without --capability=all.

> To provide some context, jailmaker evolved from my workaround to run docker on the TrueNAS host directly (when the docker binaries were still included on the base system). This is equivalent to running docker inside a jail with host networking and --capability=all.

Thanks, I'm aware of all this; I've actually read through virtually every line in this repo already.

> I never ran into the issues you describe, even when I was running docker inside a jail with docker_compatible=0 and using host networking.

I assume you mean docker_compatible=1 here. When that is combined with host networking, the host network interfaces are widely exposed, and anything running in the jail can interfere with bridges, apps, libvirt VM routing, etc.

> Nevertheless there's the potential to wreck the host from inside the jail. That's why I've added the security statement and I suppose that's why iX Systems put this warning in the Sandboxes docs:

> > There is significant risk that using Jailmaker causes conflicts with the built-in Apps framework within SCALE. Do not mix the two features unless you are capable of self-supporting and resolving any issues caused by using this solution.

Sadly, this affects everything, including bridges on the host and VMs, not only apps.

> By the way, instead of disabling seccomp completely:

> > Docker is able to be installed and run without any capabilities if using macvlan or bridge interfaces and setting --setenv=SYSTEMD_SECCOMP=0

> You could also add: --system-call-filter='add_key keyctl bpf'

Thank you, I will test it out.

> Out of curiosity, what exactly did you do which caused:

> > my whole host iptables was wrecked by the hands of the jail.

Basically, I ran a jail with host networking and docker_compatible=1 and installed a couple of programs that enable a firewall and add rules to it. Those rules were added directly to the host's iptables, denying everything except SSH connections and the app itself. Suddenly not even the TrueNAS web UI worked.

> This may be a good example scenario to put in a warning.

> Were you not aware of the fact that the jail is using host networking by default?

I was aware that it was using host networking; I was unaware that literally all capabilities were enabled in the jail. I didn't expect it, and I'm sure others won't expect it either. My point is that there should be a fair warning when combining host networking with CAP_NET_ADMIN or --capability=all. Realistically, --capability=all is not required at all: if you want to use host networking and Docker, add only the specific capabilities required (CAP_NET_ADMIN, CAP_NET_BIND_SERVICE, CAP_NET_RAW, etc.) rather than all of them.
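A sketch of that least-privilege alternative: build a single --capability= option from an explicit list and refuse "all". capability_flag is a hypothetical helper, and the three capabilities are the ones named in the comment above; since that list ends with "etc.", whether Docker with host networking needs exactly these is an assumption to verify.

```python
# Capabilities named in the comment above; assumed, not a verified minimum.
HOST_NET_DOCKER_CAPS = ("CAP_NET_ADMIN", "CAP_NET_BIND_SERVICE", "CAP_NET_RAW")


def capability_flag(caps):
    """Hypothetical helper: turn capability names into one --capability= option."""
    names = sorted({c.strip().upper() for c in caps if c.strip()})
    if not names:
        raise ValueError("no capabilities given")
    if "ALL" in names:
        raise ValueError("refusing to grant every capability; list them explicitly")
    bad = [n for n in names if not n.startswith("CAP_")]
    if bad:
        raise ValueError(f"not capability names: {bad}")
    # systemd-nspawn accepts a comma-separated list after --capability=.
    return "--capability=" + ",".join(names)


print(capability_flag(HOST_NET_DOCKER_CAPS))
# --capability=CAP_NET_ADMIN,CAP_NET_BIND_SERVICE,CAP_NET_RAW
```

Even this narrower set still includes CAP_NET_ADMIN, so with host networking the jail can still change the host firewall; the gain is only that unrelated capabilities are no longer granted.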

Jip-Hop commented 1 month ago

> I assume you mean docker_compatible=1

Yes that's what I meant.

When creating a jail with host networking for the purpose of running docker, then docker needs to be able to create firewall rules in the host networking namespace. So the jail does need CAP_NET_ADMIN in this case.

It's no longer the recommended way of running docker in a jail though, so as a first step it would be a good idea to remove the docker_compatible setup question from the interactive create process and refer users to the docker config template instead. What do you think?

templehasfallen commented 1 month ago

> > I assume you mean docker_compatible=1

> Yes that's what I meant.

> When creating a jail with host networking for the purpose of running docker, then docker needs to be able to create firewall rules in the host networking namespace. So the jail does need CAP_NET_ADMIN in this case.

> It's no longer the recommended way of running docker in a jail though, so as a first step it would be a good idea to remove the docker_compatible setup question from the interactive create process and refer users to the docker config template instead. What do you think?

Yeah, exactly, I absolutely agree.

Also, if you disallow the combination of host networking and CAP_NET_ADMIN, you could go as far as considering jailmaker safe.

Jip-Hop commented 1 month ago

@templehasfallen could you review/test https://github.com/Jip-Hop/jailmaker/pull/121?

templehasfallen commented 1 month ago

> @templehasfallen could you review/test #121?

Hey, I just tested it on both 23.10.2 and 24.04-RC1 without any issues. All of the functionality seems to work, both with a template and when setting up a docker-compatible jail manually, including running containers and exposing them.

Thanks a lot and great work :)

Jip-Hop commented 1 month ago

Thank you!

mrstux commented 1 month ago

I never hit this, as I immediately used the docker template with bridge networking to simplify replacing my Docker VM.