NixOS / nixpkgs

Docker + Kubernetes = no default bridge, commands fail #71040

Open jamescostian opened 5 years ago

jamescostian commented 5 years ago

Describe the bug I set up k8s and docker, and noticed that from inside a docker container I could not connect to the internet. If I set --net=host for docker run or --network=host for docker build, then I can access the internet within containers. When I disabled k8s, accessing the internet from docker worked without any extra configuration. It is worth noting that the docker0 interface appears in ifconfig when k8s is disabled but not when k8s is enabled; however, this appears to be on purpose.

To Reproduce In configuration.nix you need to have docker and k8s:

  services.kubernetes = {
    roles = [ "master" "node" ];
    masterAddress = "localhost";
  };
  virtualisation.docker.enable = true;

Then run nixos-rebuild switch, followed by sudo docker run -it --rm debian /usr/bin/ping 1.1.1.1, and you will see connect: Network is unreachable

Expected behavior I expect kubernetes and docker to both work when they are both enabled.

Additional context I tried export DOCKER_OPTS="--net=host" but it didn't fix things. I also looked through all the relevant GitHub issues and the online manual, but none of them helped. Google wasn't helpful either. The best I could find was a slide shown in a YouTube video, which seemed to acknowledge the lack of a bridge interface for docker. Its config was outdated, so I tried updating it like so:

  services.kubernetes = {
    roles = [ "master" "node" ];
    masterAddress = "localhost";
  };
  networking.bridges = {
    cbr0.interfaces = [];
  };
  networking.interfaces = {
    cbr0.ipv4.addresses = [ {
      address = "192.168.86.64";
      prefixLength = 24;
    } ];
  };
  virtualisation.docker.enable = true;
  virtualisation.docker.extraOptions = "--iptables=false --ip-masq=false -b cbr0";

Unfortunately, this did not fix docker (in fact, it broke my normal internet connection until I rolled back).

Metadata

 - system: `"x86_64-linux"`
 - host os: `Linux 4.19.78, NixOS, 19.03.173582.df7e351af91 (Koi)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.2.2`
 - channels(james): `""`
 - channels(root): `"nixos-19.03.173582.df7e351af91"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

jamescostian commented 5 years ago

Just read the options part of the manual more closely, and fixed one part of my configuration:

   networking.bridges = {
-    cbr0.interfaces = [];
+    cbr0 = {
+      interfaces = [ "wlo0" ];
+    };
   };

That allows my normal internet access to work, but prevents docker and kubernetes from starting. Here are my journalctl -xe logs.

stale[bot] commented 4 years ago

Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity. Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

schneefux commented 4 years ago

still relevant

mikroskeem commented 4 years ago

Any updates on this?

Using --net=host is not a proper solution at all; the root cause should be figured out instead. Enabling the k8s service breaks networking completely inside Docker, and containers set up by k8s can't access the internet either. Edit: Turns out flannel had already broken, which made the k8s containers' network nonfunctional.

I've tried with the firewall both on and off, and ensured that net.ipv4.ip_forward and net.ipv4.conf.<intf>.forwarding are set to 1.
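For anyone wanting to repeat these checks declaratively, here is a minimal sketch of how the same settings could be pinned in configuration.nix. It is an assumption about how one might do it, not the configuration used above: it applies the forwarding sysctl to the "all" pseudo-interface rather than to a specific <intf>, and it turns the firewall off only as a temporary test.

```nix
{ ... }:
{
  # Assumption: pin the forwarding sysctls mentioned above so they survive a
  # rebuild. "all" stands in for the per-interface <intf> key.
  boot.kernel.sysctl = {
    "net.ipv4.ip_forward" = 1;
    "net.ipv4.conf.all.forwarding" = 1;
  };

  # Assumption: disable the NixOS firewall only while testing, to rule it out
  # as the cause. Set this back to true afterwards.
  networking.firewall.enable = false;
}
```

As reported above, having these values in place did not restore container networking, so this only helps rule out forwarding and the firewall as causes.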

This happens in 20.03

 - system: `"x86_64-linux"`
 - host os: `Linux 5.4.47-xanmod1, NixOS, 20.03.2310.fb6c3a6831c (Markhor)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.6`
 - channels(root): `"nixos-20.03.2310.fb6c3a6831c, nixpkgs-unstable-20.09pre228384.c27e54de99d"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

... and unstable

 - system: `"x86_64-linux"`
 - host os: `Linux 5.4.47, NixOS, 20.09pre231796.22a81aa5fc1 (Nightingale)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.6`
 - channels(root): `"nixos-20.09pre231796.22a81aa5fc1"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

aaronjanse commented 3 years ago

Still important to me

mikroskeem commented 3 years ago

One solution is to make Kubernetes use CRI-O instead of Docker. It's not very straightforward, but that's what I ended up doing.

Might try finding time to clean up my Nix files...

aaronjanse commented 3 years ago

Huh, in that case I'll say this is related to https://github.com/NixOS/nixpkgs/pull/96084

aaronjanse commented 3 years ago

@mikroskeem When possible, would you mind sharing the relevant part of your config, even if messy? I've got CRI-O working via the PR linked above but still no networking :-/

mikroskeem commented 3 years ago

There you go - https://gist.github.com/mikroskeem/683926a55c5b65d9343b9397ccc09afa.

Here are a few notes and warnings:

1) I copied the files from nixpkgs and made changes directly. Pick the last commit before 1 Jul 2020 and run diff. There might be some comments beginning with zentria:, so search for them too.
2) Anything that touches networking/firewall probably bricks flannel. Solution: reboot.
3) You must reset your existing k8s setup, and don't forget etcd and so on. Do not bother draining the node.
4) Something in the networking part is broken: at some point 50% of the connections just time out (after a week of uptime or so; pretty much every Monday is a debugging/headache/dumpster fire/reboot cycle).

In conclusion: if something breaks, good old "have you tried turning it off and on again" practice applies.

aaronjanse commented 3 years ago

Thank you @mikroskeem!

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.