NixOS / nixpkgs


Networking between docker containers fails unless ~~firewall~~ checkReversePath is disabled #298165

Open Ralith opened 8 months ago

Ralith commented 8 months ago

Describe the bug

Docker containers don't seem to be able to communicate amongst themselves unless networking.firewall.enable = false is set, which is not desirable for obvious reasons. Setting networking.firewall.trustedInterfaces = [ "docker0" ]; is not sufficient.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Clone e.g. https://github.com/quic-interop/quic-interop-runner
  2. Run any test, e.g. python3 run.py -d -s quic-go -c quic-go -t handshake
  3. Note errors in output: client | Downloading files failed: timeout: no recent network activity, Test: handshake took 18.309553s, status: TestResult.FAILED
  4. Repeat after switching to networking.firewall.enable = false. Note test success.

Expected behavior

Containers should be able to communicate with each other, allowing tests to pass.

Notify maintainers

@offlinehacker @vdemeester @periklis @amaxine

Metadata

 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.63, NixOS, 23.11 (Tapir), 23.11.750.7c4c20509c43`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - channels(root): `"nixos-23.11"`
 - channels(ralith): `""`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

SuperSandro2000 commented 8 months ago

Can you reproduce this problem only with docker run command and a base image and without a third party repo? What is the minimal reproducer?

Ralith commented 8 months ago

I've never had any luck trying to drive docker by hand, despite a few attempts. nc -l -u 0.0.0.0 1234 on one container, and nc -u <ip> 1234 on the other, then sending a few lines of input (i.e. attempting to exchange UDP packets) should be plenty to test, but cloning the cited repo is probably easier.

SuperSandro2000 commented 8 months ago

Did you try the example you gave with nc? Can you reproduce the issue with that?

Ralith commented 8 months ago

How would I do that? I have no experience with docker, I'm just trying to use this software that works fine elsewhere.

squat commented 6 months ago

Hi, from my tests, the issue is not that container-to-container communication is broken, but rather that the firewall drops forwarded packets. This causes problems with kind (e.g. https://github.com/kubernetes-sigs/kind/issues/3443) and other projects that expect containers to act as gateways.

Here's a complete reproduction you can run in a single terminal:

# Set up a new Docker network so that we can resolve the container name to an IP address.
docker network create nixos-test
# Create a container named `one` and configure it to respond to the additional IP address 10.5.0.1.
docker run --rm -d --net nixos-test --cap-add NET_ADMIN --name one alpine sh -c 'ip a add dev eth0 10.5.0.1; tail -f /dev/null'
# Create a container named `two`, configure it to route packets to 10.5.0.1 via `one`, and ping 10.5.0.1.
docker run --rm -it --name two --net nixos-test --cap-add NET_ADMIN alpine sh -c 'ip route add 10.5.0.1 via $(nslookup one | tail -n 2 | head -n1 | cut -f2 -d" "); ping 10.5.0.1'

If the NixOS firewall is enabled, then ping will show no output. If you leave the command running and disable the firewall, then ping will produce regular output.

To clean up the test:

docker kill one two
docker network rm nixos-test

ahbnr commented 4 months ago

> Hi, from my tests, the issue is not that container-to-container communication is broken, but rather that the firewall drops forwarded packets. This causes problems with kind (e.g. kubernetes-sigs/kind#3443) and other projects that expect containers to act as gateways.

The situation described by squat looks quite similar to a problem I recently encountered. So perhaps the following information is useful to someone since it helped me quite a bit:

Instead of completely disabling the firewall, it might be sufficient to disable networking.firewall.checkReversePath. Alternatively, one can add specific exceptions to the chain nixos-fw-rpfilter in the mangle table, see also https://unix.stackexchange.com/a/780036/625340
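
As a rough illustration of the two workarounds mentioned above, a configuration.nix fragment might look like the following. This is a sketch under assumptions, not a tested fix: `networking.firewall.checkReversePath` and `networking.firewall.extraCommands` are existing NixOS options, but the `br-+` interface pattern (matching Docker's user-defined bridge interfaces) and the exact rule are illustrative guesses and are not taken from the linked answer. Option B assumes the iptables-based firewall backend.

```nix
{ ... }:
{
  # Option A: keep the firewall enabled but turn off reverse-path filtering.
  networking.firewall.checkReversePath = false;

  # Option B (alternative to A, left commented out): keep reverse-path
  # filtering and exempt selected interfaces in the nixos-fw-rpfilter chain
  # of the mangle table. The "br-+" pattern is an assumption; adjust it to
  # the interfaces that actually carry the forwarded traffic.
  # networking.firewall.extraCommands = ''
  #   iptables -t mangle -I nixos-fw-rpfilter -i br-+ -j RETURN
  # '';
}
```

Option A is the simpler change and is the one reported to work in this thread; Option B keeps the filter active for all other interfaces at the cost of maintaining a manual rule.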

Ralith commented 4 months ago

> it might be sufficient to disable networking.firewall.checkReversePath

I can confirm this got things working for me. Thanks! Is this default different from other major distros? Should we change it?

SuperSandro2000 commented 3 months ago

How about having the docker module disable checkReversePath?
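
For illustration only, such a change could be sketched as a small NixOS module like the one below. This is a hypothetical sketch and not the actual nixpkgs docker module; it assumes only the existing options `virtualisation.docker.enable` and `networking.firewall.checkReversePath`, and uses `lib.mkDefault` so that a user's own setting would still take precedence.

```nix
# Hypothetical sketch, not code from nixpkgs.
{ config, lib, ... }:
{
  config = lib.mkIf config.virtualisation.docker.enable {
    # mkDefault keeps the value overridable from the user's configuration.
    networking.firewall.checkReversePath = lib.mkDefault false;
  };
}
```

Whether disabling the check by default is acceptable from a security standpoint is exactly the open question in this thread.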

hellwolf commented 3 months ago

I deleted my message. Actually, it was a red herring in my case.

Ralith commented 3 months ago

> How about having the docker module disable checkReversePath?

That seems sensible, but I don't fully understand why this is enabled by default in the first place. Someone with more firewall experience should probably weigh in.