containers / netavark

Container network stack
Apache License 2.0

network interface/firewall rules leaked after aardvark-dns start error #1121

Open stuartm opened 1 week ago

stuartm commented 1 week ago

Issue Description

Podman version 5.2.3

The issue I'm seeing is identical to containers/podman#14365, which was closed and locked due to inactivity but apparently was never resolved and was affecting at least a few people.

I recently upgraded my server from Fedora 39 to Fedora 40, after which a pihole container that had been working perfectly stopped functioning. More precisely, as it turns out, port forwarding for that container stopped working, and in a rather interesting way.

I was forwarding port 53 (tcp/udp) on the host to port 53 on the container, and port 8888 on the host to port 80 on the container for pihole's admin interface. After the upgrade, port forwarding broke for both ports.

I've played around with different caps, disabled selinux enforcement on the host, and disabled the firewall (although it was correctly configured). I've checked and rechecked the container configuration, and even managed to prove that it was working as expected except for the port forwarding issue (see below).

To cut a very long story short, here's what I discovered after hours of trying to get things working again. I was able to reach both ports through the container IP, which demonstrates that the container was functioning correctly. When I changed the host ports used for forwarding (8888 > 8765 and 53 > 54), port forwarding worked! Therefore the issue is specific to certain ports: in my experience 53, as in the original ticket, but also others including 8888.

A half dozen other containers, all with port forwards, are unaffected by this issue.

I can't see an obvious connection between ports 53 and 8888; however, maybe those two ports share something in common that triggers a thought for someone.

Steps to reproduce the issue

  1. Run a container forwarding port 53 (or 8888) from host

Describe the results you received

Port forwarding for some ports results in traffic just disappearing into the void.

Describe the results you expected

Traffic forwarded from the host on mapped ports should reach the container.

podman info output

Podman version 5.2.3 Fedora 40 x86_64

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No

sbrivio-rh commented 1 week ago

I was forwarding port 53 (tcp/udp) on the host to port 53 on the container

Let's focus on this for a moment. Can you check which process is bound to port 53? fuser -n tcp 53 and fuser -n udp 53. I wonder if another process you don't expect is stealing your packets.

stuartm commented 1 week ago

Last night after opening this ticket I had a thought. It was just too great a coincidence that of all the ports that were not working it was the exact two forwarded for this container which were having issues.

Immediately after the upgrade when I first started the container it had failed to start due to a conflict on port 53 with aardvark - I changed the port forwarding to listen only on the external IP and restarted it which fixed the binding issue but then I found that port forwarding was not working. I formed a theory that forwarding rules had been created on this first start but then not removed when it subsequently failed to bind to port 53 on all addresses. Rather than messing about trying to find where these rules were (in hindsight probably just iptables?) I just restarted the host and that fixed the issue.

So there is a bug here, but it's not what I thought: under a certain failure scenario, podman appears to create port forwarding rules but then not clean them up correctly, with the result that all traffic sent to those ports is presumably sent to the incorrect container IP.

This might also explain why so many people were having issues specifically with pihole: the standard instructions for pihole have port forwarding on port 53 enabled for all addresses by default. With the introduction of aardvark on the container interface, this would result in the first start of the pihole container always failing. Like me, many would have restricted pihole to listen on an external interface and then recreated the container, only to find that things were still not working. Most of those people would also have restarted the host at some point and found that the issue disappeared, which explains why the original ticket was abandoned.
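To illustrate the bind conflict described above, here is a minimal sketch in Python rather than anything taken from podman or aardvark-dns. It assumes a Linux host, uses loopback addresses as stand-ins for the bridge and external IPs, and lets the kernel pick a port instead of using 53 (which would need root). The helper name try_bind_udp is made up for this example.

```python
import errno
import socket
from typing import Optional


def try_bind_udp(host: str, port: int) -> Optional[socket.socket]:
    """Return a bound UDP socket, or None if the address is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise
    return sock


# Stand-in for a DNS server already listening on one specific address
# (assumption: 127.0.0.1 plays the role of the container bridge IP).
dns = try_bind_udp("127.0.0.1", 0)  # port 0: the kernel picks a free port
port = dns.getsockname()[1]

# Publishing the same port on all addresses collides with that listener.
wildcard = try_bind_udp("0.0.0.0", port)

# Publishing it on one other specific address does not
# (assumption: 127.0.0.2 plays the role of the external IP).
specific = try_bind_udp("127.0.0.2", port)

print("wildcard bind succeeded:", wildcard is not None)
print("specific bind succeeded:", specific is not None)
```

On Linux I would expect the wildcard bind to fail and the specific one to succeed, which matches the behaviour I saw when I restricted the forward to the external IP.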

I don't know if you want to leave this ticket open for the orphaned port forwarding rule bug, or not, I leave that decision to you.

sbrivio-rh commented 1 week ago

I don't know if you want to leave this ticket open for the orphaned port forwarding rule bug, or not, I leave that decision to you.

I have no idea how that part works, but if there are stale nftables port forwarding rules (I'm not sure what component would add them?) then there's an actual issue somewhere...

Luap99 commented 1 week ago

Immediately after the upgrade when I first started the container it had failed to start due to a conflict on port 53 with aardvark - I changed the port forwarding to listen only on the external IP and restarted it which fixed the binding issue but then I found that port forwarding was not working. I formed a theory that forwarding rules had been created on this first start but then not removed when it subsequently failed to bind to port 53 on all addresses. Rather than messing about trying to find where these rules were (in hindsight probably just iptables?) I just restarted the host and that fixed the issue.

Yeah, looking at the code, it seems that if we fail to start aardvark-dns we forget to tear down the driver again here: https://github.com/containers/netavark/blob/d3769ed70ce02497e715fe4a3c5c2ea62938c113/src/commands/setup.rs#L152-L156

At least that is what I assume from your description, but we definitely do not clean up on failure there, so leaked iptables rules and interfaces are to be expected in such a case.
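The general shape of the missing cleanup could look roughly like the sketch below. This is Python for illustration only, not the netavark code (which is Rust), and every name in it (FakeDriver, FailingDns, setup_network) is hypothetical. The idea is simply that if the DNS step fails after the driver has been set up, the driver is torn down again before the error is returned.

```python
class FakeDriver:
    """Hypothetical stand-in for a network driver that owns firewall rules."""

    def __init__(self):
        self.rules = []

    def setup(self):
        # Pretend to install a port forwarding rule.
        self.rules.append("dnat host:53 -> container:53")

    def teardown(self):
        # Remove everything setup() created.
        self.rules.clear()


class FailingDns:
    """Hypothetical DNS server whose start always fails, as in the report."""

    def start(self):
        raise OSError("address already in use")


def setup_network(driver, dns):
    """Set up the driver, then DNS; undo the driver if DNS fails to start."""
    driver.setup()
    try:
        dns.start()
    except Exception:
        # Roll back so no rules or interfaces are leaked, then re-raise.
        driver.teardown()
        raise


driver = FakeDriver()
try:
    setup_network(driver, FailingDns())
except OSError as exc:
    print("setup failed:", exc)
print("leftover rules:", driver.rules)
```

Without the teardown call in the except branch, the rule installed by setup() would stay behind after the failure, which is the leak described in this issue.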

I'll move the issue to netavark.