Closed: waffshappen closed this issue 1 year ago.
Please provide your pasta version. Does it only affect the port forwarding, or are outgoing connections also affected? Is the pasta process still running when this happens?
cc @sbrivio-rh @dgibson
Please provide your pasta version.
pasta 0^20230625.g32660ce-1.fc38.x86_64
Does it only affect the port forwarding or are outgoing connections also affected?
Good point, didn't think of that. Indeed the container cannot access anything outside once this occurs.
curl -v http://1.1.1.1
* Trying 1.1.1.1:80...
* Immediate connect fail for 1.1.1.1: Network is unreachable
This works on the host of course.
Is the pasta process still running when this happens?
Yes. Here's an example for a Jellyfin container (8096 exposed) that is currently in an unreachable state:
ps -top | grep pas
tobias 2132678 0.0 0.1 76144 10168 ? Ss Jul27 0:35 /usr/bin/pasta --config-net -t 8096-8096:8096-8096 -u none -T none -U none --no-map-gw --netns /run/user/1000/netns/netns-e1e14b0a-c0a7-fc4c-1e75-b31778702fe1
Thanks for the report. There's not a lot to go on here, but there are a few clues.
* Immediate connect fail for 1.1.1.1: Network is unreachable
The fact that we're getting a network unreachable error suggests one of two things is happening.
@waffshappen to pin down which of these it is, could you provide the output of `ip link`, `ip addr` and `ip route` from within an affected container?
The pasta process you show doesn't appear to be actively running, which suggests the problem is not that we've somehow got into an infinite loop doing nothing useful.
@waffshappen to pin down which of these it is, could you provide the output of `ip link`, `ip addr` and `ip route` from within an affected container?
Sure! Sorry for the delay; I had to wait for a newly spawned container with `ip` installed to go into an unresponsive state, because the other containers do not have it available by default, and without working connectivity I can't exactly add it.
From a Fedora 38 Container with a simple webserver that now is unresponsive again after ~24 hours:
ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc cake state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 32:8f:4f:df:b3:44 brd ff:ff:ff:ff:ff:ff
ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc cake state UNKNOWN group default qlen 1000
link/ether 32:8f:4f:df:b3:44 brd ff:ff:ff:ff:ff:ff
inet6 fe80::308f:4fff:fedf:b344/64 scope link
valid_lft forever preferred_lft forever
ip route
No output
The pasta process you show doesn't appear to be actively running, which suggests the problem is not that we've somehow got into an infinite loop doing nothing useful.
That was what made me switch to pasta initially, given that it was rather easy for me to get slirp4netns into 100%-CPU-usage situations (UDP packets sent to a host to check it -wasn't- reachable (nftables DROP) would pile up in slirp forever), whereas pasta handled that setup perfectly.
From a Fedora 38 Container with a simple webserver that now is unresponsive again after ~24 hours:
[...]
ip route
No output
This might sound a bit absurd, but... do you happen to have a DHCP client (possibly NetworkManager) running in the container? I can't explain why routes would disappear after ~24 hours otherwise.
Right, the container is losing all its addresses and routes. That certainly explains why it loses connectivity.
I can't see how pasta would touch those other than once during startup, and indeed pasta has no code at all to delete addresses or routes, only to add them. So, I think it has to be something within the container actually doing the damage - maybe a DHCP client as @sbrivio-rh suggests. That still leaves the question of why it's doing that with pasta but not with slirp4netns (assuming that's the case, anyway).
I think the first stop is to look for any obvious DHCP clients in the container. If that doesn't lead to anything, I think we need to look for ways to monitor netlink activity within the container.
This might sound a bit absurd, but... do you happen to have a DHCP client (possibly NetworkManager) running in the container? I can't explain why routes would disappear after ~24 hours otherwise.
No, there are no active DHCP servers, clients or anything like that inside the containers. In fact the specific Fedora container has a single running binary: bash. (Ditto for Jellyfin (unless they do something cursed for network sharing) and mumble.)
I think the first stop is to look for any obvious DHCP clients in the container.
And to my knowledge none are running.
If that doesn't lead to anything, I think we need to look for ways to monitor netlink activity within the container.
If that is tcpdump-able I might be able to just run `tcpdump` until it loses connectivity and store the capture to a volume? But doing so would require re-creating the container to install it, then waiting until it happens again.
This might sound a bit absurd, but... do you happen to have a DHCP client (possibly NetworkManager) running in the container? I can't explain why routes would disappear after ~24 hours otherwise.
No, there are no active DHCP servers, clients or anything like that inside the containers. In fact the specific Fedora container has a single running binary: bash. (Ditto for Jellyfin (unless they do something cursed for network sharing) and mumble.)
I think the first stop is to look for any obvious DHCP clients in the container.
And to my knowledge none are running.
Drat. So much for an easy answer.
If that doesn't lead to anything, I think we need to look for ways to monitor netlink activity within the container.
If that is tcpdump-able I might be able to just run `tcpdump` until it loses connectivity and store the capture to a volume? But doing so would require re-creating the container to install it, then waiting until it happens again.
It is possible to use `tcpdump` here, but there are dedicated tools (`rtmon` and `ip monitor`) that are probably more useful. However, those primarily show what netlink events occur, whereas we're more concerned with who is performing the netlink operations. For finding the latter, systemtap might be more useful. All of these options will require recreating the container and waiting for the problem to reproduce, as you note.
Let's gather a little more background information before we attempt that though.
1. Can you give a general idea of what the container is for, and what it does? Maybe this will give some clues as to where to focus our investigation.
2. Can you provide the output from `ps afx` from within the container? This is on the off chance that there's something non-obvious to you that stands out to me or my colleagues as a clue.
I don't think we can 100% rule out a DHCP client as the culprit yet - it doesn't seem like one is running persistently, but it's possible one ran transiently at the point things broke. So, I think it's worth checking what effect running a DHCP client would have:
3. In the container in the broken state, try running `dhclient -v eno1` manually. Does this give any errors? Is network connectivity restored after it completes?
4. In a container in unbroken state (but based on the same image), try the same thing. Does this give any errors? Does it break network connectivity?
The `fedora` container is just a fedora:latest container running bash and a minimal webserver, specifically to reproduce this.
The `mumble` container only runs https://hub.docker.com/r/mumblevoip/mumble-server - to run, well, mumble. It listens on TCP and UDP - it was how I was first made aware of this bug, as users could not connect anymore, at random. I used pasta to have the container see the real user IPs without host networking - that allowed the internal abuse limits to not apply to "all" IPs, since rootless changed them all to localhost of course.
Jellyfin is an instance of https://hub.docker.com/r/jellyfin/jellyfin - all three of these are single-purpose containers, pretty much there to run a single app with no other automations, management tools etc. added outside of what's shipped. And at least these three don't even use pods or similar.
2. Can you provide the output from `ps afx` from within the container? This is on the off chance that there's something non-obvious to you that stands out to me or my colleagues as a clue.
None of these has `ps` installed by default; I'll add it to the "restart, install, wait" list.
However from the host for the fedora container conmon:
3044584 ? Ss 0:00 /usr/bin/conmon --api-version 1 -c fc893e44d9b601b3edf1f73ad7b400b25138788d169479cc5f673e6cc3248f45 -u fc893e44d9b601b3edf1f73ad7b400b25138788d169479cc5f673e6cc3248f45 -r /usr/bin/crun -b /home/tobias/.local/share/containers/storage/overlay-containers/fc893e44d9b601b3edf1f73ad7b400b25138788d169479cc5f673e6cc3248f45/userdata -p /run/user/1000/containers/overlay-containers/fc893e44d9b601b3edf1f73ad7b400b25138788d169479cc5f673e6cc3248f45/userdata/pidfile -n fedotest --exit-dir /run/user/1000/libpod/tmp/exits --full-attach -l journald --log-level warning --syslog --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/user/1000/containers/overlay-containers/fc893e44d9b601b3edf1f73ad7b400b25138788d169479cc5f673e6cc3248f45/userdata/oci-log -t --conmon-pidfile /run/user/1000/containers/overlay-containers/fc893e44d9b601b3edf1f73ad7b400b25138788d169479cc5f673e6cc3248f45/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/tobias/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --network-config-dir --exit-command-arg --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /home/tobias/.local/share/containers/storage/volumes --exit-command-arg --db-backend --exit-command-arg boltdb --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg fc893e44d9b601b3edf1f73ad7b400b25138788d169479cc5f673e6cc3248f45
3044586 pts/0 Ss+ 0:00 \_ /bin/bash
(Yes, that is the entire tree for that conmon)
I don't think we can 100% rule out a DHCP client as the culprit yet - it doesn't seem like one is running persistently, but it's possible one ran transiently at the point things broke. So, I think it's worth checking what effect running a DHCP client would have:
3. In the container in the broken state, try running `dhclient -v eno1` manually. Does this give any errors? Is network connectivity restored after it completes?
None of these has `dhclient` installed by default; I'll add it to the "restart, install, wait" list. But I don't think it'll do much, because:
4. In a container in unbroken state (but based on the same image), try the same thing. Does this give any errors? Does it break network connectivity?
As root inside the new fedora container:
RTNETLINK answers: Operation not permitted
Open a socket for LPF: Operation not permitted
Any chance a software update involving firewalld or iptables is happening, and wiping the iptables rules? Or any other tool that might muck around with them?
1. Can you give a general idea of what the container is for, and what it does? Maybe this will give some clues as to where to focus our investigation.
The `fedora` container is just a fedora:latest container running bash and a minimal webserver, specifically to reproduce this.
Ah, ok. So have you succeeded in reproducing this in the test `fedora` container without any particular app? It wasn't previously clear to me that this has happened with multiple different container images.
What exactly is the minimal webserver you're using?
The `mumble` container only runs https://hub.docker.com/r/mumblevoip/mumble-server - to run, well, mumble. It listens on TCP and UDP - it was how I was first made aware of this bug, as users could not connect anymore, at random. I used pasta to have the container see the real user IPs without host networking - that allowed the internal abuse limits to not apply to "all" IPs, since rootless changed them all to localhost of course. Jellyfin is an instance of https://hub.docker.com/r/jellyfin/jellyfin - all three of these are single-purpose containers, pretty much there to run a single app with no other automations, management tools etc. added outside of what's shipped. And at least these three don't even use pods or similar.
Ok, understood.
2. Can you provide the output from `ps afx` from within the container? This is on the off chance that there's something non-obvious to you that stands out to me or my colleagues as a clue.
None of these has `ps` installed by default; I'll add it to the "restart, install, wait" list.
If the problem has reproduced on the `fedora` container, then I don't think we need this info from the others.
However from the host for the fedora container conmon:
[...]
(Yes, that is the entire tree for that conmon)
Ok, thanks.
I don't think we can 100% rule out a DHCP client as the culprit yet - it doesn't seem like one is running persistently, but it's possible one ran transiently at the point things broke. So, I think it's worth checking what effect running a DHCP client would have:
3. In the container in the broken state, try running `dhclient -v eno1` manually. Does this give any errors? Is network connectivity restored after it completes?
None of these has `dhclient` installed by default; I'll add it to the "restart, install, wait" list. But I don't think it'll do much, because:
Ok. Again, as long as the problem has reproduced in the `fedora` container, I don't think we need to try this anywhere else.
4. In a container in unbroken state (but based on the same image), try the same thing. Does this give any errors? Does it break network connectivity?
As root inside the new fedora container:
RTNETLINK answers: Operation not permitted
Open a socket for LPF: Operation not permitted
Ah, drat. I forgot that's how the permissions worked with podman. Which... come to think of it rather nixes my theory that something within the container is going rogue. Given the permission errors above, anything that's doing that should hit the same permission error.
So... something outside the container is messing with its network configuration. `pasta` itself is the obvious candidate, but as noted above, I don't see how anything in there could cause this symptom. I think I'm going to have to come up with a systemtap script or similar that will find things touching the container's netlink interface. That will take a little research; in the meantime, here are some more things we can try:
1. Can you give the exact version / UUID of the `fedora` container image you're using? (That will give us the best chance to reproduce it here, which I've started trying)
2. Can you reproduce the problem if you don't include the `-p` option to podman, and simply run a shell in the fedora container? This might eliminate a few more possibilities.
3. Can you run another `fedora` container, install the `iproute` package within it and leave running the command: `ip -ts monitor dev eno0` (replace `eno0` with the name of the container's external network interface if it's different). This, alas, won't show us _what_ is messing with netlink, but it will show us what netlink operations are happening, which might provide some clues.
Sorry I should have mentioned that earlier, unless the container is started with `--cap-add NET_ADMIN` or `--privileged` (adds all caps) then the container process will not be allowed to modify the net namespace.
As for monitoring the netns the best way would be to join it with that command:
podman unshare nsenter --net=$(podman container inspect --format {{.NetworkSettings.SandboxKey}} <NAME>)
# replace <NAME> with your actual container name or id
This gives you full capabilities for that netns while staying on the host fs so you do not need to install ip and other utils in the container.
Any chance a software update involving firewalld or iptables is happening, and strikes the iptables rules? Or any other tool that might muck around with them?
Not automatically on that machine. `dnf-automatic` is set up, but only to automatically pull the packages so they're ready when I'm ready.
Ah, ok. So have you succeeded in reproducing this in the test `fedora` container without any particular app? It wasn't previously clear to me that this has happened with multiple different container images.
What exactly is the minimal webserver you're using?
Both apache (default page) and simply:
while true; do { echo -ne "HTTP/1.0 200 OK\r\nContent-Length: $(wc -c <index.html)\r\n\r\n"; cat index.html; } | nc -l -p 8098; done
(with a minimal index.html next to it) break this way.
1. Can you give the exact version / UUID of the `fedora` container image you're using? (That will give us the best chance to reproduce it here, which I've started trying)
The specific image ID was ad2032316c2664fe02873afaf98e6ab5323d1980d4b99d8de55848cd6ffae1f8, but it has persisted across previous pulls and entirely different OS base images (mumble's default build on Ubuntu, for example). I can try with the new image release - but since it happened in other containers with other distros I didn't think it'd make a difference.
2. Can you reproduce the problem if you don't include the `-p` option to podman, and simply run a shell in the fedora container? This might eliminate a few more possibilities.
This, also, loses its connectivity.
3. Can you run another `fedora` container, install the `iproute` package within it and leave running the command: `ip -ts monitor dev eno0` (replace `eno0` with the name of the container's external network interface if it's different). This, alas, won't show us _what_ is messing with netlink, but it will show us what netlink operations are happening, which might provide some clues.
I have re-pulled the :latest fedora and tested again; it took ~9 hours. Some values shortened:
ip -ts monitor dev eno1
[2023-08-09T10:34:43.931476] 10.x.0.1 lladdr 94:18[host] STALE
[2023-08-09T17:27:49.851562] Deleted 2: eno1 inet 10.x.0.4/24 brd 10.x.0.255 scope global dynamic noprefixroute eno1
valid_lft 0sec preferred_lft 0sec
[2023-08-09T17:27:49.855557] Deleted broadcast 10.x.0.255 table local proto kernel scope link src 10.x.0.4
[2023-08-09T17:27:49.857015] Deleted local 10.x.0.4 table local proto kernel scope host src 10.x.0.4
[2023-08-09T17:27:49.857062] Deleted 10.x.0.1 lladdr 94:18[host] STALE
[2023-08-09T17:42:18.203511] Deleted 2: eno1 inet6 2003:ed:[prefix]/128 scope global dynamic noprefixroute
valid_lft 0sec preferred_lft 0sec
[2023-08-09T17:42:18.203986] Deleted local 2003:ed:[prefix] table local proto kernel metric 0 pref medium
This specific host is running behind openwrt, the other affected machine is a hetzner root server - in case that changes anything.
As for monitoring the netns the best way would be to join it with that command:
I'll do that next Time, thanks!
ip -ts monitor dev eno1
[2023-08-09T10:34:43.931476] 10.x.0.1 lladdr 94:18[host] STALE
[2023-08-09T17:27:49.851562] Deleted 2: eno1 inet 10.x.0.4/24 brd 10.x.0.255 scope global dynamic noprefixroute eno1
valid_lft 0sec preferred_lft 0sec
[2023-08-09T17:27:49.855557] Deleted broadcast 10.x.0.255 table local proto kernel scope link src 10.x.0.4
[2023-08-09T17:27:49.857015] Deleted local 10.x.0.4 table local proto kernel scope host src 10.x.0.4
[2023-08-09T17:27:49.857062] Deleted 10.x.0.1 lladdr 94:18[host] STALE
[2023-08-09T17:42:18.203511] Deleted 2: eno1 inet6 2003:ed:[prefix]/128 scope global dynamic noprefixroute
valid_lft 0sec preferred_lft 0sec
[2023-08-09T17:42:18.203986] Deleted local 2003:ed:[prefix] table local proto kernel metric 0 pref medium
Well, the addresses sure are being deleted. Alas, as I feared, seeing what and when it's happening isn't providing many clues as to who's doing it and why.
I'm working on writing a systemtap script which will be able to log what is performing these address removals, unfortunately I'm having some trouble getting it working (especially since I've encountered this bug along the way).
While I'm working on that, here are some more things we can try:
1. What distro is running on your host? What kernel version is it running? These will help me make a systemtap script that works for your system.
2. Are there any routing daemons or VPNs running on your host? These shouldn't interfere with the container obviously, but they are at least candidates for manipulating addresses and routes.
3. Can you try the following as another reproduction attempt:
* Run `pasta --config-net` as the same user you run the podman containers as.
* This will bring up a "root" shell (actually only root within a new user namespace, similar to a container).
* Verify that you have basic network connectivity within that shell.
* Run `ip -ts monitor dev eno1` within that shell, to monitor changes to its network configuration.
* Leave running for 24-48 hours.
The idea here is to see if the same problem occurs on a "bare" pasta instance, or if the additional steps of podman creating the full container are somehow triggering the problem.
My own attempts to reproduce are still running. No signs of the problem so far, after a bit under a day. I'll leave it running, but at this point I strongly suspect something different on your system is triggering the problem.
This specific host is running behind openwrt, the other affected machine is a hetzner root server - in case that changes anything.
I don't think that's relevant at this stage, but good to know just in case.
As for monitoring the netns the best way would be to join it with that command:
I'll do that next Time, thanks!
Given the new working theory, I don't think this is necessary for the current steps. However, it does allow the possibility of a (poor) interim workaround: for your "real" containers that encounter this problem you could log in that way and manually reconfigure the network.
I'm working on writing a systemtap script which will be able to log what is performing these address removals, unfortunately I'm having some trouble getting it working (especially since I've encountered this bug along the way).
Ah, of course when I come across a bug everything is maximum cursed in some way; I am getting used to that. ^^
1. What distro is running on your host? What kernel version is it running? These will help me make a systemtap script that works for your system.
Fedora 38, both `6.3.12-200.fc38.x86_64` (locally accessible only) and `6.4.8-200.fc38.x86_64`. One machine I am holding back from all changes so it can be debugged, just in case the bug gets fixed on a newer kernel somehow.
2. Are there any routing daemons or VPNs running on your host? These shouldn't interfere with the container obviously, but they are at least candidates for manipulating addresses and routes.
On the Hetzner: Yes, Wireguard on the host directly (P2P Site Network for sharing home access to the 10.x.0.0/24 network on each side and allowing access to my self hosted content (like nextcloud) over a vpn).
On the local machine: Not directly, no. Wireguard is running on openwrt infront of it instead. However the local Machine runs libvirtd with 1 vm instead.
3. Can you try the following as another reproduction attempt:
* Run `pasta --config-net` as the same user you run the podman containers as.
* This will bring up a "root" shell (actually only root within a new user namespace, similar to a container).
Slight (SELinux) issue, as user:
pasta --config-net
Couldn't create user namespace: Permission denied
And as root:
pasta --config-net
Don't run as root. Changing to nobody...
Can't set GID to 65534: Operation not permitted
* Verify that you have basic network connectivity within that shell.
* Run `ip -ts monitor dev eno1` within that shell, to monitor changes to its network configuration.
* Leave running for 24-48 hours.
To be fair, that was SELinux blocking it from being called directly. With SELinux temporarily disabled to spawn it, then re-enabled, it runs and has connectivity; I'll leave the shell open with it monitoring.
Given the new working theory, I don't think this is necessary for the current steps. However, it does allow the possibility of a (poor) interim workaround: for your "real" containers that encounter this problem you could log in that way and manually reconfigure the network.
I've bitten the bullet: the affected containers that need to expose ports are now running with host networking directly, or falling back to slirp (trying to avoid triggering its bugs) for those that do fine with "all access is from localhost".
I'm working on writing a systemtap script which will be able to log what is performing these address removals, unfortunately I'm having some trouble getting it working (especially since I've encountered this bug along the way).
Ah, of course when i come across a bug everything is maximum cursed in some way, i am getting used to that. ^^
Well, based on some further developments I'll relate below, alas, I cannot but agree.
1. What distro is running on your host? What kernel version is it running? These will help me make a systemtap script that works for your system.
Fedora 38, both `6.3.12-200.fc38.x86_64` (locally accessible only) and `6.4.8-200.fc38.x86_64`. One machine I am holding back from all changes so it can be debugged, just in case the bug gets fixed on a newer kernel somehow.
The original pasta connectivity bug? Or the systemtap bug? Or something else?
The good news is that I'm also running Fedora 38 with a similar kernel, so the chances are that if I can get a systemtap script working locally it should work for you too. The bad news is that the 6.4 kernels seem to be the ones that aren't working with systemtap currently, so you're likely to encounter the same problem.
The better news is that a draft fix for the systemtap bug was posted. The worse news is that, at least for me, it now fails differently: instead of a compile error I get a kernel oops.
2. Are there any routing daemons or VPNs running on your host? These shouldn't interfere with the container obviously, but they are at least candidates for manipulating addresses and routes.
On the Hetzner: Yes, Wireguard on the host directly (P2P Site Network for sharing home access to the 10.x.0.0/24 network on each side and allowing access to my self hosted content (like nextcloud) over a vpn).
On the local machine: Not directly, no. Wireguard is running on openwrt infront of it instead. However the local Machine runs libvirtd with 1 vm instead.
Ok, good to know. Probably not the culprit, based on that.
3. Can you try the following as another reproduction attempt: * Run `pasta --config-net` as the same user you run the podman containers as. * This will bring up a "root" shell (actually only root within a new user namespace, similar to a container).
Slight (SELinux) issue, as user:
pasta --config-net
Couldn't create user namespace: Permission denied
Ah, right. That's a known issue with the selinux profile and recent kernels - @sbrivio-rh is working on it, but has had to battle through some additional complications.
And as root:
pasta --config-net
Don't run as root. Changing to nobody...
Can't set GID to 65534: Operation not permitted
Right, pasta explicitly avoids running as root.
* Verify that you have basic network connectivity within that shell.
* Run `ip -ts monitor dev eno1` within that shell, to monitor changes to its network configuration.
* Leave running for 24-48 hours.
To be fair, that was SELinux blocking it from being called directly. With SELinux temporarily disabled to spawn it, then re-enabled, it runs and has connectivity; I'll leave the shell open with it monitoring.
Great, thanks
Given the new working theory, I don't think this is necessary for the current steps. However, it does allow the possibility of a (poor) interim workaround: for your "real" containers that encounter this problem you could log in that way and manually reconfigure the network.
I've bitten the bullet of having the affected containers that need to expose ports directly running with host networking or falling back to slirp and trying to avoid triggering its bugs for those that do fine with "all access is from localhost".
I just realised:
ip -ts monitor dev eno1
[2023-08-09T10:34:43.931476] 10.x.0.1 lladdr 94:18[host] STALE
[2023-08-09T17:27:49.851562] Deleted 2: eno1 inet 10.x.0.4/24 brd 10.x.0.255 scope global dynamic noprefixroute eno1
valid_lft 0sec preferred_lft 0sec
...that the valid lifetime at this point is 0. The address is `dynamic` in the sense that it's not `permanent`. When pasta adds addresses, by choice, it doesn't use the `IFA_F_PERMANENT` netlink flag because, strictly speaking, that means "configured by the user", and pasta is not the user... and we didn't notice any issue with that.
But maybe there's some new (unintended, I guess; I couldn't find anything relevant in recent kernel commits) behaviour implied by the kernel which makes `IFA_F_PERMANENT` necessary.
@waffshappen, could you try something like this snippet (the patch applies on top of current git `HEAD`)?
diff --git a/netlink.c b/netlink.c
index 1226379..f7b2907 100644
--- a/netlink.c
+++ b/netlink.c
@@ -604,6 +604,7 @@ int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
.ifa.ifa_index = ifi,
.ifa.ifa_prefixlen = prefix_len,
.ifa.ifa_scope = RT_SCOPE_UNIVERSE,
+ .ifa.ifa_flags = IFA_F_PERMANENT,
};
ssize_t len;
@@ -611,7 +612,7 @@ int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
/* By default, strictly speaking, it's duplicated */
- req.ifa.ifa_flags = IFA_F_NODAD;
+ req.ifa.ifa_flags |= IFA_F_NODAD;
len = offsetof(struct req_t, set.a6) + sizeof(req.set.a6);
If it's too much of a hassle for you to try building with this, I can also provide a build -- let me know.
The original pasta connectivity bug? Or the systemtap bug? Or something else?
pasta
The better news is that a draft fix for the systemtap bug was posted. The worse news is that, at least for me, it now fails differently: instead of a compile error I get a kernel oops
Yeah, maximum cursed as usual ^^
To be fair that was selinux blocking it being called directly. Without it active, spawning it and enabling it again it runs and has connectivity, i'll leave the shell open with it monitoring.
In the shell spawned with just `pasta --config-net` the same bug occurs:
ip -ts monitor dev eno1
[2023-08-10T15:28:26.971513] Deleted 2: eno1 inet 10.x.0.4/24 brd 10.x.0.255 scope global dynamic noprefixroute eno1 valid_lft 0sec preferred_lft 0sec
[2023-08-10T15:28:26.971816] Deleted broadcast 10.x.0.255 table local proto kernel scope link src 10.x.0.4
[2023-08-10T15:28:26.974232] Deleted local 10.x.0.4 table local proto kernel scope host src 10.x.0.4
[2023-08-10T15:28:26.974250] Deleted 10.x.0.1 lladdr 94:18[host] STALE
[2023-08-10T19:19:40.187473] Deleted 2: eno1 inet6 2003:ed:[prefix]/128 scope global dynamic noprefixroute valid_lft 0sec preferred_lft 0sec
[2023-08-10T19:19:40.187719] Deleted local 2003:ed:[prefix] table local proto kernel metric 0 pref medium
[2023-08-11T02:55:20.923517] Deleted 2: eno1 inet6 2003:ed:[prefix]/64 scope global dynamic noprefixroute valid_lft 0sec preferred_lft 0sec
[2023-08-11T02:55:20.923779] Deleted local 2003:ed:[prefix] table local proto kernel metric 0 pref medium
diff --git a/netlink.c b/netlink.c
I'll try building that and running it, and I'll let you know what happens (since I can reproduce the bug with just pasta, I'll spawn it from the build output without trying to get podman and SELinux to cooperate with the result).
Also, does this mean that pasta doesn't handle address changes on the host as well? Or are new address events handled already? My external addresses, especially IPv6 since that propagates through the entire network, change constantly at home. (And I guess my IPv4 addresses become stale for pasta as the default lease time runs out?)
Also, does this mean that pasta doesn't handle address changes on the host as well?
At the moment, address changes are handled implicitly, in the sense that when addresses change on the host, pasta will just naturally switch to NAT (assuming default options, that is, with host addresses copied to the containers). However:
Or are new address events handled already? My external addresses, especially IPv6 since that propagates through the entire network, change constantly at home. (And I guess my IPv4 addresses become stale for pasta as the default lease time runs out?)
...we had feature requests to monitor IPv6 prefix changes, via netlink, and update the prefix in the container accordingly. @dgibson is working on a more flexible model for forwarding and address translation, once that part is done we'll be able to support this.
For IPv4 we could probably support this with a netmask. At the moment, if the address on your host expires, pasta will just use the new address like any other process running there, but the address in the container should be preserved.
Also, does this mean that pasta doesn't handle address changes on the host as well?
At the moment, address changes are handled implicitly, in the sense that when addresses change on the host, pasta will just naturally switch to NAT (assuming default options, that is, with host addresses copied to the containers). However:
To elaborate on this: at present we don't monitor for changes to the host addresses. We have considered it for various reasons, and may do so in future. That doesn't mean a host address change will break container connectivity, though: the container won't see an address change, but it will still be able to make outward connections, and they'll be implicitly NATted. For inbound connections it depends: if pasta's forwarded ports aren't bound to a specific address, it will again implicitly NAT. If they are, and that address changes on the host, then as you'd expect you'll no longer be able to access that forwarding.
Or are new address events handled already? My external addresses, especially IPv6 since that propagates through the entire network, change constantly at home. (And I guess my IPv4 addresses become stale for pasta as the default lease time runs out?)
...we had feature requests to monitor IPv6 prefix changes, via netlink, and update the prefix in the container accordingly. @dgibson is working on a more flexible model for forwarding and address translation, once that part is done we'll be able to support this.
Actually, there's less overlap between the forwarding model and handling address updates than you might think. The new forwarding option would certainly give a lot more flexibility with how exactly we'd handle a changing host address, though.
For IPv4 we could probably support this with a netmask. At the moment, if the address on your host expires, pasta will just use the new address like any other process running there, but the address in the container should be preserved.
So, @sbrivio-rh and I discussed this bug yesterday... and I think we cracked it. Addresses do have a lifetime, seen in the ip addr output as valid_lft and preferred_lft. If the address is set statically / manually, it will be forever, but if it's managed actively (e.g. by DHCP) then it will have a finite lifetime.
We think when pasta is copying address information from the host it's inadvertently copying the lifetimes from the host as well. So if the host has addresses with finite lifetime, they'll have finite lifetime in the guest as well, and eventually expire. However, the guest or container doesn't have the DHCP client or whatever was managing the address on the host, and so it just goes away.
I'm currently working on confirming this and figuring out what to do about it.
@waffshappen
I've made some changes that I think will fix the problem - essentially it just strips the lifetime information off the host address when copying it to the container. I have a branch here with the revised code. If you could try that out, that would be great.
I've also entered this in the pasta bugzilla as bug 70 so we have a record there.
I've made some changes that I think will fix the problem - essentially it just strips the lifetime information off the host address when copying it to the container. I have a branch here with the revised code. If you could try that out, that would be great.
Testing that does work, and the pasta --config-net shell has not lost connectivity.
The only weird thing I can see, shortly after a ping attempt from it, is:
[2023-08-16T18:10:12.955496] 10.x.0.1 lladdr 94:18:[host] PROBE
[2023-08-16T18:10:12.955625] 10.x.0.1 lladdr 94:18:[host] REACHABLE
[2023-08-16T18:10:37.019494] 10.x.0.1 lladdr 94:18:[host] STALE
but it works just fine.
I have not tested changing the assigned DHCP IP, however.
I've made some changes that I think will fix the problem - essentially it just strips the lifetime information off the host address when copying it to the container. I have a branch here with the revised code. If you could try that out, that would be great.
Testing that does work, and the pasta --config-net shell has not lost connectivity.
Excellent!
The only weird thing I can see, shortly after a ping attempt from it, is:
[2023-08-16T18:10:12.955496] 10.x.0.1 lladdr 94:18:[host] PROBE
[2023-08-16T18:10:12.955625] 10.x.0.1 lladdr 94:18:[host] REACHABLE
[2023-08-16T18:10:37.019494] 10.x.0.1 lladdr 94:18:[host] STALE
but it works just fine.
Right, I think that's some unrelated renewal stuff.
I have not tested changing the assigned DHCP IP, however.
Ok. You mean the -a option to pasta, I assume? By all means test this, but I don't think it will be affected. If my understanding of the cause of this problem is correct, we wouldn't have hit it in the first place when using -a, because we simply assign that address to the guest, rather than copying it (with all its attributes) from the host, which is what caused the problem.
Feel free to continue the discussion, but since the patch is applied (https://passt.top/passt/commit/?id=da0aeb9080c9d2e39b2ff600a9b2b03046ac219d), I'm closing this.
Issue Description
When binding a port with
podman run -p 8080:8080 --network pasta $otherargs
at a random point in time after starting the container, no more external traffic is able to reach services bound inside the container. There are no log entries in journald and no network changes; it happens completely at random, on multiple systems. It works if the port is bound to a specific IP (127.0.0.1, for example) instead.
If I remember correctly, this started happening around or shortly after the 4.5.0 release, but it still occurs after multiple passt and podman updates on Fedora since then.
Steps to reproduce the issue
1. Start a container with --network pasta and a bound -p 8080:8080 port

Describe the results you received
Nothing inside the container can respond to traffic on external IPs anymore.
Describe the results you expected
Services should still work
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
No
Additional environment details
No response
Additional information
No response