Open Luap99 opened 1 year ago
Did this PR try to fix this issue? https://github.com/containers/netavark/issues/825
What PR? You link to this issue.
Sorry I meant this PR: https://github.com/containers/netavark/pull/333 (Support read only /proc) for this issue: https://github.com/containers/netavark/issues/330
No, that PR only works if you already have the right sysctl values. This issue is about not having the right sysctls set and simply ignoring the failure when we cannot set them. But that most likely means routing is non-functional, so I am not sure this is a good idea.
Thanks for the info. The right sysctls are the ones mentioned in the following issue, right? https://github.com/containers/netavark/issues/362
--kubelet-extra-args '--allowed-unsafe-sysctls="net.ipv4.conf.default.route_localnet"'
I ran into this today setting up a rootless and unprivileged podman deployment inside of a k8s cluster. Here is the PodSpec for reference:
containers:
- image: quay.io/podman/stable:v4.9.0
name: podman
command:
- podman
- system
- service
- --log-level
- debug
- --transient-store
- --time
- "0"
- tcp://localhost:2375
securityContext:
runAsUser: 1000
runAsGroup: 1000
resources:
limits:
squat.ai/fuse: 1
squat.ai/tun: 1
I would've expected it to only set hard-required sysctls and ignore/not write any that already have the correct value or are optional but it seems it just tries to set them unconditionally causing problems for this setup.
I had tried to set the following sysctls using the PodSpec's securityContext.sysctls field as needed (from what I can tell), to no avail. I also attempted a run with IPv6 disabled (network_cmd_options=["enable_ipv6=false"]), which took care of some of the sysctl writes (though maybe from slirp4netns, not netavark).
| sysctl | value | note |
|---|---|---|
| net.ipv4.ip_forward | 1 | |
| net.ipv4.conf.default.arp_notify | 1 | |
| net.ipv6.conf.[default\|eth0].autoconf | 0 | |
| net.ipv6.conf.default.accept_dad | 0 | Only when enable_ipv6=true? Logged as a warning |
| net.ipv6.conf.default.accept_ra | 0 | Only when enable_ipv6=true? |
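For reference, the table above translates to a pod-level securityContext like the following. This is only a sketch: it assumes the kubelet has been started with --allowed-unsafe-sysctls covering these names (as in the kubelet flag quoted earlier), and the exact set netavark needs may differ per version. Interface-specific entries such as net.ipv6.conf.eth0.* cannot be expressed here.

```yaml
# Hypothetical pod-level securityContext matching the table above;
# requires the kubelet to allow these via --allowed-unsafe-sysctls.
securityContext:
  sysctls:
    - name: net.ipv4.ip_forward
      value: "1"
    - name: net.ipv4.conf.default.arp_notify
      value: "1"
    - name: net.ipv6.conf.default.autoconf
      value: "0"
    - name: net.ipv6.conf.default.accept_dad
      value: "0"
    - name: net.ipv6.conf.default.accept_ra
      value: "0"
```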
Checking from within the running pods, the sysctls have the values set on the PodSpec and match the values netavark would write (which it still attempted, even though they are already correct).
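Such a check can be done by reading /proc/sys directly from inside the pod; dots in the dotted sysctl names map to slashes in the path. A small sketch, with the sysctl list assumed from the table above:

```shell
# Print the current values of the sysctls from the table above by
# reading /proc/sys directly; dots in a sysctl name become slashes.
for name in net.ipv4.ip_forward net.ipv4.conf.default.arp_notify \
            net.ipv6.conf.default.autoconf \
            net.ipv6.conf.default.accept_dad \
            net.ipv6.conf.default.accept_ra; do
  path="/proc/sys/$(printf '%s' "$name" | tr . /)"
  printf '%s = %s\n' "$name" "$(cat "$path" 2>/dev/null || echo '<unreadable>')"
done
```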
Logs show, for instance, `[DEBUG netavark::network::core_utils] Setting sysctl value for net.ipv4.ip_forward to 1` (which it already is), and then it fails with `time="2024-02-14T17:10:53Z" level=info msg="Request Failed(Internal Server Error): netavark (exit code 1): Sysctl error: IO Error: Read-only file system (os error 30)"`.
We already read the value first and only set it if it does not already have the correct value.
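That read-before-write check amounts to something like the following. This is only an illustrative shell sketch, not netavark's actual code (netavark implements it in Rust in network::core_utils): the write, and thus the EROFS failure, should only happen when the current value differs.

```shell
# Sketch of the read-before-write strategy (not netavark's actual code):
# only write a sysctl file when its current value differs, so values
# that are already correct never trigger EROFS on a read-only /proc.
set_sysctl_if_needed() {
  file="$1"; want="$2"
  current="$(cat "$file")" || return 1       # fail if we cannot even read
  [ "$current" = "$want" ] && return 0       # already correct: skip write
  printf '%s' "$want" > "$file"              # EROFS would surface here
}
```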
Are there then any sysctls missing from the above table, or with incorrect values, that it doesn't log about? If it doesn't write when the sysctls already have the expected values, I would not expect it to fail for being unable to write (assuming it indeed writes nothing in that case) :thinking:
I am stuck with the same issue. Any idea how to resolve this? Is there maybe another place where it writes to /proc?
I recently took another shot at this with podman 5 but things have not changed on my end sadly. There's no documentation on what sysctl values are expected to be set or attempted to be set, what capabilities or filesystem access is needed, nothing.
The only information[1][2][3][4] I've found thus far suggests there's no need for it to be privileged, no need for (NET_ADMIN) capabilities, no need to set sysctls if they are already set correctly[5][6], etc.
Running podman without any networking seems to suggest this might actually be true but the moment networking is involved, it all falls apart.
That means either it simply can't yet work rootless or without special privileges/capabilities when networking is involved (doubtful, since everyone involved seems to present outwardly that it does), assumptions are being made about the underlying systems/runtimes (e.g. device access, a minimum set of capabilities, ...), and/or the documentation is incorrect or missing information (quite likely).
A quick test dropping ALL capabilities shows that podman needs at least SETGID and SETUID regardless, for instance.
It also turns out we can request a trace log level, but this did not contain any more information I could personally use to diagnose what is going on:
$ podman run docker.io/library/busybox:latest echo 123
123
$ podman run --log-level trace --network podman docker.io/library/busybox:latest echo 123
INFO[0000] podman filtering at log level trace
DEBU[0000] Called run.PersistentPreRunE(podman run --log-level trace --network podman docker.io/library/busybox:latest echo 123)
DEBU[0000] Using conmon: "/usr/bin/conmon"
INFO[0000] Using sqlite as database backend
DEBU[0000] systemd-logind: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /home/podman/.local/share/containers/storage
DEBU[0000] Using run root /tmp/storage-run-1000/containers
DEBU[0000] Using static dir /home/podman/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /tmp/storage-run-1000/libpod/tmp
DEBU[0000] Using volume path /home/podman/.local/share/containers/storage/volumes
DEBU[0000] Using transient store: false
DEBU[0000] Not configuring container store
DEBU[0000] Initializing event backend file
DEBU[0000] Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument
DEBU[0000] Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument
DEBU[0000] Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument
DEBU[0000] Configured OCI runtime youki initialization failed: no valid executable found for OCI runtime youki: invalid argument
DEBU[0000] Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument
DEBU[0000] Configured OCI runtime ocijail initialization failed: no valid executable found for OCI runtime ocijail: invalid argument
TRAC[0000] found runtime "/usr/bin/crun"
DEBU[0000] Configured OCI runtime runc initialization failed: no valid executable found for OCI runtime runc: invalid argument
DEBU[0000] Configured OCI runtime crun-vm initialization failed: no valid executable found for OCI runtime crun-vm: invalid argument
DEBU[0000] Configured OCI runtime crun-wasm initialization failed: no valid executable found for OCI runtime crun-wasm: invalid argument
DEBU[0000] Using OCI runtime "/usr/bin/crun"
INFO[0000] Setting parallel job count to 193
DEBU[0000] Could not move to subcgroup: mkdir /sys/fs/cgroup/init: read-only file system
INFO[0000] podman filtering at log level trace
DEBU[0000] Called run.PersistentPreRunE(podman run --log-level trace --network podman docker.io/library/busybox:latest echo 123)
DEBU[0000] Using conmon: "/usr/bin/conmon"
INFO[0000] Using sqlite as database backend
DEBU[0000] systemd-logind: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /home/podman/.local/share/containers/storage
DEBU[0000] Using run root /tmp/storage-run-1000/containers
DEBU[0000] Using static dir /home/podman/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /tmp/storage-run-1000/libpod/tmp
DEBU[0000] Using volume path /home/podman/.local/share/containers/storage/volumes
DEBU[0000] Using transient store: false
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] overlay: storage already configured with a mount-program
DEBU[0000] backingFs=overlayfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false
DEBU[0000] Initializing event backend file
DEBU[0000] Configured OCI runtime crun-vm initialization failed: no valid executable found for OCI runtime crun-vm: invalid argument
DEBU[0000] Configured OCI runtime runc initialization failed: no valid executable found for OCI runtime runc: invalid argument
DEBU[0000] Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument
DEBU[0000] Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument
DEBU[0000] Configured OCI runtime youki initialization failed: no valid executable found for OCI runtime youki: invalid argument
TRAC[0000] found runtime "/usr/bin/crun"
DEBU[0000] Configured OCI runtime crun-wasm initialization failed: no valid executable found for OCI runtime crun-wasm: invalid argument
DEBU[0000] Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument
DEBU[0000] Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument
DEBU[0000] Configured OCI runtime ocijail initialization failed: no valid executable found for OCI runtime ocijail: invalid argument
DEBU[0000] Using OCI runtime "/usr/bin/crun"
INFO[0000] Setting parallel job count to 193
DEBU[0000] Could not move to subcgroup: mkdir /sys/fs/cgroup/init: read-only file system
DEBU[0000] Pulling image docker.io/library/busybox:latest (policy: missing)
DEBU[0000] Looking up image "docker.io/library/busybox:latest" in local containers storage
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux [] }
DEBU[0000] Trying "docker.io/library/busybox:latest" ...
DEBU[0000] parsed reference into "[overlay@/home/podman/.local/share/containers/storage+/tmp/storage-run-1000/containers]@65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac"
DEBU[0000] Found image "docker.io/library/busybox:latest" as "docker.io/library/busybox:latest" in local containers storage
DEBU[0000] Found image "docker.io/library/busybox:latest" as "docker.io/library/busybox:latest" in local containers storage ([overlay@/home/podman/.local/share/containers/storage+/tmp/storage-run-1000/containers]@65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac)
DEBU[0000] exporting opaque data as blob "sha256:65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac"
DEBU[0000] Looking up image "docker.io/library/busybox:latest" in local containers storage
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux [] }
DEBU[0000] Trying "docker.io/library/busybox:latest" ...
DEBU[0000] parsed reference into "[overlay@/home/podman/.local/share/containers/storage+/tmp/storage-run-1000/containers]@65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac"
DEBU[0000] Found image "docker.io/library/busybox:latest" as "docker.io/library/busybox:latest" in local containers storage
DEBU[0000] Found image "docker.io/library/busybox:latest" as "docker.io/library/busybox:latest" in local containers storage ([overlay@/home/podman/.local/share/containers/storage+/tmp/storage-run-1000/containers]@65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac)
DEBU[0000] exporting opaque data as blob "sha256:65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac"
DEBU[0000] User mount /proc:/proc options []
DEBU[0000] Looking up image "docker.io/library/busybox:latest" in local containers storage
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux [] }
DEBU[0000] Trying "docker.io/library/busybox:latest" ...
DEBU[0000] parsed reference into "[overlay@/home/podman/.local/share/containers/storage+/tmp/storage-run-1000/containers]@65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac"
DEBU[0000] Found image "docker.io/library/busybox:latest" as "docker.io/library/busybox:latest" in local containers storage
DEBU[0000] Found image "docker.io/library/busybox:latest" as "docker.io/library/busybox:latest" in local containers storage ([overlay@/home/podman/.local/share/containers/storage+/tmp/storage-run-1000/containers]@65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac)
DEBU[0000] exporting opaque data as blob "sha256:65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac"
DEBU[0000] Inspecting image 65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac
DEBU[0000] exporting opaque data as blob "sha256:65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac"
DEBU[0000] Inspecting image 65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac
DEBU[0000] Inspecting image 65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac
DEBU[0000] Inspecting image 65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac
DEBU[0000] User mount /proc:/proc options []
DEBU[0000] using systemd mode: false
DEBU[0000] Loading seccomp profile from "/usr/share/containers/seccomp.json"
DEBU[0000] Adding mount /dev
DEBU[0000] Adding mount /dev/pts
DEBU[0000] Adding mount /sys
DEBU[0000] Adding mount /dev/mqueue
DEBU[0000] Adding mount /sys/fs/cgroup
DEBU[0000] Successfully loaded 1 networks
DEBU[0000] Allocated lock 6 for container 7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce
DEBU[0000] exporting opaque data as blob "sha256:65ad0d468eb1c558bf7f4e64e790f586e9eda649ee9f130cd0e835b292bbc5ac"
DEBU[0000] Created container "7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce"
DEBU[0000] Container "7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce" has work directory "/home/podman/.local/share/containers/storage/overlay-containers/7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce/userdata"
DEBU[0000] Container "7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce" has run directory "/tmp/storage-run-1000/containers/overlay-containers/7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce/userdata"
DEBU[0000] Not attaching to stdin
INFO[0000] Received shutdown.Stop(), terminating! PID=515
DEBU[0000] Enabling signal proxying
DEBU[0000] overlay: mount_data=lowerdir=/home/podman/.local/share/containers/storage/overlay/l/F2QTOXZ7YEPRV74Z6VWO6OUOQI,upperdir=/home/podman/.local/share/containers/storage/overlay/6305a866e2f9015f41a9c1396df7b64f4ca8c01b66761985c947843066b56124/diff,workdir=/home/podman/.local/share/containers/storage/overlay/6305a866e2f9015f41a9c1396df7b64f4ca8c01b66761985c947843066b56124/work
DEBU[0000] Made network namespace at /tmp/storage-run-1000/netns/netns-894216f4-72d2-876f-07ba-9d4d7ee50dfe for container 7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce
DEBU[0000] Mounted container "7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce" at "/home/podman/.local/share/containers/storage/overlay/6305a866e2f9015f41a9c1396df7b64f4ca8c01b66761985c947843066b56124/merged"
DEBU[0000] Created root filesystem for container 7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce at /home/podman/.local/share/containers/storage/overlay/6305a866e2f9015f41a9c1396df7b64f4ca8c01b66761985c947843066b56124/merged
TRAC[0000] netavark command: printf '{"container_id":"7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce","container_name":"laughing_borg","networks":{"podman":{"static_ips":["10.88.0.8"],"aliases":["7713c1bd0dcf"],"interface_name":"eth0"}},"network_info":{"podman":{"name":"podman","id":"2f259bab93aaaaa2542ba43ef33eb990d0999ee1b9924b557b7be53c0b7a1bb9","driver":"bridge","network_interface":"podman0","created":"2024-06-07T16:57:43.104440601Z","subnets":[{"subnet":"10.88.0.0/16","gateway":"10.88.0.1"}],"ipv6_enabled":false,"internal":false,"dns_enabled":false,"ipam_options":{"driver":"host-local"}}}}' | /usr/libexec/podman/netavark setup /tmp/storage-run-1000/netns/netns-894216f4-72d2-876f-07ba-9d4d7ee50dfe
DEBU[0000] Creating rootless network namespace at "/tmp/storage-run-1000/containers/networks/rootless-netns/rootless-netns"
DEBU[0000] pasta arguments: --config-net --pid /tmp/storage-run-1000/containers/networks/rootless-netns/rootless-netns-conn.pid --dns-forward 169.254.0.1 -t none -u none -T none -U none --no-map-gw --quiet --netns /tmp/storage-run-1000/containers/networks/rootless-netns/rootless-netns
DEBU[0000] The path of /etc/resolv.conf in the mount ns is "/etc/resolv.conf"
[DEBUG netavark::network::validation] "Validating network namespace..."
[DEBUG netavark::commands::setup] "Setting up..."
[DEBUG netavark::firewall] Forcibly using firewall driver nftables
[INFO netavark::firewall] Using nftables firewall driver
[TRACE netavark::network::netlink] send netlink packet: NetlinkMessage { header: NetlinkHeader { length: 32, message_type: 19, flags: 1541, sequence_number: 1, port_number: 0 }, payload: InnerMessage(SetLink(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 1, link_layer_type: Netrom, flags: [Up], change_mask: [Up] }, attributes: [] })) }
[TRACE netavark::network::netlink] read netlink packet: NetlinkMessage { header: NetlinkHeader { length: 36, message_type: 2, flags: 256, sequence_number: 1, port_number: 546 }, payload: Error(ErrorMessage { code: None, header: [32, 0, 0, 0, 19, 0, 5, 6, 1, 0, 0, 0, 0, 0, 0, 0] }) }
[DEBUG netavark::network::bridge] Setup network podman
[DEBUG netavark::network::bridge] Container interface name: eth0 with IP addresses [10.88.0.8/16]
[DEBUG netavark::network::bridge] Bridge name: podman0 with IP addresses [10.88.0.1/16]
[DEBUG netavark::network::core_utils] Setting sysctl value for net.ipv4.ip_forward to 1
[TRACE netavark::network::netlink] send netlink packet: NetlinkMessage { header: NetlinkHeader { length: 44, message_type: 18, flags: 1, sequence_number: 1, port_number: 0 }, payload: InnerMessage(GetLink(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 0, link_layer_type: Netrom, flags: [], change_mask: [] }, attributes: [IfName("podman0")] })) }
[TRACE netavark::network::netlink] read netlink packet: NetlinkMessage { header: NetlinkHeader { length: 64, message_type: 2, flags: 0, sequence_number: 1, port_number: 546 }, payload: Error(ErrorMessage { code: Some(-19), header: [44, 0, 0, 0, 18, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 3, 0, 112, 111, 100, 109, 97, 110, 48, 0] }) }
[TRACE netavark::network::netlink] send netlink packet: NetlinkMessage { header: NetlinkHeader { length: 60, message_type: 16, flags: 1541, sequence_number: 2, port_number: 0 }, payload: InnerMessage(NewLink(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 0, link_layer_type: Netrom, flags: [], change_mask: [] }, attributes: [LinkInfo([Kind(Bridge)]), IfName("podman0")] })) }
[TRACE netavark::network::netlink] read netlink packet: NetlinkMessage { header: NetlinkHeader { length: 36, message_type: 2, flags: 256, sequence_number: 2, port_number: 546 }, payload: Error(ErrorMessage { code: None, header: [60, 0, 0, 0, 16, 0, 5, 6, 2, 0, 0, 0, 0, 0, 0, 0] }) }
[TRACE netavark::network::netlink] send netlink packet: NetlinkMessage { header: NetlinkHeader { length: 44, message_type: 18, flags: 1, sequence_number: 3, port_number: 0 }, payload: InnerMessage(GetLink(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 0, link_layer_type: Netrom, flags: [], change_mask: [] }, attributes: [IfName("podman0")] })) }
[TRACE netavark::network::netlink] read netlink packet: NetlinkMessage { header: NetlinkHeader { length: 1880, message_type: 16, flags: 0, sequence_number: 3, port_number: 546 }, payload: InnerMessage(NewLink(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 3, link_layer_type: Ether, flags: [Broadcast, Multicast], change_mask: [] }, attributes: [IfName("podman0"), TxQueueLen(1000), OperState(Down), Mode(0), Mtu(1500), MinMtu(68), MaxMtu(65535), Group(0), Promiscuity(0), Other(DefaultNla { kind: 61, value: [0, 0, 0, 0] }), NumTxQueues(1), GsoMaxSegs(65535), GsoMaxSize(65536), Other(DefaultNla { kind: 58, value: [0, 0, 1, 0] }), Other(DefaultNla { kind: 63, value: [0, 0, 1, 0] }), Other(DefaultNla { kind: 64, value: [0, 0, 1, 0] }), Other(DefaultNla { kind: 59, value: [0, 0, 1, 0] }), Other(DefaultNla { kind: 60, value: [255, 255, 0, 0] }), NumRxQueues(1), Carrier(1), Qdisc("noop"), CarrierChanges(0), CarrierUpCount(0), CarrierDownCount(0), ProtoDown(0), Map(Map { memory_start: 0, memory_end: 0, base_address: 0, irq: 0, dma: 0, port: 0 }), Address([186, 217, 241, 171, 224, 26]), Broadcast([255, 255, 255, 255, 255, 255]), Stats64(Stats64 { rx_packets: 0, tx_packets: 0, rx_bytes: 0, tx_bytes: 0, rx_errors: 0, tx_errors: 0, rx_dropped: 0, tx_dropped: 0, multicast: 0, collisions: 0, rx_length_errors: 0, rx_over_errors: 0, rx_crc_errors: 0, rx_frame_errors: 0, rx_fifo_errors: 0, rx_missed_errors: 0, tx_aborted_errors: 0, tx_carrier_errors: 0, tx_fifo_errors: 0, tx_heartbeat_errors: 0, tx_window_errors: 0, rx_compressed: 0, tx_compressed: 0, rx_nohandler: 0, rx_otherhost_dropped: 0 }), Stats(Stats { rx_packets: 0, tx_packets: 0, rx_bytes: 0, tx_bytes: 0, rx_errors: 0, tx_errors: 0, rx_dropped: 0, tx_dropped: 0, multicast: 0, collisions: 0, rx_length_errors: 0, rx_over_errors: 0, rx_crc_errors: 0, rx_frame_errors: 0, rx_fifo_errors: 0, rx_missed_errors: 0, tx_aborted_errors: 0, tx_carrier_errors: 0, tx_fifo_errors: 0, tx_heartbeat_errors: 0, 
tx_window_errors: 0, rx_compressed: 0, tx_compressed: 0, rx_nohandler: 0 }), Xdp([Attached(None)]), LinkInfo([Kind(Bridge), Data(Bridge([HelloTimer(0), TcnTimer(0), TopologyChangeTimer(0), GcTimer(0), ForwardDelay(1499), HelloTime(199), MaxAge(1999), AgeingTime(29999), StpState(0), Priority(32768), VlanFiltering(0), GroupFwdMask(0), BridgeId((128, [0, 0, 0, 0, 0, 0])), RootId((128, [0, 0, 0, 0, 0, 0])), RootPort(0), RootPathCost(0), TopologyChange(0), TopologyChangeDetected(0), GroupAddr([1, 128, 194, 0, 0, 0]), MultiBoolOpt(30064771072), Other(DefaultNla { kind: 48, value: [0, 0, 0, 0] }), Other(DefaultNla { kind: 49, value: [0, 0, 0, 0] }), VlanProtocol(33024), VlanDefaultPvid(1), VlanStatsEnabled(0), VlanStatsPerHost(0), MulticastRouter(1), MulticastSnooping(1), MulticastQueryUseIfaddr(0), MulticastQuerier(0), MulticastStatsEnabled(0), MulticastHashElasticity(16), MulticastHashMax(4096), MulticastLastMemberCount(2), MulticastStartupQueryCount(2), MulticastIgmpVersion(2), MulticastMldVersion(1), MulticastLastMemberInterval(99), MulticastMembershipInterval(25999), MulticastQuerierInterval(25499), MulticastQueryInterval(12499), MulticastQueryResponseInterval(999), MulticastStartupQueryInterval(3124), NfCallIpTables(0), NfCallIp6Tables(0), NfCallArpTables(0)]))]), AfSpecUnspec([Inet([DevConf(InetDevConf { forwarding: 1, mc_forwarding: 0, proxy_arp: 0, accept_redirects: 1, secure_redirects: 1, send_redirects: 1, shared_media: 1, rp_filter: 2, accept_source_route: 0, bootp_relay: 0, log_martians: 0, tag: 0, arpfilter: 0, medium_id: 0, noxfrm: 0, nopolicy: 0, force_igmp_version: 0, arp_announce: 0, arp_ignore: 0, promote_secondaries: 1, arp_accept: 0, arp_notify: 0, accept_local: 0, src_vmark: 0, proxy_arp_pvlan: 0, route_localnet: 0, igmpv2_unsolicited_report_interval: 10000, igmpv3_unsolicited_report_interval: 1000, ignore_routes_with_linkdown: 0, drop_unicast_in_l2_multicast: 0, drop_gratuitous_arp: 0, bc_forwarding: 0, arp_evict_nocarrier: 1 })]), 
Inet6([Flags(Inet6IfaceFlags([])), CacheInfo(Inet6CacheInfo { max_reasm_len: 65535, tstamp: 209915975, reachable_time: 25374, retrans_time: 1000 }), DevConf(Inet6DevConf { forwarding: 0, hoplimit: 64, mtu6: 1500, accept_ra: 1, accept_redirects: 1, autoconf: 1, dad_transmits: 1, rtr_solicits: -1, rtr_solicit_interval: 4000, rtr_solicit_delay: 1000, use_tempaddr: 0, temp_valid_lft: 604800, temp_prefered_lft: 86400, regen_max_retry: 3, max_desync_factor: 600, max_addresses: 16, force_mld_version: 0, accept_ra_defrtr: 1, accept_ra_pinfo: 1, accept_ra_rtr_pref: 1, rtr_probe_interval: 60000, accept_ra_rt_info_max_plen: 0, proxy_ndp: 0, optimistic_dad: 0, accept_source_route: 0, mc_forwarding: 0, disable_ipv6: 0, accept_dad: 1, force_tllao: 0, ndisc_notify: 0, mldv1_unsolicited_report_interval: 10000, mldv2_unsolicited_report_interval: 1000, suppress_frag_ndisc: 1, accept_ra_from_local: 0, use_optimistic: 0, accept_ra_mtu: 1, stable_secret: 0, use_oif_addrs_only: 0, accept_ra_min_hop_limit: 1, ignore_routes_with_linkdown: 0, drop_unicast_in_l2_multicast: 0, drop_unsolicited_na: 0, keep_addr_on_down: 0, rtr_solicit_max_interval: 3600000, seg6_enabled: 0, seg6_require_hmac: 0, enhanced_dad: 1, addr_gen_mode: 0, disable_policy: 0, accept_ra_rt_info_min_plen: 0, ndisc_tclass: 0, rpl_seg_enabled: 0, ra_defrtr_metric: 1024, ioam6_enabled: 0, ioam6_id: 65535, ioam6_id_wide: -1, ndisc_evict_nocarrier: 1, accept_untracked_na: 0, accept_ra_min_lft: 0 }), Stats(Inet6Stats { num: 38, in_pkts: 0, in_octets: 0, in_delivers: 0, out_forw_datagrams: 0, out_pkts: 0, out_octets: 0, in_hdr_errors: 0, in_too_big_errors: 0, in_no_routes: 0, in_addr_errors: 0, in_unknown_protos: 0, in_truncated_pkts: 0, in_discards: 0, out_discards: 0, out_no_routes: 0, reasm_timeout: 0, reasm_reqds: 0, reasm_oks: 0, reasm_fails: 0, frag_oks: 0, frag_fails: 0, frag_creates: 0, in_mcast_pkts: 0, out_mcast_pkts: 0, in_bcast_pkts: 0, out_bcast_pkts: 0, in_mcast_octets: 0, out_mcast_octets: 0, in_bcast_octets: 0, 
out_bcast_octets: 0, in_csum_errors: 0, in_no_ect_pkts: 0, in_ect1_pkts: 0, in_ect0_pkts: 0, in_ce_pkts: 0 }), Icmp6Stats(Icmp6Stats { num: 7, in_msgs: 0, in_errors: 0, out_msgs: 0, out_errors: 0, csum_errors: 0 }), Token(::), AddrGenMode(0)])]), Other(DefaultNla { kind: 32830, value: [] }), Other(DefaultNla { kind: 32833, value: [] })] })) }
[TRACE netavark::network::netlink] send netlink packet: NetlinkMessage { header: NetlinkHeader { length: 40, message_type: 20, flags: 1541, sequence_number: 4, port_number: 0 }, payload: InnerMessage(NewAddress(AddressMessage { header: AddressHeader { family: Inet, prefix_len: 16, flags: [], scope: Universe, index: 3 }, attributes: [Broadcast(10.88.255.255), Local(10.88.0.1)] })) }
[TRACE netavark::network::netlink] read netlink packet: NetlinkMessage { header: NetlinkHeader { length: 36, message_type: 2, flags: 256, sequence_number: 4, port_number: 546 }, payload: Error(ErrorMessage { code: None, header: [40, 0, 0, 0, 20, 0, 5, 6, 4, 0, 0, 0, 0, 0, 0, 0] }) }
[TRACE netavark::network::netlink] send netlink packet: NetlinkMessage { header: NetlinkHeader { length: 32, message_type: 19, flags: 1541, sequence_number: 5, port_number: 0 }, payload: InnerMessage(SetLink(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 3, link_layer_type: Netrom, flags: [Up], change_mask: [Up] }, attributes: [] })) }
[TRACE netavark::network::netlink] read netlink packet: NetlinkMessage { header: NetlinkHeader { length: 36, message_type: 2, flags: 256, sequence_number: 5, port_number: 546 }, payload: Error(ErrorMessage { code: None, header: [32, 0, 0, 0, 19, 0, 5, 6, 5, 0, 0, 0, 0, 0, 0, 0] }) }
[TRACE netavark::network::netlink] send netlink packet: NetlinkMessage { header: NetlinkHeader { length: 116, message_type: 16, flags: 1541, sequence_number: 6, port_number: 0 }, payload: InnerMessage(NewLink(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 0, link_layer_type: Netrom, flags: [], change_mask: [] }, attributes: [LinkInfo([Kind(Veth), Data(Veth(Peer(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 0, link_layer_type: Netrom, flags: [], change_mask: [] }, attributes: [LinkInfo([Kind(Veth)]), IfName("eth0"), NetNsFd(3)] })))]), Controller(3)] })) }
[TRACE netavark::network::netlink] read netlink packet: NetlinkMessage { header: NetlinkHeader { length: 36, message_type: 2, flags: 256, sequence_number: 6, port_number: 546 }, payload: Error(ErrorMessage { code: None, header: [116, 0, 0, 0, 16, 0, 5, 6, 6, 0, 0, 0, 0, 0, 0, 0] }) }
[TRACE netavark::network::netlink] send netlink packet: NetlinkMessage { header: NetlinkHeader { length: 44, message_type: 18, flags: 1, sequence_number: 2, port_number: 0 }, payload: InnerMessage(GetLink(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 0, link_layer_type: Netrom, flags: [], change_mask: [] }, attributes: [IfName("eth0")] })) }
[TRACE netavark::network::netlink] read netlink packet: NetlinkMessage { header: NetlinkHeader { length: 1468, message_type: 16, flags: 0, sequence_number: 2, port_number: 546 }, payload: InnerMessage(NewLink(LinkMessage { header: LinkHeader { interface_family: Unspec, index: 2, link_layer_type: Ether, flags: [Broadcast, Multicast], change_mask: [] }, attributes: [IfName("eth0"), TxQueueLen(1000), OperState(Down), Mode(0), Mtu(1500), MinMtu(68), MaxMtu(65535), Group(0), Promiscuity(0), Other(DefaultNla { kind: 61, value: [0, 0, 0, 0] }), NumTxQueues(64), GsoMaxSegs(65535), GsoMaxSize(65536), Other(DefaultNla { kind: 58, value: [0, 0, 1, 0] }), Other(DefaultNla { kind: 63, value: [0, 0, 1, 0] }), Other(DefaultNla { kind: 64, value: [0, 0, 1, 0] }), Other(DefaultNla { kind: 59, value: [248, 255, 7, 0] }), Other(DefaultNla { kind: 60, value: [255, 255, 0, 0] }), NumRxQueues(64), Carrier(0), Qdisc("noop"), CarrierChanges(1), CarrierUpCount(0), CarrierDownCount(1), ProtoDown(0), Map(Map { memory_start: 0, memory_end: 0, base_address: 0, irq: 0, dma: 0, port: 0 }), Address([70, 53, 181, 98, 53, 139]), Broadcast([255, 255, 255, 255, 255, 255]), Stats64(Stats64 { rx_packets: 0, tx_packets: 0, rx_bytes: 0, tx_bytes: 0, rx_errors: 0, tx_errors: 0, rx_dropped: 0, tx_dropped: 0, multicast: 0, collisions: 0, rx_length_errors: 0, rx_over_errors: 0, rx_crc_errors: 0, rx_frame_errors: 0, rx_fifo_errors: 0, rx_missed_errors: 0, tx_aborted_errors: 0, tx_carrier_errors: 0, tx_fifo_errors: 0, tx_heartbeat_errors: 0, tx_window_errors: 0, rx_compressed: 0, tx_compressed: 0, rx_nohandler: 0, rx_otherhost_dropped: 0 }), Stats(Stats { rx_packets: 0, tx_packets: 0, rx_bytes: 0, tx_bytes: 0, rx_errors: 0, tx_errors: 0, rx_dropped: 0, tx_dropped: 0, multicast: 0, collisions: 0, rx_length_errors: 0, rx_over_errors: 0, rx_crc_errors: 0, rx_frame_errors: 0, rx_fifo_errors: 0, rx_missed_errors: 0, tx_aborted_errors: 0, tx_carrier_errors: 0, tx_fifo_errors: 0, tx_heartbeat_errors: 0, 
tx_window_errors: 0, rx_compressed: 0, tx_compressed: 0, rx_nohandler: 0 }), Xdp([Attached(None)]), LinkInfo([Kind(Veth)]), NetnsId(0), Link(4), AfSpecUnspec([Inet([DevConf(InetDevConf { forwarding: 1, mc_forwarding: 0, proxy_arp: 0, accept_redirects: 1, secure_redirects: 1, send_redirects: 1, shared_media: 1, rp_filter: 2, accept_source_route: 0, bootp_relay: 0, log_martians: 0, tag: 0, arpfilter: 0, medium_id: 0, noxfrm: 0, nopolicy: 0, force_igmp_version: 0, arp_announce: 0, arp_ignore: 0, promote_secondaries: 1, arp_accept: 0, arp_notify: 0, accept_local: 0, src_vmark: 0, proxy_arp_pvlan: 0, route_localnet: 0, igmpv2_unsolicited_report_interval: 10000, igmpv3_unsolicited_report_interval: 1000, ignore_routes_with_linkdown: 0, drop_unicast_in_l2_multicast: 0, drop_gratuitous_arp: 0, bc_forwarding: 0, arp_evict_nocarrier: 1 })]), Inet6([Flags(Inet6IfaceFlags([])), CacheInfo(Inet6CacheInfo { max_reasm_len: 65535, tstamp: 209915976, reachable_time: 25047, retrans_time: 1000 }), DevConf(Inet6DevConf { forwarding: 0, hoplimit: 64, mtu6: 1500, accept_ra: 1, accept_redirects: 1, autoconf: 1, dad_transmits: 1, rtr_solicits: -1, rtr_solicit_interval: 4000, rtr_solicit_delay: 1000, use_tempaddr: 0, temp_valid_lft: 604800, temp_prefered_lft: 86400, regen_max_retry: 3, max_desync_factor: 600, max_addresses: 16, force_mld_version: 0, accept_ra_defrtr: 1, accept_ra_pinfo: 1, accept_ra_rtr_pref: 1, rtr_probe_interval: 60000, accept_ra_rt_info_max_plen: 0, proxy_ndp: 0, optimistic_dad: 0, accept_source_route: 0, mc_forwarding: 0, disable_ipv6: 0, accept_dad: 1, force_tllao: 0, ndisc_notify: 0, mldv1_unsolicited_report_interval: 10000, mldv2_unsolicited_report_interval: 1000, suppress_frag_ndisc: 1, accept_ra_from_local: 0, use_optimistic: 0, accept_ra_mtu: 1, stable_secret: 0, use_oif_addrs_only: 0, accept_ra_min_hop_limit: 1, ignore_routes_with_linkdown: 0, drop_unicast_in_l2_multicast: 0, drop_unsolicited_na: 0, keep_addr_on_down: 0, rtr_solicit_max_interval: 3600000, 
seg6_enabled: 0, seg6_require_hmac: 0, enhanced_dad: 1, addr_gen_mode: 0, disable_policy: 0, accept_ra_rt_info_min_plen: 0, ndisc_tclass: 0, rpl_seg_enabled: 0, ra_defrtr_metric: 1024, ioam6_enabled: 0, ioam6_id: 65535, ioam6_id_wide: -1, ndisc_evict_nocarrier: 1, accept_untracked_na: 0, accept_ra_min_lft: 0 }), Stats(Inet6Stats { num: 38, in_pkts: 0, in_octets: 0, in_delivers: 0, out_forw_datagrams: 0, out_pkts: 0, out_octets: 0, in_hdr_errors: 0, in_too_big_errors: 0, in_no_routes: 0, in_addr_errors: 0, in_unknown_protos: 0, in_truncated_pkts: 0, in_discards: 0, out_discards: 0, out_no_routes: 0, reasm_timeout: 0, reasm_reqds: 0, reasm_oks: 0, reasm_fails: 0, frag_oks: 0, frag_fails: 0, frag_creates: 0, in_mcast_pkts: 0, out_mcast_pkts: 0, in_bcast_pkts: 0, out_bcast_pkts: 0, in_mcast_octets: 0, out_mcast_octets: 0, in_bcast_octets: 0, out_bcast_octets: 0, in_csum_errors: 0, in_no_ect_pkts: 0, in_ect1_pkts: 0, in_ect0_pkts: 0, in_ce_pkts: 0 }), Icmp6Stats(Icmp6Stats { num: 7, in_msgs: 0, in_errors: 0, out_msgs: 0, out_errors: 0, csum_errors: 0 }), Token(::), AddrGenMode(0)])]), Other(DefaultNla { kind: 32830, value: [] }), Other(DefaultNla { kind: 32833, value: [] })] })) }
[DEBUG netavark::network::core_utils] Setting sysctl value for /proc/sys/net/ipv6/conf/eth0/autoconf to 0
[DEBUG netavark::network::core_utils] Setting sysctl value for /proc/sys/net/ipv4/conf/eth0/arp_notify to 1
DEBU[0000] Cleaning up rootless network namespace
DEBU[0000] Unmounted container "7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce"
DEBU[0000] Network is already cleaned up, skipping...
DEBU[0000] Cleaning up container 7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce
DEBU[0000] Network is already cleaned up, skipping...
DEBU[0000] Container 7713c1bd0dcf26b7359a107e344b7f4f8564fb9c27d3f958a6f51c18707453ce storage is already unmounted, skipping...
DEBU[0000] ExitCode msg: "netavark (exit code 1): sysctl error: io error: read-only file system (os error 30)"
Error: netavark (exit code 1): Sysctl error: IO Error: Read-only file system (os error 30)
DEBU[0000] Shutting down engines
[1]: https://www.redhat.com/sysadmin/podman-inside-kubernetes
[2]: https://www.redhat.com/sysadmin/podman-inside-container
[3]: https://devconfcz2023.sched.com/event/1MYld/root-is-less-container-networks-get-in-shape-with-pasta
[4]: https://github.com/containers/podman/blob/main/docs/tutorials/basic_networking.md
[5]: https://github.com/containers/netavark/issues/825#issuecomment-1944315150
[6]: https://github.com/containers/netavark/blob/febe31a08266ab8025c7aaf567f01a5335732b3f/src/network/core_utils.rs#L258
EDIT: the above was run on Kubernetes 1.29.4 with node kernel 6.8.9 and `podman/stable:v5.0.3`.
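As an aside on the debug lines above: netavark applies sysctls by writing directly to paths under `/proc/sys`, so a name like `net.ipv4.conf.eth0.arp_notify` corresponds to `/proc/sys/net/ipv4/conf/eth0/arp_notify`. A rough illustration of that mapping (not netavark's actual helper; note the caveat in the comment):

```rust
/// Illustrative only: convert a sysctl name such as
/// "net.ipv4.conf.eth0.arp_notify" to its /proc/sys path.
/// Caveat: interface names that themselves contain dots (e.g. VLAN
/// devices like "eth0.100") would be mangled by this naive replacement
/// and need special handling in real code.
fn sysctl_to_proc_path(name: &str) -> String {
    format!("/proc/sys/{}", name.replace('.', "/"))
}
```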
Hi, I'm facing some version of @Omar007's issue as well. With `procMount: Unmasked` in the container security context, I was able to get a working bridge network in rootless podman inside k8s with minimal friction (podman 4.9.3).

However, as soon as I attempted to expose one of the container ports on localhost, the container failed to come up, because netavark tries to set `net.ipv4.conf.<interface>.route_localnet=1` and it can't write to `/proc` in the rootless netns. The issue goes away if I make the `podman` container a (rootless) privileged one.
My questions:

- I tried running `podman unshare --rootless-netns sysctl -w net.ipv4.conf.default.route_localnet=1` in a privileged init container with the same UID as my main one to "preset" the sysctl value (I even tried it while persisting the contents of `$XDG_RUNTIME_DIR` across both containers), but the sysctl change did not survive in the main container. I don't know enough about namespacing behaviour to judge whether that's unavoidable or not; IIRC this kind of trick does work for making changes to sysctls in the k8s pod's own namespace.
- If you think it's meaningful to do so, I can make another reproduction attempt with a more recent podman version (the only reason why I tested with 4.9.3 was because it's part of another image in our setup).
Here's some context on what we're even trying to achieve here (X/Y problems and all that): we have a bunch of code using `testcontainers` for integration tests that currently runs on EC2 instances with Docker. We'd like to "lift and shift" all of that into our new CI system, which will (likely) only deal with runners in Kubernetes, while following the principle of least privilege as much as possible. Running rootless podman in k8s is one of the avenues we're exploring.

Almost every single one of these test setups currently uses Docker's "built-in" port forwarding as its main means of communicating with the containerised services in the test. At the same time, the services also communicate with one another, so just putting them all behind `slirp4netns` and skipping the bridge network is not really an option. In other words, we'd like to have the containers in a bridge network and have ports forwarded to the k8s pod's localhost so the test runner can communicate with them. So far, I haven't found a way to achieve that in rootless mode without `privileged: true`, hence this post.
@mvalvekensCET how are you using `procMount: Unmasked`? Could you share a working spec for that? Or are you by chance on a k8s version <= 1.29?

The issue I run into when trying to set it to unmasked is that this also requires the use of user namespaces (`spec.hostUsers: false`; not enforced in k8s <= 1.29, but since my last post we moved to 1.30). That in turn makes both `/dev/fuse` and `/dev/net/tun`, which are mounted using a device plugin, inaccessible, as things like that are not (yet) compatible with user namespaces in k8s. And of course that will very much prevent things from working ;)
When running inside of unprivileged containers, /proc is normally mounted read-only. If a user tries to run netavark in such an environment, it currently fails hard when it cannot set all of the sysctls. Most of them are needed for routing or to disable some IPv6 options, but general communication may still be possible without them.

We should consider not treating read-only errors as fatal and just logging them as warnings. The biggest problem is likely the ip_forward sysctl: without it, no external communication is possible. However, it could already be set by the outer container manager, in which case I would expect things to mostly work fine.

see https://github.com/containers/podman/issues/19991
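A minimal sketch (not netavark's actual code) of the behaviour proposed above: read the current value first, skip the write if it already matches, and downgrade a read-only-filesystem error to a warning instead of failing hard. The function name and return convention here are illustrative.

```rust
use std::fs;
use std::io;

/// Apply a sysctl via its /proc/sys path, tolerating a read-only /proc
/// when the write fails. Returns Ok(true) if the value was written,
/// Ok(false) if it was already correct or the write was skipped with a
/// warning, and Err for any other failure.
fn apply_sysctl(path: &str, value: &str) -> io::Result<bool> {
    // If the kernel already has the desired value, there is nothing to do.
    // This covers the case where the outer container manager preset it.
    if let Ok(current) = fs::read_to_string(path) {
        if current.trim() == value {
            return Ok(false);
        }
    }
    match fs::write(path, value) {
        Ok(()) => Ok(true),
        // os error 30 == EROFS on Linux: /proc/sys is mounted read-only,
        // e.g. in an unprivileged container. Warn and continue instead of
        // failing hard; routing may still work if the value was preset.
        Err(e) if e.raw_os_error() == Some(30) => {
            eprintln!("warning: could not set {path}={value}: {e}");
            Ok(false)
        }
        Err(e) => Err(e),
    }
}
```

Whether the ip_forward sysctl should still be treated as fatal (since routing is non-functional without it) is a policy decision on top of this; the sketch only shows the mechanical part.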