Closed: kautsig closed this issue 6 months ago
Can you run the following one-liner on affected infrastructure and provide the full output?
docker run -it --rm --privileged docker:dind sh -euxc 'modprobe nf_tables > /dev/null 2>&1 || :; if ! iptables -nL > /dev/null 2>&1; then modprobe ip_tables || :; /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1; echo success legacy; else echo success nftables; fi'
It should look something like this:
$ docker run -it --rm --privileged docker:dind sh -euxc 'modprobe nf_tables > /dev/null 2>&1 || :; if ! iptables -nL > /dev/null 2>&1; then modprobe ip_tables || :; /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1; echo success legacy; else echo success nftables; fi'
+ modprobe nf_tables
+ :
+ iptables -nL
+ echo success nftables
success nftables
or:
$ docker run -it --rm --privileged docker:dind sh -euxc 'modprobe nf_tables > /dev/null 2>&1 || :; if ! false iptables -nL > /dev/null 2>&1; then modprobe ip_tables || :; /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1; echo success legacy; else echo success nftables; fi'
+ modprobe nf_tables
+ :
+ false iptables -nL
+ modprobe ip_tables
ip: can't find device 'ip_tables'
ip_tables 36864 0
x_tables 53248 8 ip_tables,xt_mark,xt_nat,xt_tcpudp,xt_conntrack,xt_MASQUERADE,xt_addrtype,nft_compat
modprobe: can't change directory to '/lib/modules': No such file or directory
+ :
+ /usr/local/sbin/.iptables-legacy/iptables -nL
+ echo success legacy
success legacy
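For readability, the detection logic of the one-liner can be sketched as a small shell function. This is only a paraphrase (the real logic lives in the image's dockerd-entrypoint.sh); the probe command is parameterized here so both branches can be exercised without a privileged container:

```shell
# Paraphrase of the detection one-liner, NOT the actual entrypoint code.
# Probe the default (nf_tables-flavored) iptables; if listing rules fails,
# fall back to the bundled legacy binaries.
detect_iptables_flavor() {
	probe="$1"   # the iptables command to probe, e.g. "iptables" in the container
	modprobe nf_tables > /dev/null 2>&1 || :
	if "$probe" -nL > /dev/null 2>&1; then
		echo 'success nftables'
	else
		modprobe ip_tables > /dev/null 2>&1 || :
		echo 'success legacy'
	fi
}
```

Substituting `true` or `false` for the probe (as in the second one-liner above) forces each branch.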
Actually, instead, could someone test https://github.com/docker-library/docker/pull/468? :eyes:
docker build --pull 'https://github.com/docker-library/docker.git#refs/pull/468/merge:24/dind'
I tried the fix on a COS 105 instance, without success unfortunately:
docker build --pull 'https://github.com/docker-library/docker.git#refs/pull/468/merge:24/dind' -t docker:dind-468
instance-1 ~ # docker run --rm -ti --privileged --name docker -e DOCKER_TLS_CERTDIR= -p 2375:2375 docker:dind-468
INFO[2023-12-19T05:59:24.658264543Z] Starting up
WARN[2023-12-19T05:59:24.659459317Z] Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone who has access to your network. host="tcp://0.0.0.0:2375"
WARN[2023-12-19T05:59:24.659493125Z] Binding to an IP address, even on localhost, can also give access to scripts run in a browser. Be safe out there! host="tcp://0.0.0.0:2375"
WARN[2023-12-19T05:59:25.659748393Z] Binding to an IP address without --tlsverify is deprecated. Startup is intentionally being slowed down to show this message host="tcp://0.0.0.0:2375"
WARN[2023-12-19T05:59:25.659878198Z] Please consider generating tls certificates with client validation to prevent exposing unauthenticated root access to your network host="tcp://0.0.0.0:2375"
WARN[2023-12-19T05:59:25.659977660Z] You can override this by explicitly specifying '--tls=false' or '--tlsverify=false' host="tcp://0.0.0.0:2375"
WARN[2023-12-19T05:59:25.660055289Z] Support for listening on TCP without authentication or explicit intent to run without authentication will be removed in the next release host="tcp://0.0.0.0:2375"
WARN[2023-12-19T05:59:40.667664802Z] could not change group /var/run/docker.sock to docker: group docker not found
INFO[2023-12-19T05:59:40.667892024Z] containerd not running, starting managed containerd
INFO[2023-12-19T05:59:40.669552030Z] started new containerd process address=/var/run/docker/containerd/containerd.sock module=libcontainerd pid=29
INFO[2023-12-19T05:59:40.694517584Z] starting containerd revision=091922f03c2762540fd057fba91260237ff86acb version=v1.7.6
INFO[2023-12-19T05:59:40.720115552Z] loading plugin "io.containerd.snapshotter.v1.aufs"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T05:59:40.726019496Z] skip loading plugin "io.containerd.snapshotter.v1.aufs"... error="aufs is not supported (modprobe aufs failed: exit status 1 \"ip: can't find device 'aufs'\\nmodprobe: can't change directory to '/lib/modules': No such file or directory\\n\"): skip plugin" type=io.containerd.snapshotter.v1
INFO[2023-12-19T05:59:40.726070113Z] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[2023-12-19T05:59:40.726330058Z] loading plugin "io.containerd.snapshotter.v1.blockfile"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T05:59:40.726503818Z] skip loading plugin "io.containerd.snapshotter.v1.blockfile"... error="no scratch file generator: skip plugin" type=io.containerd.snapshotter.v1
INFO[2023-12-19T05:59:40.726533875Z] loading plugin "io.containerd.snapshotter.v1.native"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T05:59:40.726995644Z] loading plugin "io.containerd.snapshotter.v1.overlayfs"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T05:59:40.727713599Z] loading plugin "io.containerd.snapshotter.v1.devmapper"... type=io.containerd.snapshotter.v1
WARN[2023-12-19T05:59:40.728365934Z] failed to load plugin io.containerd.snapshotter.v1.devmapper error="devmapper not configured"
INFO[2023-12-19T05:59:40.728406950Z] loading plugin "io.containerd.snapshotter.v1.zfs"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T05:59:40.728686904Z] skip loading plugin "io.containerd.snapshotter.v1.zfs"... error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2023-12-19T05:59:40.728727046Z] loading plugin "io.containerd.metadata.v1.bolt"... type=io.containerd.metadata.v1
WARN[2023-12-19T05:59:40.728839826Z] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
INFO[2023-12-19T05:59:40.728873925Z] metadata content store policy set policy=shared
INFO[2023-12-19T05:59:40.734935163Z] loading plugin "io.containerd.differ.v1.walking"... type=io.containerd.differ.v1
INFO[2023-12-19T05:59:40.734982469Z] loading plugin "io.containerd.event.v1.exchange"... type=io.containerd.event.v1
INFO[2023-12-19T05:59:40.735018284Z] loading plugin "io.containerd.gc.v1.scheduler"... type=io.containerd.gc.v1
INFO[2023-12-19T05:59:40.735074579Z] loading plugin "io.containerd.lease.v1.manager"... type=io.containerd.lease.v1
INFO[2023-12-19T05:59:40.735111482Z] loading plugin "io.containerd.nri.v1.nri"... type=io.containerd.nri.v1
INFO[2023-12-19T05:59:40.735159430Z] NRI interface is disabled by configuration.
INFO[2023-12-19T05:59:40.735186619Z] loading plugin "io.containerd.runtime.v2.task"... type=io.containerd.runtime.v2
INFO[2023-12-19T05:59:40.735637548Z] loading plugin "io.containerd.runtime.v2.shim"... type=io.containerd.runtime.v2
INFO[2023-12-19T05:59:40.735682478Z] loading plugin "io.containerd.sandbox.store.v1.local"... type=io.containerd.sandbox.store.v1
INFO[2023-12-19T05:59:40.735715378Z] loading plugin "io.containerd.sandbox.controller.v1.local"... type=io.containerd.sandbox.controller.v1
INFO[2023-12-19T05:59:40.735748144Z] loading plugin "io.containerd.streaming.v1.manager"... type=io.containerd.streaming.v1
INFO[2023-12-19T05:59:40.735779675Z] loading plugin "io.containerd.service.v1.introspection-service"... type=io.containerd.service.v1
INFO[2023-12-19T05:59:40.735809789Z] loading plugin "io.containerd.service.v1.containers-service"... type=io.containerd.service.v1
INFO[2023-12-19T05:59:40.735833907Z] loading plugin "io.containerd.service.v1.content-service"... type=io.containerd.service.v1
INFO[2023-12-19T05:59:40.735865393Z] loading plugin "io.containerd.service.v1.diff-service"... type=io.containerd.service.v1
INFO[2023-12-19T05:59:40.735897986Z] loading plugin "io.containerd.service.v1.images-service"... type=io.containerd.service.v1
INFO[2023-12-19T05:59:40.735930671Z] loading plugin "io.containerd.service.v1.namespaces-service"... type=io.containerd.service.v1
INFO[2023-12-19T05:59:40.735977903Z] loading plugin "io.containerd.service.v1.snapshots-service"... type=io.containerd.service.v1
INFO[2023-12-19T05:59:40.736038416Z] loading plugin "io.containerd.runtime.v1.linux"... type=io.containerd.runtime.v1
INFO[2023-12-19T05:59:40.736316165Z] loading plugin "io.containerd.monitor.v1.cgroups"... type=io.containerd.monitor.v1
INFO[2023-12-19T05:59:40.736859068Z] loading plugin "io.containerd.service.v1.tasks-service"... type=io.containerd.service.v1
INFO[2023-12-19T05:59:40.736916534Z] loading plugin "io.containerd.grpc.v1.introspection"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.736950550Z] loading plugin "io.containerd.transfer.v1.local"... type=io.containerd.transfer.v1
INFO[2023-12-19T05:59:40.737000865Z] loading plugin "io.containerd.internal.v1.restart"... type=io.containerd.internal.v1
INFO[2023-12-19T05:59:40.737158933Z] loading plugin "io.containerd.grpc.v1.containers"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.737196150Z] loading plugin "io.containerd.grpc.v1.content"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.737228667Z] loading plugin "io.containerd.grpc.v1.diff"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.737266643Z] loading plugin "io.containerd.grpc.v1.events"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.737314314Z] loading plugin "io.containerd.grpc.v1.healthcheck"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.737392179Z] loading plugin "io.containerd.grpc.v1.images"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.737423579Z] loading plugin "io.containerd.grpc.v1.leases"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.737474850Z] loading plugin "io.containerd.grpc.v1.namespaces"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.737508949Z] loading plugin "io.containerd.internal.v1.opt"... type=io.containerd.internal.v1
INFO[2023-12-19T05:59:40.737958122Z] loading plugin "io.containerd.grpc.v1.sandbox-controllers"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.737994660Z] loading plugin "io.containerd.grpc.v1.sandboxes"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.738018467Z] loading plugin "io.containerd.grpc.v1.snapshots"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.738048956Z] loading plugin "io.containerd.grpc.v1.streaming"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.738082779Z] loading plugin "io.containerd.grpc.v1.tasks"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.738117133Z] loading plugin "io.containerd.grpc.v1.transfer"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.738147744Z] loading plugin "io.containerd.grpc.v1.version"... type=io.containerd.grpc.v1
INFO[2023-12-19T05:59:40.738177772Z] loading plugin "io.containerd.tracing.processor.v1.otlp"... type=io.containerd.tracing.processor.v1
INFO[2023-12-19T05:59:40.738212895Z] skip loading plugin "io.containerd.tracing.processor.v1.otlp"... error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
INFO[2023-12-19T05:59:40.738239689Z] loading plugin "io.containerd.internal.v1.tracing"... type=io.containerd.internal.v1
INFO[2023-12-19T05:59:40.738281419Z] skipping tracing processor initialization (no tracing plugin) error="no OpenTelemetry endpoint: skip plugin"
INFO[2023-12-19T05:59:40.738996889Z] serving... address=/var/run/docker/containerd/containerd-debug.sock
INFO[2023-12-19T05:59:40.739124469Z] serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
INFO[2023-12-19T05:59:40.739482037Z] serving... address=/var/run/docker/containerd/containerd.sock
INFO[2023-12-19T05:59:40.739761886Z] containerd successfully booted in 0.045963s
INFO[2023-12-19T05:59:40.783744615Z] Loading containers: start.
INFO[2023-12-19T05:59:40.841116945Z] stopping healthcheck following graceful shutdown module=libcontainerd
INFO[2023-12-19T05:59:40.841151984Z] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=moby
INFO[2023-12-19T05:59:40.841192121Z] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
failed to start daemon: Error initializing network controller: error obtaining controller instance: unable to add return rule in DOCKER-ISOLATION-STAGE-1 chain: (iptables failed: iptables --wait -A DOCKER-ISOLATION-STAGE-1 -j RETURN: iptables v1.8.10 (nf_tables): RULE_APPEND failed (No such file or directory): rule in chain DOCKER-ISOLATION-STAGE-1
(exit status 4))
Same as before.
I had a look at the fix and checked, from both the host and the container side, which iptables chains are visible to the container - maybe I overlooked something in the first place.
1) Container, nf_tables version
instance-1 ~ # docker run -it --rm --privileged docker:dind sh -euxc 'iptables --version'
+ iptables --version
iptables v1.8.10 (nf_tables)
instance-1 ~ # docker run -it --rm --privileged docker:dind sh -euxc 'iptables -nL'
+ iptables -nL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
2) Container, legacy version
instance-1 ~ # docker run -it --rm --privileged docker:dind sh -euxc '/usr/local/sbin/.iptables-legacy/iptables --version'
+ /usr/local/sbin/.iptables-legacy/iptables --version
iptables v1.8.10 (legacy)
instance-1 ~ # docker run -it --rm --privileged docker:dind sh -euxc '/usr/local/sbin/.iptables-legacy/iptables -nL'
+ /usr/local/sbin/.iptables-legacy/iptables -nL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
3) Host
instance-1 ~ # iptables --version
iptables v1.8.5 (legacy)
instance-1 ~ # iptables -nL
Chain INPUT (policy DROP)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
Chain FORWARD (policy DROP)
target prot opt source destination
DOCKER-USER all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-ISOLATION-STAGE-1 all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy DROP)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state NEW,RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
Chain DOCKER (1 references)
target prot opt source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- 0.0.0
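The side-by-side comparison above (host listing vs. container listing) can be scripted. A minimal sketch, using a hypothetical `chain_visible` helper (my name, not from the thread):

```shell
# Check whether a given chain is visible to a given iptables binary.
# Useful for comparing the host view with the container view.
chain_visible() {
	ipt="$1"     # iptables command to use
	chain="$2"   # chain name, e.g. DOCKER-ISOLATION-STAGE-1
	"$ipt" -nL 2>/dev/null | grep -q "^Chain $chain "
}

# Host:                chain_visible iptables DOCKER-ISOLATION-STAGE-1   # found
# Inside docker:dind:  chain_visible iptables DOCKER-ISOLATION-STAGE-1   # not found
```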
This looks interesting: the container does not see the rule chain DOCKER-ISOLATION-STAGE-1, no matter which iptables version is used.
I also came across the --network=host flag when running the container. When added, it produces the "Warning: iptables-legacy tables present, use iptables-legacy to see them" warning on the newer iptables version:
docker run --rm -ti --privileged --network=host --name docker -e DOCKER_TLS_CERTDIR= -p 2375:2375 docker:dind-468
WARNING: Published ports are discarded when using host network mode
INFO[2023-12-19T07:15:03.890696880Z] Starting up
WARN[2023-12-19T07:15:03.891579679Z] Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone who has access to your network. host="tcp://0.0.0.0:2375"
WARN[2023-12-19T07:15:03.891621586Z] Binding to an IP address, even on localhost, can also give access to scripts run in a browser. Be safe out there! host="tcp://0.0.0.0:2375"
WARN[2023-12-19T07:15:04.891832355Z] Binding to an IP address without --tlsverify is deprecated. Startup is intentionally being slowed down to show this message host="tcp://0.0.0.0:2375"
WARN[2023-12-19T07:15:04.891893977Z] Please consider generating tls certificates with client validation to prevent exposing unauthenticated root access to your network host="tcp://0.0.0.0:2375"
WARN[2023-12-19T07:15:04.891937385Z] You can override this by explicitly specifying '--tls=false' or '--tlsverify=false' host="tcp://0.0.0.0:2375"
WARN[2023-12-19T07:15:04.892038473Z] Support for listening on TCP without authentication or explicit intent to run without authentication will be removed in the next release host="tcp://0.0.0.0:2375"
WARN[2023-12-19T07:15:19.894050724Z] could not change group /var/run/docker.sock to docker: group docker not found
INFO[2023-12-19T07:15:19.894313539Z] containerd not running, starting managed containerd
INFO[2023-12-19T07:15:19.895833470Z] started new containerd process address=/var/run/docker/containerd/containerd.sock module=libcontainerd pid=28
INFO[2023-12-19T07:15:19.925366290Z] starting containerd revision=091922f03c2762540fd057fba91260237ff86acb version=v1.7.6
INFO[2023-12-19T07:15:19.948921789Z] loading plugin "io.containerd.snapshotter.v1.aufs"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T07:15:19.955378329Z] skip loading plugin "io.containerd.snapshotter.v1.aufs"... error="aufs is not supported (modprobe aufs failed: exit status 1 \"ip: can't find device 'aufs'\\nmodprobe: can't change directory to '/lib/modules': No such file or directory\\n\"): skip plugin" type=io.containerd.snapshotter.v1
INFO[2023-12-19T07:15:19.955442479Z] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[2023-12-19T07:15:19.955720195Z] loading plugin "io.containerd.snapshotter.v1.blockfile"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T07:15:19.955898278Z] skip loading plugin "io.containerd.snapshotter.v1.blockfile"... error="no scratch file generator: skip plugin" type=io.containerd.snapshotter.v1
INFO[2023-12-19T07:15:19.955935065Z] loading plugin "io.containerd.snapshotter.v1.native"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T07:15:19.956151100Z] loading plugin "io.containerd.snapshotter.v1.overlayfs"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T07:15:19.956679897Z] loading plugin "io.containerd.snapshotter.v1.devmapper"... type=io.containerd.snapshotter.v1
WARN[2023-12-19T07:15:19.956735989Z] failed to load plugin io.containerd.snapshotter.v1.devmapper error="devmapper not configured"
INFO[2023-12-19T07:15:19.956760154Z] loading plugin "io.containerd.snapshotter.v1.zfs"... type=io.containerd.snapshotter.v1
INFO[2023-12-19T07:15:19.957109770Z] skip loading plugin "io.containerd.snapshotter.v1.zfs"... error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2023-12-19T07:15:19.957145593Z] loading plugin "io.containerd.metadata.v1.bolt"... type=io.containerd.metadata.v1
WARN[2023-12-19T07:15:19.957304461Z] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
INFO[2023-12-19T07:15:19.957332636Z] metadata content store policy set policy=shared
INFO[2023-12-19T07:15:19.967125710Z] loading plugin "io.containerd.differ.v1.walking"... type=io.containerd.differ.v1
INFO[2023-12-19T07:15:19.967737508Z] loading plugin "io.containerd.event.v1.exchange"... type=io.containerd.event.v1
INFO[2023-12-19T07:15:19.968081304Z] loading plugin "io.containerd.gc.v1.scheduler"... type=io.containerd.gc.v1
INFO[2023-12-19T07:15:19.968450452Z] loading plugin "io.containerd.lease.v1.manager"... type=io.containerd.lease.v1
INFO[2023-12-19T07:15:19.968776981Z] loading plugin "io.containerd.nri.v1.nri"... type=io.containerd.nri.v1
INFO[2023-12-19T07:15:19.969064363Z] NRI interface is disabled by configuration.
INFO[2023-12-19T07:15:19.969519180Z] loading plugin "io.containerd.runtime.v2.task"... type=io.containerd.runtime.v2
INFO[2023-12-19T07:15:19.969979784Z] loading plugin "io.containerd.runtime.v2.shim"... type=io.containerd.runtime.v2
INFO[2023-12-19T07:15:19.970013140Z] loading plugin "io.containerd.sandbox.store.v1.local"... type=io.containerd.sandbox.store.v1
INFO[2023-12-19T07:15:19.970043442Z] loading plugin "io.containerd.sandbox.controller.v1.local"... type=io.containerd.sandbox.controller.v1
INFO[2023-12-19T07:15:19.970080353Z] loading plugin "io.containerd.streaming.v1.manager"... type=io.containerd.streaming.v1
INFO[2023-12-19T07:15:19.970134746Z] loading plugin "io.containerd.service.v1.introspection-service"... type=io.containerd.service.v1
INFO[2023-12-19T07:15:19.970165321Z] loading plugin "io.containerd.service.v1.containers-service"... type=io.containerd.service.v1
INFO[2023-12-19T07:15:19.970214786Z] loading plugin "io.containerd.service.v1.content-service"... type=io.containerd.service.v1
INFO[2023-12-19T07:15:19.970255497Z] loading plugin "io.containerd.service.v1.diff-service"... type=io.containerd.service.v1
INFO[2023-12-19T07:15:19.970289470Z] loading plugin "io.containerd.service.v1.images-service"... type=io.containerd.service.v1
INFO[2023-12-19T07:15:19.970319797Z] loading plugin "io.containerd.service.v1.namespaces-service"... type=io.containerd.service.v1
INFO[2023-12-19T07:15:19.970349451Z] loading plugin "io.containerd.service.v1.snapshots-service"... type=io.containerd.service.v1
INFO[2023-12-19T07:15:19.970375091Z] loading plugin "io.containerd.runtime.v1.linux"... type=io.containerd.runtime.v1
INFO[2023-12-19T07:15:19.970712279Z] loading plugin "io.containerd.monitor.v1.cgroups"... type=io.containerd.monitor.v1
INFO[2023-12-19T07:15:19.971371985Z] loading plugin "io.containerd.service.v1.tasks-service"... type=io.containerd.service.v1
INFO[2023-12-19T07:15:19.971445967Z] loading plugin "io.containerd.grpc.v1.introspection"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.971479650Z] loading plugin "io.containerd.transfer.v1.local"... type=io.containerd.transfer.v1
INFO[2023-12-19T07:15:19.971582048Z] loading plugin "io.containerd.internal.v1.restart"... type=io.containerd.internal.v1
INFO[2023-12-19T07:15:19.971726560Z] loading plugin "io.containerd.grpc.v1.containers"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.971774011Z] loading plugin "io.containerd.grpc.v1.content"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.971827905Z] loading plugin "io.containerd.grpc.v1.diff"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.971875612Z] loading plugin "io.containerd.grpc.v1.events"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.971903724Z] loading plugin "io.containerd.grpc.v1.healthcheck"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.971956659Z] loading plugin "io.containerd.grpc.v1.images"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.971989668Z] loading plugin "io.containerd.grpc.v1.leases"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.972021510Z] loading plugin "io.containerd.grpc.v1.namespaces"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.972062816Z] loading plugin "io.containerd.internal.v1.opt"... type=io.containerd.internal.v1
INFO[2023-12-19T07:15:19.972527160Z] loading plugin "io.containerd.grpc.v1.sandbox-controllers"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.972602806Z] loading plugin "io.containerd.grpc.v1.sandboxes"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.972641106Z] loading plugin "io.containerd.grpc.v1.snapshots"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.972678763Z] loading plugin "io.containerd.grpc.v1.streaming"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.972707116Z] loading plugin "io.containerd.grpc.v1.tasks"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.972748311Z] loading plugin "io.containerd.grpc.v1.transfer"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.972810763Z] loading plugin "io.containerd.grpc.v1.version"... type=io.containerd.grpc.v1
INFO[2023-12-19T07:15:19.972838043Z] loading plugin "io.containerd.tracing.processor.v1.otlp"... type=io.containerd.tracing.processor.v1
INFO[2023-12-19T07:15:19.972893752Z] skip loading plugin "io.containerd.tracing.processor.v1.otlp"... error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
INFO[2023-12-19T07:15:19.972920891Z] loading plugin "io.containerd.internal.v1.tracing"... type=io.containerd.internal.v1
INFO[2023-12-19T07:15:19.973014894Z] skipping tracing processor initialization (no tracing plugin) error="no OpenTelemetry endpoint: skip plugin"
INFO[2023-12-19T07:15:19.973832690Z] serving... address=/var/run/docker/containerd/containerd-debug.sock
INFO[2023-12-19T07:15:19.974039245Z] serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
INFO[2023-12-19T07:15:19.974159910Z] serving... address=/var/run/docker/containerd/containerd.sock
INFO[2023-12-19T07:15:19.974201365Z] containerd successfully booted in 0.049906s
INFO[2023-12-19T07:15:20.023289211Z] Loading containers: start.
INFO[2023-12-19T07:15:20.081978381Z] stopping event stream following graceful shutdown error="<nil>" module=libcontainerd namespace=moby
INFO[2023-12-19T07:15:20.083138320Z] stopping healthcheck following graceful shutdown module=libcontainerd
INFO[2023-12-19T07:15:20.083139546Z] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
failed to start daemon: Error initializing network controller: error obtaining controller instance: unable to add return rule in DOCKER-ISOLATION-STAGE-1 chain: (iptables failed: iptables --wait -A DOCKER-ISOLATION-STAGE-1 -j RETURN: iptables v1.8.10 (nf_tables): RULE_APPEND failed (No such file or directory): rule in chain DOCKER-ISOLATION-STAGE-1
(exit status 4))
The only way I could make my system use the legacy iptables is to check for the warning as described in:
https://github.com/docker-library/docker/pull/468/files#r1430804593
See https://github.com/kautsig/docker/blob/better-iptables2/24/dind/dockerd-entrypoint.sh#L160
I don't know enough about this code to judge whether it is even close to correct. What I see is that the entrypoint always chooses nf_tables for me, so I added this branch to force the legacy version if the warning is present.
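The linked branch boils down to grepping for that warning. A minimal paraphrase (my function name, not the entrypoint's exact code; the iptables command is parameterized for illustration):

```shell
# If the nf_tables-flavored iptables emits the "legacy tables present"
# warning, force the bundled legacy binaries instead.
needs_legacy_fallback() {
	ipt="$1"   # iptables command to probe
	"$ipt" -nL 2>&1 | grep -q 'iptables-legacy tables present'
}
```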
For me, the suggestion by @tianon fails. My GitLab dind services keep failing to resolve DNS.
Build and run
worker10:~$ docker build --pull 'https://github.com/docker-library/docker.git#refs/pull/468/merge:24/dind' -t docker:dind-468
worker10:~$ docker run --privileged --network=test-xyz --rm --name=docker -e DOCKER_TLS_VERIFY=1 -e DOCKER_CERT_PATH=/certs/client -v dind-certs:/certs/client docker:dind-468
And exec into that container and run nslookup.
worker10:~$ docker exec -ti docker sh
/ # nslookup google.com
nslookup: write to '127.0.0.11': Connection refused
;; connection timed out; no servers could be reached
There is nothing in the above container logs that suggests why it is failing. Contrary to @kautsig, I do not see errors related to iptables.
However, with the container from @kautsig it succeeds.
Build and run
worker10:~$ docker build --pull 'https://github.com/kautsig/docker.git#better-iptables2:24/dind' -t docker:dind-469
worker10:~$ docker run --privileged --network=test-xyz --rm --name=docker -e DOCKER_TLS_VERIFY=1 -e DOCKER_CERT_PATH=/certs/client -v dind-certs:/certs/client docker:dind-469
And exec into that container and run nslookup.
worker10:~$ docker exec -ti docker sh
/ # nslookup google.com
Server: 127.0.0.11
Address: 127.0.0.11:53
Non-authoritative answer:
Name: google.com
Address: 216.58.206.46
Non-authoritative answer:
Name: google.com
Address: 2a00:1450:4001:81c::200e
The scripts from @tianon:
worker10:~$ docker run -it --rm --privileged docker:dind sh -euxc 'modprobe nf_tables > /dev/null 2>&1 || :; if ! iptables -nL > /dev/null 2>&1; then modprobe ip_tables || :; /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1; echo success legacy; else echo success nftables; fi'
+ modprobe nf_tables
+ :
+ iptables -nL
+ echo success nftables
success nftables
worker10:~$ docker run -it --rm --privileged docker:dind sh -euxc 'modprobe nf_tables > /dev/null 2>&1 || :; if ! false iptables -nL > /dev/null 2>&1; then modprobe ip_tables || :; /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1; echo success legacy; else echo success nftables; fi'
+ modprobe nf_tables
+ :
+ false iptables -nL
+ modprobe ip_tables
ip: can't find device 'ip_tables'
ip_tables 28672 10 iptable_nat,iptable_filter
x_tables 40960 17 xt_recent,nft_compat,ipt_MASQUERADE,xt_nat,ip6t_REJECT,xt_hl,ip6t_rt,ipt_REJECT,xt_LOG,xt_limit,xt_tcpudp,xt_addrtype,xt_conntrack,ip6table_filter,ip6_tables,iptable_filter,ip_tables
modprobe: can't change directory to '/lib/modules': No such file or directory
+ :
+ /usr/local/sbin/.iptables-legacy/iptables -nL
+ echo success legacy
success legacy
@frederikbosch On which host distribution are you on?
We have the issue on our GitLab runners on COS 105, where the first fix #465 and the current proposal #468 do not work.
On GitLab prod (COS 85) both fixes were confirmed to work: https://github.com/docker-library/docker/pull/468#issuecomment-1862504835
Old one, Ubuntu 18.04.
I faced the same issue on GKE Container-Optimized OS.
nf_tables is available and loaded into the kernel:
$ lsmod | grep nf_tables
nf_tables 245760 0
Running
docker run --rm -ti --privileged --name docker -e DOCKER_TLS_CERTDIR= -p 2375:2375 docker:dind
errors as below:
failed to start daemon: Error initializing network controller: error obtaining controller instance: unable to add return rule in DOCKER-ISOLATION-STAGE-1 chain: (iptables failed: iptables --wait -A DOCKER-ISOLATION-STAGE-1 -j RETURN: iptables v1.8.10 (nf_tables): RULE_APPEND failed (No such file or directory): rule in chain DOCKER-ISOLATION-STAGE-1 (exit status 4))
The workaround is either to use an Ubuntu-with-containerd node image or to pin the Docker image to 24.0.7-dind-alpine3.18.
GKE version: 1.27.3-gke.100 Image Type: Container-Optimized OS with containerd (cos_containerd)
@kautsig Will https://github.com/docker-library/docker/pull/468#issuecomment-1862504835 solve this issue?
@saintnoah The comment you linked refers to the production GitLab SaaS runners. They use COS 85, and for them the change done in the MR at that time didn't break things again (they need legacy iptables).
For our issue (I am on COS 105), I'd say this depends on how the discussion in the MR progresses... the entrypoint.sh will continue to try to guess which iptables to use, and if it guesses wrong there's an environment variable to override the guess. So the worst case would be setting an environment variable.
My current perspective on the wider issue is that the use of docker:latest or docker:dind is problematic in the first place, because of unexpected breakage. We instructed our folks to use version tags now (as we do everywhere else) and to upgrade only after verifying that the new version works.
@kautsig Agreed, the Docker version should be pinned to avoid similar upstream issues.
What I found out is that on COS 105:
$ lsmod | grep nf_tables
nf_tables 245760 0
And on the Ubuntu OS:
$ lsmod | grep nf_tables
nf_tables 249856 191 nft_compat,nft_counter,nft_chain_nat
On COS 85:
stanhu-old-cos /home/stanhu # iptables -V
iptables v1.6.2
stanhu-old-cos /home/stanhu # docker run --privileged --rm -it docker:24.0.7-dind sh
/ # iptables -V
iptables v1.8.10 (nf_tables)
/ # iptables -nL
iptables v1.8.10 (nf_tables): Could not fetch rule set generation id: Invalid argument
/ # echo $?
4
On COS 105:
stanhu-test /home/stanhu # iptables -V
iptables v1.8.5 (legacy)
stanhu-test /home/stanhu # docker run --privileged --rm -it docker:24.0.7-dind sh
/ # iptables -V
iptables v1.8.10 (nf_tables)
/ # iptables -nL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
/ # echo $?
0
Both the COS 85 and COS 105 hosts use the legacy iptables, but you can see that in the COS 85 case the iptables inside the dind container fails, which allows the fallback mechanism to run.
However, on COS 105, iptables returns successfully, so dind attempts to use it with nf_tables. But that fails because the host added the DOCKER-ISOLATION-STAGE-1 rule via the legacy iptables.
It seems that the auto-detection doesn't always work because the host iptables flavor (legacy vs. nf_tables) has to match the flavor used in the Docker image, but that information isn't readily available inside the container. The Docker host could run iptables -V and, if it sees legacy, export some environment variable that tells containers which mode was used.
Maybe for now we can just set a default environment variable, such as DOCKER_IPTABLES_FLAVOR=legacy, to eliminate the guesswork. UPDATE: I see https://github.com/docker-library/docker/pull/468 does exactly this via DOCKER_IPTABLES_LEGACY=1.
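The host-side detection sketched above could look like this (an assumption on my part: parsing the `iptables -V` banner, e.g. "iptables v1.8.5 (legacy)"; note that very old builds such as the v1.6.2 on COS 85 print no flavor suffix at all):

```shell
# Classify the host iptables flavor from its version banner.
host_iptables_flavor() {
	case "$1" in
		*'(legacy)'*)    echo legacy ;;
		*'(nf_tables)'*) echo nftables ;;
		*)               echo unknown ;;   # e.g. iptables v1.6.2 prints no suffix
	esac
}

# e.g. on the host, before starting dind:
#   if [ "$(host_iptables_flavor "$(iptables -V)")" = legacy ]; then
#       extra_args='-e DOCKER_IPTABLES_LEGACY=1'
#   fi
```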
What do you think @tianon?
Just wanted to confirm the problem is solved: DOCKER_IPTABLES_LEGACY=1 does the trick now. Thanks @tianon!
For readers using the docker+machine GitLab runner (I think it came up in the discussion above), adding the environment variable to the [[runners]] section worked for me; see the GitLab documentation.
For us, the issue described in https://github.com/docker-library/docker/issues/463 persists, even after the fix. Our host image is a Google COS 105.
Initially I added my findings as a comment to #466, but after getting the logs, the problem looks different from the one described there.
What we observe is that the host uses a legacy iptables version, but the nf_tables kernel module is present and loads fine. My guess is that this makes the fix ineffective, because it uses modprobe nf_tables to determine which iptables version to use.
Running the latest dind image on this system shows the original error.