docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/
754 stars 85 forks source link

failed to cleanup up stale network sandbox prevents docker from starting #688

Open alekc opened 5 years ago

alekc commented 5 years ago

Expected behavior

Being able to start docker

Actual behavior

After the last crash, docker is not starting with following log trace:

/usr/bin/dockerd -H unix:// -H tcp://0.0.0.0:2736
INFO[2019-06-04T16:45:50.232507849+01:00] libcontainerd: started new containerd process  pid=16507
INFO[2019-06-04T16:45:50.232653442+01:00] parsed scheme: "unix"                         module=grpc
INFO[2019-06-04T16:45:50.232667823+01:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-06-04T16:45:50.232711962+01:00] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]  module=grpc
INFO[2019-06-04T16:45:50.232727864+01:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-06-04T16:45:50.232783347+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4207ab420, CONNECTING  module=grpc
INFO[2019-06-04T16:45:50.286608919+01:00] starting containerd                           revision=bb71b10fd8f58240ca47fbb579b9d1028eea7c84 version=1.2.5
INFO[2019-06-04T16:45:50.287069001+01:00] loading plugin "io.containerd.content.v1.content"...  type=io.containerd.content.v1
INFO[2019-06-04T16:45:50.287114285+01:00] loading plugin "io.containerd.snapshotter.v1.btrfs"...  type=io.containerd.snapshotter.v1
WARN[2019-06-04T16:45:50.287364497+01:00] failed to load plugin io.containerd.snapshotter.v1.btrfs  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
INFO[2019-06-04T16:45:50.287392992+01:00] loading plugin "io.containerd.snapshotter.v1.aufs"...  type=io.containerd.snapshotter.v1
INFO[2019-06-04T16:45:50.288757645+01:00] loading plugin "io.containerd.snapshotter.v1.native"...  type=io.containerd.snapshotter.v1
INFO[2019-06-04T16:45:50.288808846+01:00] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2019-06-04T16:45:50.288897883+01:00] loading plugin "io.containerd.snapshotter.v1.zfs"...  type=io.containerd.snapshotter.v1
WARN[2019-06-04T16:45:50.289230774+01:00] failed to load plugin io.containerd.snapshotter.v1.zfs  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
INFO[2019-06-04T16:45:50.289251008+01:00] loading plugin "io.containerd.metadata.v1.bolt"...  type=io.containerd.metadata.v1
WARN[2019-06-04T16:45:50.289272196+01:00] could not use snapshotter zfs in metadata plugin  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
WARN[2019-06-04T16:45:50.289284298+01:00] could not use snapshotter btrfs in metadata plugin  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
INFO[2019-06-04T16:45:50.289455027+01:00] loading plugin "io.containerd.differ.v1.walking"...  type=io.containerd.differ.v1
INFO[2019-06-04T16:45:50.289481013+01:00] loading plugin "io.containerd.gc.v1.scheduler"...  type=io.containerd.gc.v1
INFO[2019-06-04T16:45:50.289520689+01:00] loading plugin "io.containerd.service.v1.containers-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289539530+01:00] loading plugin "io.containerd.service.v1.content-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289556698+01:00] loading plugin "io.containerd.service.v1.diff-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289574316+01:00] loading plugin "io.containerd.service.v1.images-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289591954+01:00] loading plugin "io.containerd.service.v1.leases-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289609605+01:00] loading plugin "io.containerd.service.v1.namespaces-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289628968+01:00] loading plugin "io.containerd.service.v1.snapshots-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289646549+01:00] loading plugin "io.containerd.runtime.v1.linux"...  type=io.containerd.runtime.v1
INFO[2019-06-04T16:45:50.294036946+01:00] loading plugin "io.containerd.runtime.v2.task"...  type=io.containerd.runtime.v2
INFO[2019-06-04T16:45:50.294101159+01:00] loading plugin "io.containerd.monitor.v1.cgroups"...  type=io.containerd.monitor.v1
INFO[2019-06-04T16:45:50.294555800+01:00] loading plugin "io.containerd.service.v1.tasks-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.294739318+01:00] loading plugin "io.containerd.internal.v1.restart"...  type=io.containerd.internal.v1
INFO[2019-06-04T16:45:50.294791998+01:00] loading plugin "io.containerd.grpc.v1.containers"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294812408+01:00] loading plugin "io.containerd.grpc.v1.content"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294834648+01:00] loading plugin "io.containerd.grpc.v1.diff"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294852873+01:00] loading plugin "io.containerd.grpc.v1.events"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294870387+01:00] loading plugin "io.containerd.grpc.v1.healthcheck"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294887311+01:00] loading plugin "io.containerd.grpc.v1.images"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294904640+01:00] loading plugin "io.containerd.grpc.v1.leases"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294922559+01:00] loading plugin "io.containerd.grpc.v1.namespaces"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294939886+01:00] loading plugin "io.containerd.internal.v1.opt"...  type=io.containerd.internal.v1
INFO[2019-06-04T16:45:50.295002382+01:00] loading plugin "io.containerd.grpc.v1.snapshots"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.295027592+01:00] loading plugin "io.containerd.grpc.v1.tasks"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.295058889+01:00] loading plugin "io.containerd.grpc.v1.version"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.295076774+01:00] loading plugin "io.containerd.grpc.v1.introspection"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.295252804+01:00] serving...                                    address="/var/run/docker/containerd/containerd-debug.sock"
INFO[2019-06-04T16:45:50.295326295+01:00] serving...                                    address="/var/run/docker/containerd/containerd.sock"
INFO[2019-06-04T16:45:50.295347678+01:00] containerd successfully booted in 0.009455s
INFO[2019-06-04T16:45:50.304161005+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4207ab420, READY  module=grpc
INFO[2019-06-04T16:45:50.315167217+01:00] parsed scheme: "unix"                         module=grpc
INFO[2019-06-04T16:45:50.315204663+01:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-06-04T16:45:50.315253648+01:00] parsed scheme: "unix"                         module=grpc
INFO[2019-06-04T16:45:50.315266747+01:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-06-04T16:45:50.315279939+01:00] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]  module=grpc
INFO[2019-06-04T16:45:50.315312407+01:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-06-04T16:45:50.315366498+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420830160, CONNECTING  module=grpc
INFO[2019-06-04T16:45:50.315384121+01:00] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]  module=grpc
INFO[2019-06-04T16:45:50.315417096+01:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-06-04T16:45:50.315469736+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4201d0560, CONNECTING  module=grpc
INFO[2019-06-04T16:45:50.315504391+01:00] blockingPicker: the picked transport is not ready, loop back to repick  module=grpc
INFO[2019-06-04T16:45:50.315536074+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420830160, READY  module=grpc
INFO[2019-06-04T16:45:50.315618327+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4201d0560, READY  module=grpc
INFO[2019-06-04T16:45:50.319463546+01:00] [graphdriver] using prior storage driver: aufs
INFO[2019-06-04T16:45:50.535041248+01:00] Graph migration to content-addressability took 0.00 seconds
WARN[2019-06-04T16:45:50.535391920+01:00] Your kernel does not support swap memory limit
WARN[2019-06-04T16:45:50.535475380+01:00] Your kernel does not support cgroup rt period
WARN[2019-06-04T16:45:50.535489993+01:00] Your kernel does not support cgroup rt runtime
INFO[2019-06-04T16:45:50.535974464+01:00] Loading containers: start.
ERRO[2019-06-04T16:45:50.791566008+01:00] dbe4713ce178cc12456da37ca367fc995df93b80747ba467a512caa6fb27df75 cleanup: failed to delete container from containerd: no such container
ERRO[2019-06-04T16:45:50.791587387+01:00] 088ad00cb86b4fbb5b7bd2bec6e33c4c96a6dd69da830f535ff8d5ffc4a363e4 cleanup: failed to delete container from containerd: no such container
ERRO[2019-06-04T16:45:50.791604544+01:00] 7e29954874035fbddde25c2eef3b232c8014d99b8e83c871f0f4d063624c7b8b cleanup: failed to delete container from containerd: no such container
INFO[2019-06-04T16:45:51.083473849+01:00] Removing stale sandbox 234c1f11b6bdbca272c2453975c0c315dfac3fc0b65254fa6d43f6e63bf852d7 (6bf5014d184d934f9404cddfeb074da217fc2c6de120ae6d42c1673732584b1f)
INFO[2019-06-04T16:45:51.490158193+01:00] Removing stale sandbox 31bb6ba084d5c464dcfd35acb823eb88b4294e59d88c971a75715754156574de (61def67861bdcac9fb4168ab3a4319dec1b340e9e8d2aa97dc9ec4189849dd11)
INFO[2019-06-04T16:45:51.887412334+01:00] Removing stale sandbox b6e8fef91605bd9e6e2067b2e44e7b95d8a2ecee647c7b94f497c8caf9d533ce (4f33f7fbd1bd82d1188f7f17a0d12bd1d3698f52519b0b4dc4895a79b33caa71)
ERRO[2019-06-04T16:45:52.288044745+01:00] getNetworkFromStore for nid b088818e6c42af52826d24bce2c0aa59f27fcd0f670dd017df484d16597288d9 failed while trying to build sandbox for cleanup: network b088818e6c42af52826d24bce2c0aa59f27fcd0f670dd017df484d16597288d9 not found
ERRO[2019-06-04T16:45:52.288092772+01:00] getNetworkFromStore for nid 6110c7cc40d49b2ddda6811a5dce1a7431e5e1888a2aec1198dba85c1001a7f7 failed while trying to build sandbox for cleanup: network 6110c7cc40d49b2ddda6811a5dce1a7431e5e1888a2aec1198dba85c1001a7f7 not found
INFO[2019-06-04T16:45:52.288106673+01:00] Removing stale sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 (1ea4dcc7807d16d5fdc8c68cd1367e27a1ea8b3a0b7a6b90fe039076cc886549)
WARN[2019-06-04T16:45:52.439668478+01:00] Error (Unable to complete atomic operation, key modified) deleting object [endpoint 55fcc4b0fd7f83c38e86e9ec2c3463c478201bbcb361d7772d7b586fd77bfe9c cc817069301908de349ca221f60325b991681261b060b16b6137cac5a91ed4c5], retrying....
WARN[2019-06-04T16:45:52.440619842+01:00] Failed getting network for ep 0f72e82d96da4b17fd8787b912577c9a2945506a38ae7a48d6be6245880926af during sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 delete: network b088818e6c42af52826d24bce2c0aa59f27fcd0f670dd017df484d16597288d9 not found
WARN[2019-06-04T16:45:52.440644470+01:00] Failed getting network for ep 9cf6170074b93b14754584a8125fac347893d329073dcc8348694ecb33d8d7a2 during sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 delete: network 6110c7cc40d49b2ddda6811a5dce1a7431e5e1888a2aec1198dba85c1001a7f7 not found
ERRO[2019-06-04T16:45:52.440657603+01:00] Failed to delete sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 while trying to cleanup: could not cleanup all the endpoints in container 1ea4dcc7807d16d5fdc8c68cd1367e27a1ea8b3a0b7a6b90fe039076cc886549 / sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72
INFO[2019-06-04T16:45:52.697629130+01:00] Removing stale sandbox 0803e51328c2da5821aea57e3afe87929ff4bbd540fad07c97d2cb3940f132df (2fe36bdf9db8c7b864ace5478565e76ef8f2032e806636f75a9cda388ce49ecd)
INFO[2019-06-04T16:45:52.739723543+01:00] Fixing inconsistent endpoint_cnt for network stash_public. Expected=0, Actual=1
INFO[2019-06-04T16:45:52.740540702+01:00] Removing stale endpoint registry (feb4bb5e52032354ed1421e16666232a5ef65539b4502800db6de477d5f11a13)
INFO[2019-06-04T16:45:52.836813967+01:00] Fixing inconsistent endpoint_cnt for network bridge. Expected=0, Actual=3
INFO[2019-06-04T16:45:52.993911891+01:00] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address
WARN[2019-06-04T16:45:53.103046046+01:00] Failed getting network for ep 0f72e82d96da4b17fd8787b912577c9a2945506a38ae7a48d6be6245880926af during sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 delete: network b088818e6c42af52826d24bce2c0aa59f27fcd0f670dd017df484d16597288d9 not found
WARN[2019-06-04T16:45:53.103081307+01:00] Failed getting network for ep 9cf6170074b93b14754584a8125fac347893d329073dcc8348694ecb33d8d7a2 during sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 delete: network 6110c7cc40d49b2ddda6811a5dce1a7431e5e1888a2aec1198dba85c1001a7f7 not found
ERRO[2019-06-04T16:45:53.103091308+01:00] failed to cleanup up stale network sandbox for container 1ea4dcc7807d16d5fdc8c68cd1367e27a1ea8b3a0b7a6b90fe039076cc886549
WARN[2019-06-04T16:45:53.164641633+01:00] Failed to allocate and map port 5000-5000: Bind for 0.0.0.0:5000 failed: port is already allocated
ERRO[2019-06-04T16:45:53.346585529+01:00] stream copy error: reading from a closed fifo
ERRO[2019-06-04T16:45:53.346597488+01:00] stream copy error: reading from a closed fifo
ERRO[2019-06-04T16:45:53.442624229+01:00] 2fe36bdf9db8c7b864ace5478565e76ef8f2032e806636f75a9cda388ce49ecd cleanup: failed to delete container from containerd: no such container
ERRO[2019-06-04T16:45:53.442683027+01:00] Failed to start container 2fe36bdf9db8c7b864ace5478565e76ef8f2032e806636f75a9cda388ce49ecd: driver failed programming external connectivity on endpoint registry (0482e2f4ea499a603579ec7c8bf9f25d3e3ed372de5c528e51d30452579ac04e): Bind for 0.0.0.0:5000 failed: port is already allocated
ERRO[2019-06-04T16:45:53.499972925+01:00] stream copy error: reading from a closed fifo
ERRO[2019-06-04T16:45:53.500056468+01:00] stream copy error: reading from a closed fifo
INFO[2019-06-04T16:45:53.501796555+01:00] shim containerd-shim started                  address="/containerd-shim/moby/7e29954874035fbddde25c2eef3b232c8014d99b8e83c871f0f4d063624c7b8b/shim.sock" debug=false pid=16912
INFO[2019-06-04T16:45:53.540744901+01:00] shim containerd-shim started                  address="/containerd-shim/moby/dbe4713ce178cc12456da37ca367fc995df93b80747ba467a512caa6fb27df75/shim.sock" debug=false pid=16930
ERRO[2019-06-04T16:45:53.667721317+01:00] stream copy error: reading from a closed fifo
ERRO[2019-06-04T16:45:53.667752986+01:00] stream copy error: reading from a closed fifo
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x55a134739c45]

goroutine 300 [running]:
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*endpoint).addServiceInfoToCluster(0xc421177600, 0xc420e54480, 0x0, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/agent.go:589 +0xa5
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*sandbox).EnableService(0xc420e54480, 0x0, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/sandbox.go:702 +0x16f
github.com/docker/docker/daemon.(*Daemon).ActivateContainerServiceBinding(0xc420b9c000, 0xc420b02318, 0x8, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/container_operations.go:1110 +0x89
github.com/docker/docker/daemon.(*Daemon).connectToNetwork(0xc420b9c000, 0xc420127d40, 0x55a134f1ddb9, 0x6, 0xc420eae240, 0x0, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/container_operations.go:788 +0x9db
github.com/docker/docker/daemon.(*Daemon).allocateNetwork(0xc420b9c000, 0xc420127d40, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/container_operations.go:532 +0xa31
github.com/docker/docker/daemon.(*Daemon).initializeNetworking(0xc420b9c000, 0xc420127d40, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/container_operations.go:945 +0xa2
github.com/docker/docker/daemon.(*Daemon).containerStart(0xc420b9c000, 0xc420127d40, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/start.go:150 +0x2c2
github.com/docker/docker/daemon.(*Daemon).restore.func2(0xc421213900, 0xc420b9c000, 0xc420d14060, 0xc420127d40, 0xc4212381e0)
        /go/src/github.com/docker/docker/daemon/daemon.go:455 +0x2d4
created by github.com/docker/docker/daemon.(*Daemon).restore
        /go/src/github.com/docker/docker/daemon/daemon.go:435 +0x1025

Steps to reproduce the behavior

Not sure how to reproduce it. But currently docker is down, and there are multiple docker's network interfaces present. If I remove them manually (at least br-xxx ones), then they are recreated when I try to run docker again.

Output of docker version:

/usr/bin/dockerd -v
Docker version 18.09.6, build 481bc77

Output of docker info:

na
ifnotak commented 4 years ago

Similar issue, providing the logs for comparison:

systemd[1]: Starting Docker Application Container Engine...
dockerd[7932]: time="2020-04-03T01:17:00.994257494+02:00" level=info msg="Starting up"
dockerd[7932]: time="2020-04-03T01:17:00.995677901+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc
dockerd[7932]: time="2020-04-03T01:17:00.996081387+02:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
dockerd[7932]: time="2020-04-03T01:17:00.996428662+02:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
dockerd[7932]: time="2020-04-03T01:17:00.996720686+02:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
dockerd[7932]: time="2020-04-03T01:17:00.998190806+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc
dockerd[7932]: time="2020-04-03T01:17:00.998210353+02:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
dockerd[7932]: time="2020-04-03T01:17:00.998228833+02:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0  <nil>}] <nil>}" module=grpc
dockerd[7932]: time="2020-04-03T01:17:00.998238866+02:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
dockerd[7932]: time="2020-04-03T01:17:01.013037586+02:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
dockerd[7932]: time="2020-04-03T01:17:01.035758061+02:00" level=warning msg="Your kernel does not support swap memory limit"
dockerd[7932]: time="2020-04-03T01:17:01.035789780+02:00" level=warning msg="Your kernel does not support cgroup rt period"
dockerd[7932]: time="2020-04-03T01:17:01.035796358+02:00" level=warning msg="Your kernel does not support cgroup rt runtime"
dockerd[7932]: time="2020-04-03T01:17:01.035932766+02:00" level=info msg="Loading containers: start."
dockerd[7932]: time="2020-04-03T01:17:01.548248367+02:00" level=error msg="getNetworkFromStore for nid 446e39b990727a489a27bb95c699551bdbb99c1e0b4e13d3f9f2f34e2ef85ae3 failed while trying to build sandbox for cleanup: network 446e39b990727a489a27bb95c699551bdbb99c1e0b4e13d3f9f2f34e2ef85ae3 not found"
dockerd[7932]: time="2020-04-03T01:17:01.548291835+02:00" level=info msg="Removing stale sandbox 08ee73a0483cb9cdc3d4fbec3cd759554f3e1865d2baa7aafc94fe2a315c45b5 (1d3637daa9d9dd7d596e4db50a32a02a21c42d4810753055785e36538008ac5a)"
dockerd[7932]: time="2020-04-03T01:17:01.567197747+02:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 2c59d4728bbae09c6f816e5c3d6a89b32c841ef6bdf98f4af99091df28d638d2 8283e268f371afe93ca0dafb5653231f26fbafb250e360cb90b7b20b9d909a05], retrying...."
dockerd[7932]: time="2020-04-03T01:17:01.569913193+02:00" level=warning msg="Failed getting network for ep 6fb6efbdfccb4953878a23713ffb78cb5b4c09dbb39df376d6e2299ae8d06030 during sandbox 08ee73a0483cb9cdc3d4fbec3cd759554f3e1865d2baa7aafc94fe2a315c45b5 delete: network 446e39b990727a489a27bb95c699551bdbb99c1e0b4e13d3f9f2f34e2ef85ae3 not found"
dockerd[7932]: time="2020-04-03T01:17:01.569943216+02:00" level=error msg="Failed to delete sandbox 08ee73a0483cb9cdc3d4fbec3cd759554f3e1865d2baa7aafc94fe2a315c45b5 while trying to cleanup: could not cleanup all the endpoints in container 1d3637daa9d9dd7d596e4db50a32a02a21c42d4810753055785e36538008ac5a / sandbox 08ee73a0483cb9cdc3d4fbec3cd759554f3e1865d2baa7aafc94fe2a315c45b5"
dockerd[7932]: time="2020-04-03T01:17:01.663561528+02:00" level=info msg="Removing stale sandbox 9ddaa33a8223ce77c5c346a75c01e3adfa7b2d6ada339c67efed732a93f30f71 (5227e8bc97c0fd56360d5577de24b3ae724c2e63364f3d8ab16c650c782afef5)"
dockerd[7932]: time="2020-04-03T01:17:01.681488389+02:00" level=info msg="Removing stale endpoint website_db_1 (0f827b20d25afcf3afb68bdf7d2f8b10dc1a3d4219e5f621f53cdf5b290c17be)"
dockerd[7932]: time="2020-04-03T01:17:01.730543201+02:00" level=info msg="Fixing inconsistent endpoint_cnt for network website_default. Expected=0, Actual=1"
dockerd[7932]: time="2020-04-03T01:17:01.758639580+02:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
dockerd[7932]: time="2020-04-03T01:17:01.800804484+02:00" level=warning msg="Failed getting network for ep 6fb6efbdfccb4953878a23713ffb78cb5b4c09dbb39df376d6e2299ae8d06030 during sandbox 08ee73a0483cb9cdc3d4fbec3cd759554f3e1865d2baa7aafc94fe2a315c45b5 delete: network 446e39b990727a489a27bb95c699551bdbb99c1e0b4e13d3f9f2f34e2ef85ae3 not found"
dockerd[7932]: time="2020-04-03T01:17:01.801431437+02:00" level=error msg="failed to cleanup up stale network sandbox for container 1d3637daa9d9dd7d596e4db50a32a02a21c42d4810753055785e36538008ac5a"
dockerd[7932]: time="2020-04-03T01:17:01.902118731+02:00" level=error msg="stream copy error: reading from a closed fifo"
dockerd[7932]: time="2020-04-03T01:17:01.902203046+02:00" level=error msg="stream copy error: reading from a closed fifo"
dockerd[7932]: panic: runtime error: invalid memory address or nil pointer dereference
dockerd[7932]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x5647a626be94]
dockerd[7932]: goroutine 391 [running]:
dockerd[7932]: github.com/docker/docker/vendor/github.com/docker/libnetwork.(*endpoint).addServiceInfoToCluster(0xc000c87080, 0xc000c966c0, 0x0, 0x0)
dockerd[7932]:         /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/agent.go:599 +0xf74
dockerd[7932]: github.com/docker/docker/vendor/github.com/docker/libnetwork.(*sandbox).EnableService(0xc000c966c0, 0x0, 0x0)
dockerd[7932]:         /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/sandbox.go:702 +0x187
dockerd[7932]: github.com/docker/docker/daemon.(*Daemon).ActivateContainerServiceBinding(0xc00000c3c0, 0xc000aac506, 0x7, 0x0, 0x0)
dockerd[7932]:         /go/src/github.com/docker/docker/daemon/container_operations.go:1118 +0x16a
dockerd[7932]: github.com/docker/docker/daemon.(*Daemon).connectToNetwork(0xc00000c3c0, 0xc00077e6c0, 0x5647a6d6e930, 0x6, 0xc000a7c0c0, 0x0, 0x0, 0x0)
dockerd[7932]:         /go/src/github.com/docker/docker/daemon/container_operations.go:800 +0xa13
dockerd[7932]: github.com/docker/docker/daemon.(*Daemon).allocateNetwork(0xc00000c3c0, 0xc00077e6c0, 0x0, 0x0)
dockerd[7932]:         /go/src/github.com/docker/docker/daemon/container_operations.go:543 +0x9b2
dockerd[7932]: github.com/docker/docker/daemon.(*Daemon).initializeNetworking(0xc00000c3c0, 0xc00077e6c0, 0x0, 0x0)
dockerd[7932]:         /go/src/github.com/docker/docker/daemon/container_operations.go:957 +0x9a
dockerd[7932]: github.com/docker/docker/daemon.(*Daemon).containerStart(0xc00000c3c0, 0xc00077e6c0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0)
dockerd[7932]:         /go/src/github.com/docker/docker/daemon/start.go:150 +0x310
dockerd[7932]: github.com/docker/docker/daemon.(*Daemon).restore.func5(0xc0000e02d0, 0xc00000c3c0, 0xc000405500, 0xc000560ea0, 0xc00077e6c0, 0xc000c94720)
dockerd[7932]:         /go/src/github.com/docker/docker/daemon/daemon.go:501 +0x2c9
dockerd[7932]: created by github.com/docker/docker/daemon.(*Daemon).restore
dockerd[7932]:         /go/src/github.com/docker/docker/daemon/daemon.go:482 +0x744
systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: Failed to start Docker Application Container Engine.
systemd[1]: docker.service: Unit entered failed state.
systemd[1]: docker.service: Failed with result 'exit-code'.
systemd[1]: docker.service: Service hold-off time over, scheduling restart.
systemd[1]: Stopped Docker Application Container Engine.
systemd[1]: docker.service: Start request repeated too quickly.
systemd[1]: Failed to start Docker Application Container Engine.
systemd[1]: docker.service: Unit entered failed state.
systemd[1]: docker.service: Failed with result 'exit-code'.

dockerd -v: Docker version 19.03.8, build afacb8b7f0

I think I have to nuke /var/lib/docker. I hope I have backed up everything!!

harridu commented 3 years ago

metoo

ateivjain commented 2 years ago

I'm seeing similar issue on Docker version 18.09.8, build 0dd43dd

bash:~$ sudo journalctl -u docker -l | wc -l 43912



+ We had to kill docker process and wait for it remove those stale network interfaces.
diplompils commented 1 year ago

Hi all, I face an issue which looks quite similar. Do you have any idea on a workaround?

For Reference:

[root@x.x ~]# journalctl -u docker -n 50 -l -- Logs begin at Tue 2023-01-17 13:33:24 CET, end at Mon 2023-02-13 14:17:39 CET. -- Feb 13 14:17:33 x.x dockerd[9596]: time="2023-02-13T14:17:33.527567369+01:00" level=info msg="Removing stale sandbox f48f6e18ac8f01e9c305c796a9252894ff9b8f0af950b3a0a70e0207ceef5c70 (fc6fcd37cf54d087c37627b5bac4a713884bfbc1481322da27ff28a5c34c19ca)" Feb 13 14:17:33 x.x dockerd[9596]: time="2023-02-13T14:17:33.668269586+01:00" level=info msg="Removing stale sandbox fdef052aa4d0f3f997dd43b67a1f30d28942b7b450d1ad17736916dfdbdcd35b (119932af4811785dde69e818a40e901dde24ff358967fde2cd2decdb7294e3f8)" Feb 13 14:17:33 x.x dockerd[9596]: time="2023-02-13T14:17:33.803477271+01:00" level=info msg="Removing stale sandbox ff55f7c7292690c6fe415125c50fb320e93c17210feb0d869500e339e155507a (8dedc5092b3a8b2d579ba9438466b7032ef54a07662ff715f35ff42818093c6b)" Feb 13 14:17:33 x.x dockerd[9596]: time="2023-02-13T14:17:33.930769176+01:00" level=info msg="Removing stale sandbox 103e19ff12b0f415b61604ebe110b00cf325d13ceba1518c5f2dbcddc75cdeba (c21ae3c59444ae3f6f85a50fd24ab327cdae3113e56d0c589c34737a0f3b67ee)" Feb 13 14:17:36 x.x dockerd[9596]: time="2023-02-13T14:17:36.494434812+01:00" level=info msg="Removing stale sandbox d7a073f06a0acd22637af71e6a8ccda228074cced1c0a0481b999a093d2451ea (251cf7c0ab98f5896e4c980f7f2c91db341eacf7c164c5914f48dfaaa8156e43)" Feb 13 14:17:36 x.x dockerd[9596]: time="2023-02-13T14:17:36.623828201+01:00" level=info msg="Removing stale sandbox 1f215bd9999ca4067c1b0346caae68d5a24a47c8efb9a99e9f513ad35f15d999 (6ab73d128c4e17de862e8d328a6cd40ed1595378b5a2667e094140767b1c644e)" Feb 13 14:17:36 x.x dockerd[9596]: time="2023-02-13T14:17:36.764782584+01:00" level=info msg="Removing stale sandbox 24acb278e43dbdc49b92ca723478c41bc8287ddb5a9fdfad9cb9872681e60d47 (f78e6032cd8b4f25c72b036e70c56351432258a3eb71f8aa0b7ddbe957f405cf)" Feb 13 14:17:36 x.x dockerd[9596]: time="2023-02-13T14:17:36.776151783+01:00" level=error msg="failed to create osl sandbox while trying to restore sandbox de579cb: failed get network namespace \"/var/run/docker/netns/de579cbef9f8\": no such file or directory" Feb 13 14:17:36 x.x dockerd[9596]: time="2023-02-13T14:17:36.900964893+01:00" level=info msg="Removing stale sandbox 78510cc0d316eb4fd1fb1c135c38304b25d3f4545705e16c5dc8ffd9ef9bc9a4 (8c420c5891e5eb47603eeeaf7aba706379bb4375bf5b52bc6c38730d9c93ba2b)"

[root@x.x ~]# journalctl -u docker -l | wc -l 705571

diplompils commented 1 year ago

my "workaround" around the stale sandbox is to remove the ../docker/network/files/local-kv.db periodically via cronjob. Not nice, not a solution but it keeps the sandbox removal time low.