docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Intermittently not accepting connections in docker swarm #1231

Open JeremyHutchings opened 3 years ago

JeremyHutchings commented 3 years ago

Expected behaviour

Services in the swarm should always be able to accept routed requests and connections.

Actual behaviour

Intermittently, services on nodes within the swarm will not receive requests, and connections will time out.

Steps to reproduce the behaviour

As per:

Without any logged errors, a service on a node will simply stop accepting routed internal requests and the node has to be drained. Restoring the service to that node doesn't help, so the pool of resources available in the docker swarm keeps shrinking.
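For reference, the drain/restore cycle described above uses the standard `docker node` commands; `worker-1` is a placeholder for the affected node's hostname:

```shell
# Stop routing new tasks to the affected node; its running tasks
# are rescheduled onto the remaining nodes
docker node update --availability drain worker-1

# Confirm the node's scheduling availability
docker node ls --format '{{.Hostname}}: {{.Availability}}'

# Later, return the node to the scheduling pool
# (per the report above, this does not restore routed connectivity)
docker node update --availability active worker-1
```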

Output of docker version:

Docker version 19.03.8, build afacb8b7f0

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 51
  Running: 31
  Paused: 0
  Stopped: 20
 Images: 362
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: 3y7l7c5dl33wmcabojcn470s3
  Is Manager: true
  ClusterID: r4gu75a9dus7zzxvwpdh1zjll
  Managers: 5
  Nodes: 6
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 10.0.3.11
  Manager Addresses:
   10.0.1.11:2377
   10.0.1.12:2377
   10.0.2.11:2377
   10.0.3.11:2377
   10.0.3.12:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-29-generic

Additional environment details (AWS, VirtualBox, physical, etc.)

Physical machines running Ubuntu 20.04.1 LTS

jdeluyck commented 2 years ago

I'm seeing this on a 3-node swarm (3 managers), all VMs. Intermittently all connectivity to one node drops and no more ingress traffic is possible.

Running Debian Bullseye.
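When ingress to one node drops like this, a quick sanity check is whether the swarm networking ports are still reachable between nodes. This is only a diagnostic sketch; the peer address is a placeholder taken from the `docker info` output below, and the port list is the standard swarm requirement (2377/tcp management, 7946/tcp+udp gossip, 4789/udp VXLAN data path — matching `Data Path Port: 4789` reported here):

```shell
NODE=192.168.34.47   # placeholder: address of a peer node

# TCP ports can be probed directly
nc -z -w 2 "$NODE" 2377 && echo "2377/tcp reachable"
nc -z -w 2 "$NODE" 7946 && echo "7946/tcp reachable"

# UDP is connectionless, so instead capture VXLAN traffic on this node
# while generating cross-node service traffic from another shell
tcpdump -ni any udp port 4789 -c 5
```

If gossip (7946) or the VXLAN data path (4789) is blocked or silently dropped between two nodes, the overlay/ingress network fails in exactly this intermittent, per-node way without errors in the daemon log.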

$ docker version 
Client:
 Version:           20.10.5+dfsg1
 API version:       1.41
 Go version:        go1.15.9
 Git commit:        55c4c88
 Built:             Wed Aug  4 19:55:57 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.5+dfsg1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.15.9
  Git commit:       363e9a8
  Built:            Wed Aug  4 19:55:57 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.5~ds1
  GitCommit:        1.4.5~ds1-2+deb11u1
 runc:
  Version:          1.0.0~rc93+ds1
  GitCommit:        1.0.0~rc93+ds1-5+b2
 docker-init:
  Version:          0.19.0
  GitCommit:        
$ docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 8
  Running: 7
  Paused: 0
  Stopped: 1
 Images: 24
 Server Version: 20.10.5+dfsg1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: ifg4shc3fhlegb83nk6gjtoc5
  Is Manager: true
  ClusterID: pskiot9vwjp10zazx0jumybmr
  Managers: 3
  Nodes: 3
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.34.46
  Manager Addresses:
   192.168.34.46:2377
   192.168.34.47:2377
   192.168.34.48:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1.4.5~ds1-2+deb11u1
 runc version: 1.0.0~rc93+ds1-5+b2
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.0-9-amd64
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 2.849GiB
 Name: mediabox
 ID: AFL7:KR2U:SOTJ:YWLM:MI5G:Z4L6:2IEF:GSLX:C2QN:BMSU:MNYI:56RR
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: jdeluyck
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false