docker-archive / classicswarm

Swarm Classic: a container clustering system. Not to be confused with Docker Swarm which is at https://github.com/docker/swarmkit
Apache License 2.0

Swarm manager can't get task ports, but the actual task on the worker has ports #2861

Closed: BaliStarDUT closed this issue 6 years ago

BaliStarDUT commented 6 years ago

My swarm cluster has 1 manager and 2 workers. It runs 100+ services, each publishing a different port externally, and it worked well for the last 2 weeks.

When I run docker service ps <name>, the output should look like this:

$ docker service ps ac_5820_0_V1
ID             NAME              IMAGE   NODE     DESIRED STATE   CURRENT STATE            ERROR   PORTS
gmoaopto84a4   ac_5820_0_V1.1    jdk8    test03   Running         Running 28 minutes ago           *:1916->14129/tcp

But today I found that docker service ps <name> shows no ports, like this:

$ docker service ps ac_318_0_V1
ID             NAME              IMAGE   NODE     DESIRED STATE   CURRENT STATE         ERROR   PORTS
tkrnvglpcjrw   ac_318_0_0_V1.1   jdk8    test03   Running         Running 8 hours ago

About 10+ services can't be accessed via ip:port. But when I run docker ps | grep <name> on worker node test03, I get:

$ docker ps | grep ac_318_0_0_V1
569d89352027        jdk8     "sh start.sh -m prod…"   9 days ago          Up 9 days           0.0.0.0:1069->14129/tcp         ac_318_0_0_V1.1.tkrnvglpcjrwxw2evnz7yqxk6

So the task container is working fine on the worker node, but the manager can't get its full state (the published ports are missing).

When I run docker service logs <name>, I get:

$ docker service logs ac_318_0_0_V1
error from daemon in stream: Error grabbing logs: rpc error: code = Unknown desc = warning: incomplete log stream. some logs could not be retrieved for the following reasons: node qk56punmxz06d62hn is not available

So a worker node is sometimes unavailable, but I can't find anything wrong on that node. Why can't the manager get the ports of the task running on the worker node? Restarting the service fixes it, but I can't restart all these services manually. Please help.
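A possible workaround for the manual restarts, sketched under stated assumptions: the script below finds services whose running tasks show an empty PORTS field in the manager's view and, if asked, force-redeploys them with docker service update --force, which redeploys a service's tasks even when its spec has not changed (similar to restarting it by hand). The function name services_missing_ports and the --list/--fix switches are hypothetical, made up for this example. It assumes it runs on the manager, that every service is supposed to publish at least one port (services that publish none would be flagged too), and that your Docker version supports the {{.Ports}} placeholder in docker service ps --format.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): find services whose running tasks
# show an empty PORTS field in the manager's view, and optionally redeploy them.
# Assumptions: run on a manager node; every service is expected to publish at
# least one port; this Docker version supports {{.Ports}} in `docker service ps --format`.

# Print each service that has at least one running task with no published ports.
services_missing_ports() {
  for svc in $(docker service ls --format '{{.Name}}'); do
    # One line per running task, formatted as "<task name>|<ports>".
    docker service ps "$svc" \
      --filter desired-state=running \
      --format '{{.Name}}|{{.Ports}}' |
    while IFS='|' read -r task ports; do
      if [ -z "$ports" ]; then
        echo "$svc"   # report the service once, then stop checking its tasks
        break
      fi
    done
  done
}

case "${1:-}" in
  --list)
    # Only list the affected services.
    services_missing_ports
    ;;
  --fix)
    # Force a redeploy of each affected service, one after another.
    services_missing_ports | while read -r svc; do
      docker service update --force "$svc"
    done
    ;;
esac
```

Run it with --list first to see which services would be touched, then with --fix. This only works around the symptom; it does not explain why the manager lost the port information in the first place.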

My docker version:

$ docker version
Client:
 Version:   17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:13:43 2018
 OS/Arch:   linux/amd64

Server:
 Engine:
  Version:  17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:    Tue Feb 27 22:20:43 2018
  OS/Arch:  linux/amd64
  Experimental: false

My docker info:

Containers: 45
 Running: 40
 Paused: 0
 Stopped: 5
Images: 39
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 21grc3cvehywhuz9paqqc
 Is Manager: true
 ClusterID: kzop9zs677eg42xnr4801p
 Managers: 1
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: ip
 Manager Addresses:
  ip
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51GiB
Name: test7
ID: X3N2:6EQZ:XALX:SPKU:2I2Q:SU2U:7BID:JHPH:QEE6:I2JZ:WIEU:YJMM
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 containerslots=45
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
BaliStarDUT commented 6 years ago

I want to know: when does the manager get the task info when you run docker service ps <name>? Why does it get the task's CURRENT STATE but not the PORTS info?

BaliStarDUT commented 6 years ago

I also found the docker log from that time. This is what happened; after this, I can't get ports from the manager:

time="2018-05-05T22:02:14.440374317+08:00" level=error msg="agent: session failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=node/agent node.id=qk56punmxz06d62hnjtsqdftl
time="2018-05-05T22:02:14.440829191+08:00" level=error msg="closing session after fatal error" error="rpc error: code = Internal desc = transport is closing" module=node/agent node.id=qk56punmxz06d62hnjtsqdftl
time="2018-05-05T22:02:14.440879381+08:00" level=error msg="status reporter failed to report status to agent" error="rpc error: code = Internal desc = transport is closing" module=node/agent node.id=qk56punmxz06d62hnjtsqdftl
nishanttotla commented 6 years ago

@BaliStarDUT can you please open this issue on github.com/docker/swarmkit?

Thanks.