docker-archive / classicswarm

Swarm Classic: a container clustering system. Not to be confused with Docker Swarm which is at
Apache License 2.0

Swarm manager can't get task ports, but actual task on worker has ports. #2861

Closed BaliStarDUT closed 6 years ago

BaliStarDUT commented 6 years ago

My swarm cluster has 1 manager and 2 workers. It runs 100+ services, each exposing a different port externally. It worked well for the last 2 weeks.

When I run docker service ps name, the output should look like this:

$ docker service ps ac_5820_0_V1
ID            NAME              IMAGE   NODE     DESIRED STATE   CURRENT STATE            ERROR   PORTS
gmoaopto84a4  ac_5820_0_V1.1    jdk8    test03   Running         Running 28 minutes ago           *:1916->14129/tcp

But today I found that docker service ps name shows no ports, like this:

$ docker service ps ac_318_0_V1
ID            NAME              IMAGE   NODE     DESIRED STATE   CURRENT STATE            ERROR   PORTS
tkrnvglpcjrw  ac_318_0_0_V1.1   jdk8    test03   Running         Running 8 hours ago

About 10+ services can't be accessed by ip:port. But when I run docker ps | grep name on worker node test03, I get:

$ docker ps | grep ac_318_0_0_V1
569d89352027        jdk8     "sh -m prod…"   9 days ago          Up 9 days >14129/tcp         ac_318_0_0_V1.1.tkrnvglpcjrwxw2evnz7yqxk6

That is, the task container works well on the worker node, but the manager can't get its state.

When I run docker service logs name, I get:

$ docker service logs ac_318_0_0_V1
error from daemon in stream: Error grabbing logs: rpc error: code = Unknown desc = warning: incomplete log stream. some logs could not be retrieved for the following reasons: node qk56punmxz06d62hn is not available
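The error above names a node as not available. One way to see what the manager currently thinks of each node is to list node status on the manager. This is a minimal sketch, assuming it runs on a manager node and that the installed docker CLI supports --format for docker node ls. The function name unready_nodes is made up for illustration.

```shell
# Print the hostname of every node whose STATUS, as seen by the manager,
# is not "Ready". unready_nodes is a hypothetical helper name, not part
# of the docker CLI.
unready_nodes() {
  docker node ls --format '{{.Hostname}} {{.Status}}' |
    awk '$2 != "Ready" { print $1 }'
}
```

Calling unready_nodes on the manager prints one hostname per line. Empty output means the manager sees every node as Ready at that moment, which does not rule out earlier disconnects.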

This suggests a worker node is sometimes unavailable, but I can't find anything wrong on that node. Why can't the manager get the ports from the worker node? When I restart the service it works again, but I can't restart all these services manually. Please help.
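Restarting the affected services does not have to be done by hand. Below is a rough sketch, not a fix for the underlying problem. It assumes it runs on the manager, that every service in the cluster is expected to publish a port (so an empty PORTS column marks an affected task), and that a rolling restart of those services is acceptable. The function names are made up for illustration. docker service update --force is used as one way to redeploy a service's tasks without changing its spec.

```shell
# List services that have a running task with an empty PORTS column.
# Assumption: every service here is supposed to publish a port.
services_missing_ports() {
  local svc
  for svc in $(docker service ls --format '{{.Name}}'); do
    if docker service ps --filter desired-state=running \
         --format '{{.Ports}}' "$svc" | grep -q '^$'; then
      echo "$svc"
    fi
  done
}

# Force a redeploy of each affected service, one at a time.
restart_services_missing_ports() {
  local svc
  for svc in $(services_missing_ports); do
    echo "restarting $svc"
    docker service update --force "$svc"
  done
}
```

Running services_missing_ports first and reviewing its output before calling restart_services_missing_ports avoids restarting services that legitimately publish no ports.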

my docker version:

$ docker version
 Version:   17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:13:43 2018
 OS/Arch:   linux/amd64

  Version:  17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:    Tue Feb 27 22:20:43 2018
  OS/Arch:  linux/amd64
  Experimental: false

my docker info:

Containers: 45
 Running: 40
 Paused: 0
 Stopped: 5
Images: 39
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 21grc3cvehywhuz9paqqc
 Is Manager: true
 ClusterID: kzop9zs677eg42xnr4801p
 Managers: 1
 Nodes: 3
  Task History Retention Limit: 5
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: ip
 Manager Addresses:
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 3.10.0-693.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51GiB
Name: test7
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Insecure Registries:
Live Restore Enabled: false

BaliStarDUT commented 6 years ago

I want to know when the manager gets task info when you run docker service ps name. Why does it get the task's CURRENT STATE but not the PORTS info?

BaliStarDUT commented 6 years ago

I also found this in the docker log. This is what happened at that time; after this, I can't get ports from the manager:

time="2018-05-05T22:02:14.440374317+08:00" level=error msg="agent: session failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=node/agent
time="2018-05-05T22:02:14.440829191+08:00" level=error msg="closing session after fatal error" error="rpc error: code = Internal desc = transport is closing" module=node/agent
time="2018-05-05T22:02:14.440879381+08:00" level=error msg="status reporter failed to report status to agent" error="rpc error: code = Internal desc = transport is closing" module=node/agent
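
To see how often the worker's agent lost its session with the manager, the daemon log can be filtered for error lines from the node/agent module. This is a minimal sketch, assuming the log lines have the same key=value shape as the ones above. agent_errors is a made-up helper name, and reading the daemon log through journalctl -u docker.service is an assumption about this CentOS 7 setup.

```shell
# Keep only error-level lines logged by the swarm agent module.
# Reads log lines on stdin, writes matching lines to stdout.
agent_errors() {
  grep 'level=error' | grep 'module=node/agent'
}

# Example (assumes dockerd logs to journald under the docker.service unit):
#   journalctl -u docker.service --since "2018-05-05 22:00" | agent_errors
```

Comparing the timestamps of the matching lines with the time the ports disappeared can help confirm whether the two are related.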

nishanttotla commented 6 years ago

@BaliStarDUT can you please open this issue on
