rkharya opened this issue 6 years ago
Looking at the attached logs contiv_install_01-22-2018.09-34-14.UTC.log and contiv_install_01-25-2018.05-56-47.UTC.log, I see that the failures occurred while the Contiv Docker v2plugin was being installed.
The following command failed on both master and worker nodes in the logs:
/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=<IP> control_url=<IP>:9999 vxlan_port=8472 iflist=<interface> plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=[master|worker] fwd_mode=bridge
Can you send the logs in /var/log/contiv/ and /var/log/contiv*.log from the master and worker nodes that saw this issue?
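A minimal sketch for gathering both locations on a node in one step, assuming bash and GNU tar. `collect_contiv_logs` and its optional root and output arguments are hypothetical names, not part of the Contiv installer. Paths that do not exist are skipped, so it should also run on nodes that have no /var/log/contiv/ directory.

```bash
#!/usr/bin/env bash
# collect_contiv_logs [LOG_ROOT] [OUT_TARBALL]   (hypothetical helper)
# Bundles LOG_ROOT/contiv/ and LOG_ROOT/contiv*.log into a single tarball,
# skipping any path that does not exist on this node.
collect_contiv_logs() {
  local root=${1:-/var/log}
  local out=${2:-contiv-logs-$(hostname)-$(date +%Y%m%d%H%M%S).tar.gz}
  local paths=() p
  # An unmatched glob stays literal, so the -e test filters it out.
  for p in "$root"/contiv "$root"/contiv*.log; do
    if [ -e "$p" ]; then
      paths+=("${p#"$root"/}")   # store paths relative to LOG_ROOT
    fi
  done
  if [ "${#paths[@]}" -eq 0 ]; then
    echo "collect_contiv_logs: no Contiv logs found under $root" >&2
    return 1
  fi
  tar -czf "$out" -C "$root" "${paths[@]}"
  echo "$out"   # print the tarball path so it can be attached to the issue
}
```

Run as root with no arguments, it reads /var/log and prints the name of the tarball it wrote.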
Worker node install failures: the worker nodes have no /var/log/contiv/ folder or any other Contiv logs, so I am attaching the logs from the master nodes in the same cluster: contiv-master-logs-workerfailure.tar.gz
Master node install failures (as observed on the 2nd cluster): contiv-master-node-logs.tar.gz
In this case the master nodes don't have netctl installed, even though netplugin booted up cleanly:
[root@DEE-Ctrl-1 contiv]# cat plugin_bootup.log
2018-01-22T09:41:03Z|00001|vlog|INFO|opened log file /var/log/contiv/ovs-db.log
2018-01-22T09:41:03Z|00001|vlog|INFO|opened log file /var/log/contiv/ovs-vswitchd.log
Waiting for netmaster to be ready for connections
Netmaster ready for connections, setting forward mode to bridge
Forward mode is set
n-if=eno6 -cluster-store=etcd://localhost:2379 -ctrl-ip=10.65.122.61
/netmaster -plugin-name=contiv/v2plugin:1.1.7 -cluster-mode=swarm-mode -cluster-store=etcd://localhost:2379 -control-url=10.65.122.61:9999
Also, docker plugin ls doesn't list Contiv:
[root@DEE-Ctrl-1 contiv]# docker plugin ls
ID             NAME                                         DESCRIPTION                    ENABLED
631d379403b4   docker/telemetry:1.0.0.linux-x86_64-stable   Docker Inc. metrics exporter   false
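The same check can be scripted across nodes. This is a sketch that assumes the default `docker plugin ls` table layout shown above (NAME in the second column, ENABLED in the last); `contiv_plugin_state` is a hypothetical helper that prints the ENABLED value of the first plugin whose name starts with contiv/v2plugin, or `missing` if there is none.

```bash
#!/usr/bin/env bash
# contiv_plugin_state [NAME_PREFIX]   (hypothetical helper)
# Reads `docker plugin ls` table output on stdin and prints the ENABLED
# column for the first plugin whose NAME starts with NAME_PREFIX,
# or "missing" when no such plugin is listed.
contiv_plugin_state() {
  local prefix=${1:-contiv/v2plugin}
  awk -v p="$prefix" '
    # NR > 1 skips the header row; $NF is the last column (ENABLED).
    NR > 1 && index($2, p) == 1 { print $NF; found = 1; exit }
    END { if (!found) print "missing" }
  '
}
```

Used as `docker plugin ls | contiv_plugin_state`, it would print `missing` for the DEE-Ctrl-1 output above.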
@rkharya: Have you reproduced this on CentOS or on another distribution?
@unclejack: Reproducible on RHEL 7.3 environments, both on bare metal and on bare metal mixed with VMs.
Description
v2plugin installation failures have been seen multiple times on 2 different setups. The error messages differ between Contiv master nodes and Contiv worker nodes.
Expected Behavior
The Contiv install should succeed on all master and worker nodes without any errors.
Observed Behavior
The issue is intermittent, but one pattern is consistent: after the Docker Swarm cluster has been completely cleaned of Contiv components, the first installation attempt fails, and subsequent retries eventually succeed in installing Contiv. We see this behaviour only with the latest code changes made to the 1.1.7 release about 20 days ago. We did not see this issue during the CVD validation cycle, up to the CVD release on December 18th, 2017.
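Since a retry eventually succeeds, one possible stopgap while the root cause is investigated is to wrap the install command in a bounded retry loop. This is only a sketch: `retry` is a hypothetical helper, and it assumes a failed attempt leaves no partially installed plugin behind (the `docker plugin ls` output above suggests none remained on DEE-Ctrl-1, but that is not verified in general; if one does remain, `docker plugin rm -f contiv/v2plugin:1.1.7` before the next attempt may be needed).

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND until it exits 0, at most ATTEMPTS times,
# sleeping DELAY_SECONDS between failed attempts.
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if (( n >= attempts )); then
      echo "retry: '$*' failed after $n attempt(s)" >&2
      return 1
    fi
    echo "retry: attempt $n of $attempts failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Example, with the arguments the master task used on node1:
# retry 3 30 /usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 \
#   ctrl_ip=10.65.122.61 control_url=10.65.122.61:9999 vxlan_port=8472 iflist=eno6 \
#   plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 \
#   plugin_role=master fwd_mode=bridge
```

Within the playbook itself, Ansible's task-level `register` with `until`, `retries` and `delay` would achieve the same effect without a wrapper script. Either way, this only masks the first-attempt failure rather than explaining it.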
Master Node install failures -
TASK [contiv_network : install v2plugin on master nodes] ***
fatal: [node2]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.122.63 control_url=10.65.122.63:9999 vxlan_port=8472 iflist=eno6 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=master fwd_mode=bridge", "delta": "0:06:11.601524", "end": "2018-01-22 15:11:25.034534", "failed": true, "rc": 1, "start": "2018-01-22 15:05:13.433010", "stderr": "Error response from daemon: dial unix /run/docker/plugins/330e5e6cb7025e7c40805912541ff706fad4d35eb4bb34b877ea5004dfcf8511/netplugin.sock: connect: connection refused", "stderr_lines": ["Error response from daemon: dial unix /run/docker/plugins/330e5e6cb7025e7c40805912541ff706fad4d35eb4bb34b877ea5004dfcf8511/netplugin.sock: connect: connection refused"], "stdout": "1.1.7: Pulling from contiv/v2plugin\n1ba3fc0d8c93: Verifying Checksum\n1ba3fc0d8c93: Download complete\nDigest: sha256:2b610546b385bcc46ca6c76a9be7fd859a3abf4b37f529ba9df41a4dc3853c30\nStatus: Downloaded newer image for contiv/v2plugin:1.1.7", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin", "1ba3fc0d8c93: Verifying Checksum", "1ba3fc0d8c93: Download complete", "Digest: sha256:2b610546b385bcc46ca6c76a9be7fd859a3abf4b37f529ba9df41a4dc3853c30", "Status: Downloaded newer image for contiv/v2plugin:1.1.7"]}
fatal: [node1]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.122.61 control_url=10.65.122.61:9999 vxlan_port=8472 iflist=eno6 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=master fwd_mode=bridge", "delta": "0:06:12.083192", "end": "2018-01-22 15:11:25.836960", "failed": true, "rc": 1, "start": "2018-01-22 15:05:13.753768", "stderr": "Error response from daemon: dial unix /run/docker/plugins/6f11c1b2fea19a72d9aa2ef95c0e85c224891f982826f815ff8a556dc640e48c/netplugin.sock: connect: no such file or directory", "stderr_lines": ["Error response from daemon: dial unix /run/docker/plugins/6f11c1b2fea19a72d9aa2ef95c0e85c224891f982826f815ff8a556dc640e48c/netplugin.sock: connect: no such file or directory"], "stdout": "1.1.7: Pulling from contiv/v2plugin\n1ba3fc0d8c93: Verifying Checksum\n1ba3fc0d8c93: Download complete\nDigest: sha256:2b610546b385bcc46ca6c76a9be7fd859a3abf4b37f529ba9df41a4dc3853c30\nStatus: Downloaded newer image for contiv/v2plugin:1.1.7", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin", "1ba3fc0d8c93: Verifying Checksum", "1ba3fc0d8c93: Download complete", "Digest: sha256:2b610546b385bcc46ca6c76a9be7fd859a3abf4b37f529ba9df41a4dc3853c30", "Status: Downloaded newer image for contiv/v2plugin:1.1.7"]}
fatal: [node3]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.122.62 control_url=10.65.122.62:9999 vxlan_port=8472 iflist=eno6 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=master fwd_mode=bridge", "delta": "0:06:12.404043", "end": "2018-01-22 15:11:25.136644", "failed": true, "rc": 1, "start": "2018-01-22 15:05:12.732601", "stderr": "Error response from daemon: dial unix /run/docker/plugins/9c15133fdbe9ee55f4054b0f3af7fbd9be9ae8efc0bfd72d70b791f3ecfb27fd/netplugin.sock: connect: no such file or directory", "stderr_lines": ["Error response from daemon: dial unix /run/docker/plugins/9c15133fdbe9ee55f4054b0f3af7fbd9be9ae8efc0bfd72d70b791f3ecfb27fd/netplugin.sock: connect: no such file or directory"], "stdout": "1.1.7: Pulling from contiv/v2plugin\n1ba3fc0d8c93: Verifying Checksum\n1ba3fc0d8c93: Download complete\nDigest: sha256:2b610546b385bcc46ca6c76a9be7fd859a3abf4b37f529ba9df41a4dc3853c30\nStatus: Downloaded newer image for contiv/v2plugin:1.1.7", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin", "1ba3fc0d8c93: Verifying Checksum", "1ba3fc0d8c93: Download complete", "Digest: sha256:2b610546b385bcc46ca6c76a9be7fd859a3abf4b37f529ba9df41a4dc3853c30", "Status: Downloaded newer image for contiv/v2plugin:1.1.7"]}
to retry, use: --limit @/ansible/install_plays.retry
PLAY RECAP *****
node1 : ok=17 changed=9 unreachable=0 failed=1
node2 : ok=17 changed=9 unreachable=0 failed=1
node3 : ok=17 changed=9 unreachable=0 failed=1
node4 : ok=9 changed=4 unreachable=0 failed=0
node5 : ok=9 changed=4 unreachable=0 failed=0
node6 : ok=9 changed=4 unreachable=0 failed=0
node7 : ok=9 changed=4 unreachable=0 failed=0
node8 : ok=9 changed=4 unreachable=0 failed=0
node9 : ok=9 changed=4 unreachable=0 failed=0
Worker Node install failures -
TASK [contiv_network : install v2plugin on worker nodes] ***
fatal: [node6]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.121.140 control_url=10.65.121.140:9999 vxlan_port=8472 iflist=ens192 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=worker fwd_mode=bridge", "delta": "0:04:51.934836", "end": "2018-01-25 11:38:37.231374", "failed": true, "rc": 1, "start": "2018-01-25 11:33:45.296538", "stderr": "failed to download: unexpected EOF", "stderr_lines": ["failed to download: unexpected EOF"], "stdout": "1.1.7: Pulling from contiv/v2plugin", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin"]}
fatal: [node7]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.121.141 control_url=10.65.121.141:9999 vxlan_port=8472 iflist=ens192 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=worker fwd_mode=bridge", "delta": "0:04:52.343379", "end": "2018-01-25 11:38:44.770569", "failed": true, "rc": 1, "start": "2018-01-25 11:33:52.427190", "stderr": "failed to download: unexpected EOF", "stderr_lines": ["failed to download: unexpected EOF"], "stdout": "1.1.7: Pulling from contiv/v2plugin", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin"]}
fatal: [node4]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.121.142 control_url=10.65.121.142:9999 vxlan_port=8472 iflist=ens192 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=worker fwd_mode=bridge", "delta": "0:04:52.475222", "end": "2018-01-25 11:38:46.382501", "failed": true, "rc": 1, "start": "2018-01-25 11:33:53.907279", "stderr": "failed to download: unexpected EOF", "stderr_lines": ["failed to download: unexpected EOF"], "stdout": "1.1.7: Pulling from contiv/v2plugin", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin"]}
fatal: [node8]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.121.130 control_url=10.65.121.130:9999 vxlan_port=8472 iflist=ens192 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=worker fwd_mode=bridge", "delta": "0:04:54.685860", "end": "2018-01-25 11:38:48.099427", "failed": true, "rc": 1, "start": "2018-01-25 11:33:53.413567", "stderr": "failed to download: unexpected EOF", "stderr_lines": ["failed to download: unexpected EOF"], "stdout": "1.1.7: Pulling from contiv/v2plugin", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin"]}
fatal: [node5]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.121.143 control_url=10.65.121.143:9999 vxlan_port=8472 iflist=ens192 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=worker fwd_mode=bridge", "delta": "0:04:55.817107", "end": "2018-01-25 11:38:49.210135", "failed": true, "rc": 1, "start": "2018-01-25 11:33:53.393028", "stderr": "failed to download: unexpected EOF", "stderr_lines": ["failed to download: unexpected EOF"], "stdout": "1.1.7: Pulling from contiv/v2plugin", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin"]}
fatal: [node12]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.121.129 control_url=10.65.121.129:9999 vxlan_port=8472 iflist=ens192 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=worker fwd_mode=bridge", "delta": "0:01:54.202116", "end": "2018-01-25 11:40:35.330632", "failed": true, "rc": 1, "start": "2018-01-25 11:38:41.128516", "stderr": "failed to download: unexpected EOF", "stderr_lines": ["failed to download: unexpected EOF"], "stdout": "1.1.7: Pulling from contiv/v2plugin", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin"]}
fatal: [node11]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.121.128 control_url=10.65.121.128:9999 vxlan_port=8472 iflist=ens192 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=worker fwd_mode=bridge", "delta": "0:01:56.424311", "end": "2018-01-25 11:40:43.263658", "failed": true, "rc": 1, "start": "2018-01-25 11:38:46.839347", "stderr": "failed to download: unexpected EOF", "stderr_lines": ["failed to download: unexpected EOF"], "stdout": "1.1.7: Pulling from contiv/v2plugin", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin"]}
fatal: [node9]: FAILED! => {"changed": true, "cmd": "/usr/bin/docker plugin install --grant-all-permissions contiv/v2plugin:1.1.7 ctrl_ip=10.65.121.124 control_url=10.65.121.124:9999 vxlan_port=8472 iflist=eno6 plugin_name=contiv/v2plugin:1.1.7 cluster_store=etcd://localhost:2379 plugin_role=worker fwd_mode=bridge", "delta": "0:02:54.790835", "end": "2018-01-25 11:41:46.656811", "failed": true, "rc": 1, "start": "2018-01-25 11:38:51.865976", "stderr": "failed to download: unexpected EOF", "stderr_lines": ["failed to download: unexpected EOF"], "stdout": "1.1.7: Pulling from contiv/v2plugin", "stdout_lines": ["1.1.7: Pulling from contiv/v2plugin"]}
changed: [node10]
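The master and worker failures have different signatures, and they are easier to compare with one line per host. Below is a sketch of a triage helper, assuming the Ansible console output is saved to a file and that the `stderr` values contain no escaped double quotes (true for the outputs above). `summarize_ansible_failures` is a hypothetical name, and it uses only `grep -oE` and `awk`, not `jq`.

```bash
#!/usr/bin/env bash
# summarize_ansible_failures   (hypothetical helper)
# Reads Ansible console output on stdin and prints "<host> <delta> <stderr>"
# for every "fatal: [host]: FAILED! => {...}" record. It works whether or not
# the "=> {" part was wrapped onto a separate line.
summarize_ansible_failures() {
  # Extract, in order of appearance: the host marker, the delta field, the stderr field.
  grep -oE 'fatal: \[[^]]+\]|"delta": "[^"]*"|"stderr": "[^"]*"' |
    awk '
      # "fatal: [node2]" -> strip the surrounding brackets from the host name
      /^fatal:/   { host = substr($2, 2, length($2) - 2); next }
      # "\"delta\": \"0:06:11.601524\"" -> the 4th quote-delimited field is the value
      /^"delta"/  { split($0, f, "\""); delta = f[4]; next }
      /^"stderr"/ { split($0, f, "\""); print host, delta, f[4] }
    '
}
```

For example, `summarize_ansible_failures < install.log | cut -d' ' -f3- | sort | uniq -c` counts the distinct worker errors. On the master output it also makes a split visible: node2 got "connection refused" (the netplugin.sock file existed but nothing was accepting connections on it), whereas node1 and node3 got "no such file or directory" (the socket file had not been created at all).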