contiv / netplugin

Container networking for various use cases
Apache License 2.0

B-series - Network created in controller node is not getting reflected in other nodes #543

Closed pradvara closed 8 years ago

pradvara commented 8 years ago

Imported the global variables and commissioned the nodes.

The setup has three master nodes and three worker nodes; the first master also acts as the controller.

clusterctl global set --extra-vars='{"env" : {"http_proxy": "http://proxy-wsa.esl.cisco.com:80", "https_proxy":"http://proxy-wsa.esl.cisco.com:80","no_proxy": "127.0.0.1,localhost,netmaster"}}'

clusterctl node commission contiv-b1-FCH1702J22M --extra-vars='{"control_interface": "enp6s0", "netplugin_if": "enp7s0", "service_vip": "10.106.240.121", "ucp_version":"1.1.2", "docker_version": "1.11.1", "validate_certs": "false", "scheduler_provider": "ucp-swarm", "ucp_bootstrap_node_name": "contiv-b1-FCH1702J22M", "ucp_license_file":"/home/stack/docker_subscription.lic", "ucp_license_dest":"/tmp/docker_subscription.lic"}' --host-group=service-master

clusterctl node commission contiv-b2-FCH1701J2KV --extra-vars='{"control_interface": "enp129s0f0", "netplugin_if": "enp129s0f1", "service_vip": "10.106.240.121", "ucp_version":"1.1.2", "docker_version": "1.11.1", "validate_certs": "false", "scheduler_provider": "ucp-swarm", "ucp_bootstrap_node_name": "contiv-b1-FCH1702J22M"}' --host-group=service-master

clusterctl node commission contiv-b3-FCH1828KBGQ --extra-vars='{"control_interface": "enp133s0", "netplugin_if": "enp6s0", "service_vip": "10.106.240.121", "ucp_version":"1.1.2", "docker_version": "1.11.1", "validate_certs": "false", "scheduler_provider": "ucp-swarm", "ucp_bootstrap_node_name": "contiv-b1-FCH1702J22M"}' --host-group=service-master

clusterctl node commission contiv-b4-FCH1811JLXV --extra-vars='{"control_interface": "enp133s0", "netplugin_if": "enp6s0", "service_vip": "10.106.240.121", "ucp_version":"1.1.2", "docker_version": "1.11.1", "validate_certs": "false", "scheduler_provider": "ucp-swarm", "ucp_bootstrap_node_name": "contiv-b1-FCH1702J22M"}' --host-group=service-worker

clusterctl node commission contiv-b5-FCH1834JF2M --extra-vars='{"control_interface": "enp133s0", "netplugin_if": "enp6s0", "service_vip": "10.106.240.121", "ucp_version":"1.1.2", "docker_version": "1.11.1", "validate_certs": "false", "scheduler_provider": "ucp-swarm", "ucp_bootstrap_node_name": "contiv-b1-FCH1702J22M"}' --host-group=service-worker

clusterctl node commission contiv-b6-FCH1811JD9C --extra-vars='{"control_interface": "enp133s0", "netplugin_if": "enp6s0", "service_vip": "10.106.240.121", "ucp_version":"1.1.2", "docker_version": "1.11.1", "validate_certs": "false", "scheduler_provider": "ucp-swarm", "ucp_bootstrap_node_name": "contiv-b1-FCH1702J22M"}' --host-group=service-worker
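The six commission commands above differ only in the node name, the two interface names, and the host group, so hand-editing the JSON is an easy place to introduce a quoting mistake. The following is a minimal sketch, not part of contiv tooling, that builds one such command line from a dictionary. The helper name `commission_command` and the `COMMON` dictionary are hypothetical; the keys and values are copied from the commands in this thread.

```python
import json
import shlex

def commission_command(node, host_group, extra_vars):
    """Return a clusterctl commission command line as a single shell-safe string."""
    # json.dumps guarantees valid JSON; shlex.quote wraps it in single quotes.
    payload = json.dumps(extra_vars, sort_keys=True)
    return " ".join([
        "clusterctl", "node", "commission", shlex.quote(node),
        "--extra-vars=" + shlex.quote(payload),
        "--host-group=" + shlex.quote(host_group),
    ])

# Values shared by the nodes in this thread (copied from the commands above).
COMMON = {
    "service_vip": "10.106.240.121",
    "ucp_version": "1.1.2",
    "docker_version": "1.11.1",
    "validate_certs": "false",
    "scheduler_provider": "ucp-swarm",
    "ucp_bootstrap_node_name": "contiv-b1-FCH1702J22M",
}

cmd = commission_command(
    "contiv-b4-FCH1811JLXV",
    "service-worker",
    dict(COMMON, control_interface="enp133s0", netplugin_if="enp6s0"),
)
print(cmd)
```

The script only prints the command; running it is left to the operator, so the output can be reviewed before anything is commissioned.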

Set the Docker Host:

export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH="$(pwd)"
export DOCKER_HOST=tcp://10.106.240.108:443

gaurav-dalvi commented 8 years ago

Though it's a 6-node cluster, the docker info command shows only 2 nodes. @mapuri @vvb

vvb commented 8 years ago

@gaurav-dalvi @pradvara the node contiv-b3 is probably missing the latest Cisco enic driver. The VIP 10.106.240.121 should be present on only one of the master nodes.

[stack@contiv-b3 ~]$ ip a | grep "\.121"
    inet 10.106.240.121/32 scope global enp133s0_0
[stack@contiv-b3 ~]$
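The same check can be repeated for every master once the output of ip a has been collected from each one. Below is a rough sketch, assuming the outputs are already gathered into a dictionary keyed by host name; how they are collected (ssh, Ansible, and so on) is out of scope here. The function names are hypothetical, and the contiv-b1 sample line is made up for illustration; only the contiv-b3 line comes from the output above.

```python
import re

def holds_vip(ip_addr_output, vip):
    """Return True if `ip a` output lists `vip` as an inet address."""
    # Anchor on "inet <vip>/<prefix>" so 10.106.240.12 does not match 10.106.240.121.
    pattern = re.compile(
        r"^\s*inet\s+" + re.escape(vip) + r"/\d+\b", re.MULTILINE
    )
    return bool(pattern.search(ip_addr_output))

def vip_holders(outputs_by_host, vip):
    """Return the sorted list of hosts whose `ip a` output contains the VIP."""
    return sorted(
        host for host, out in outputs_by_host.items() if holds_vip(out, vip)
    )

sample = {
    # Hypothetical line for illustration only.
    "contiv-b1": "    inet 10.106.240.108/24 scope global enp6s0\n",
    # Line taken from the contiv-b3 output in this thread.
    "contiv-b3": "    inet 10.106.240.121/32 scope global enp133s0_0\n",
}

holders = vip_holders(sample, "10.106.240.121")
if len(holders) != 1:
    print("VIP should be on exactly one master, found on:", holders)
else:
    print("VIP is held by", holders[0])
```

A result with zero or several holders would point to the kind of VIP problem discussed here.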

UCP failed to start with the error below:

INFO[0000] Unable to connect to 10.106.240.121:443: dial tcp 10.106.240.121:443: getsockopt: connection refused
FATA[0000] Post https://10.106.240.121:443/auth/login: dial tcp 10.106.240.121:443: getsockopt: connection refused

pradvara commented 8 years ago

@gaurav-dalvi @vvb let me try to update the enic driver on all the nodes

pradvara commented 8 years ago

@gaurav-dalvi @vvb: I updated the enic driver for that blade (kmod-enic-2.3.0.20-rhel7u2.el7.x86_64.rpm).

I still see the service VIP getting assigned to that node, and UCP is failing to start.

[stack@contiv-b3 ~]$ systemctl status ucp.service
● ucp.service - Ucp
   Loaded: loaded (/etc/systemd/system/ucp.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2016-08-31 22:56:29 IST; 32s ago
 Main PID: 15033 (code=exited, status=1/FAILURE)

Aug 31 22:56:29 contiv-b3 ucp.sh[17163]: ucp-auth-worker-data
Aug 31 22:56:29 contiv-b3 ucp.sh[17163]: ucp-client-root-ca
Aug 31 22:56:29 contiv-b3 ucp.sh[17163]: ucp-cluster-root-ca
Aug 31 22:56:29 contiv-b3 ucp.sh[17163]: ucp-controller-client-certs
Aug 31 22:56:29 contiv-b3 ucp.sh[17163]: ucp-controller-server-certs
Aug 31 22:56:29 contiv-b3 ucp.sh[17163]: ucp-kv
Aug 31 22:56:29 contiv-b3 ucp.sh[17163]: ucp-kv-certs
Aug 31 22:56:29 contiv-b3 ucp.sh[17163]: ucp-node-certs
Aug 31 22:56:29 contiv-b3 systemd[1]: Unit ucp.service entered failed state.
Aug 31 22:56:29 contiv-b3 systemd[1]: ucp.service failed.

pradvara commented 8 years ago

The issue is not seen with the latest VNIC driver, "kmod-enic-2.3.0.30-rhel7u2.el7.x86_64". All the nodes in the cluster are now detected:

[stack@contiv-b1 ucp-bundle-admin]$ docker info
Containers: 47
 Running: 45
 Paused: 0
 Stopped: 2
Images: 80
Server Version: swarm/1.2.3
Role: primary
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 6
 contiv-b1: 10.106.240.108:12376
  └ ID: ZTKV:22OG:WGLB:X646:EJLO:4CFZ:UPUU:E3A2:LKBY:KGWN:IJ5C:GYJH
  └ Status: Healthy
  └ Containers: 12
  └ Reserved CPUs: 0 / 33
  └ Reserved Memory: 0 B / 107.2 GiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.22.2.el7.x86_64, operatingsystem=Storage, storagedriver=devicemapper
  └ UpdatedAt: 2016-09-16T10:14:38Z
  └ ServerVersion: 1.11.1
 contiv-b2: 10.106.240.111:12376
  └ ID: IF65:LLRY:GJCQ:USO4:XFCA:UTSX:Y3SN:5LRB:BQDS:EEI2:BANB:6I2Z
  └ Status: Healthy
  └ Containers: 14
  └ Reserved CPUs: 0 / 25
  └ Reserved Memory: 0 B / 98.9 GiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.22.2.el7.x86_64, operatingsystem=Storage, storagedriver=devicemapper
  └ UpdatedAt: 2016-09-16T10:14:35Z
  └ ServerVersion: 1.11.1
 contiv-b3: 10.106.240.112:12376
  └ ID: W2MY:VPZN:7WZD:GMNM:NDU2:IDSJ:523Y:REJS:456X:75YS:LDWZ:65UK
  └ Status: Healthy
  └ Containers: 12
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 98.9 GiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.28.3.el7.x86_64, operatingsystem=Red Hat Enterprise Linux, storagedriver=devicemapper
  └ UpdatedAt: 2016-09-16T10:14:45Z
  └ ServerVersion: 1.11.1
 contiv-b4: 10.106.240.110:12376
  └ ID: GKIP:EC4B:X3Q7:YGJQ:WO3A:DF66:PC6Y:5QCM:3HPL:EZRX:SI6W:ZKCI
  └ Status: Healthy
  └ Containers: 3
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 115.4 GiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.22.2.el7.x86_64, operatingsystem=Storage, storagedriver=devicemapper
  └ UpdatedAt: 2016-09-16T10:15:12Z
  └ ServerVersion: 1.11.1
 contiv-b5: 10.106.240.109:12376
  └ ID: KLOJ:VIYP:Q3I5:QNSR:OWDQ:L6S3:LFLZ:2JF7:25HU:3WO6:ELMX:MTRX
  └ Status: Healthy
  └ Containers: 3
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 107.2 GiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.22.2.el7.x86_64, operatingsystem=Storage, storagedriver=devicemapper
  └ UpdatedAt: 2016-09-16T10:14:51Z
  └ ServerVersion: 1.11.1
 contiv-b6: 10.106.240.116:12376
  └ ID: IH5V:TLFD:Q4LW:FZQT:7NVX:BDYF:2YFD:N56Z:XVQQ:MQ3T:UZ6T:VFGU
  └ Status: Healthy
  └ Containers: 3
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 65.83 GiB
  └ Labels: executiondriver=, kernelversion=3.10.0-327.22.2.el7.x86_64, operatingsystem=Storage, storagedriver=devicemapper
  └ UpdatedAt: 2016-09-16T10:14:35Z
  └ ServerVersion: 1.11.1
Cluster Managers: 3
 10.106.240.108: Healthy
  └ Orca Controller: https://10.106.240.108:443
  └ Swarm Manager: tcp://10.106.240.108:2376
  └ KV: etcd://10.106.240.108:12379
 10.106.240.111: Healthy
  └ Orca Controller: https://10.106.240.111:443
  └ Swarm Manager: tcp://10.106.240.111:2376
  └ KV: etcd://10.106.240.111:12379
 10.106.240.112: Healthy
  └ Orca Controller: https://10.106.240.112:443
  └ Swarm Manager: tcp://10.106.240.112:2376
  └ KV: etcd://10.106.240.112:12379
Plugins:
 Volume:
 Network:
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: linux
Architecture: amd64
CPUs: 90
Total Memory: 593.4 GiB
Name: ucp-controller-contiv-b1
ID: LTYN:X5ZL:MZLP:EJIO:N5SG:NAVJ:ZZXU:VFIS:5YS2:SEEC:F567:XNRT
Docker Root Dir:
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support
Labels:
 com.docker.ucp.license_key=IBuElytqSzSQ35i-ef5o80aupB2NmxBX6TJQVrsZ6Njq
 com.docker.ucp.license_max_engines=10
 com.docker.ucp.license_expires=2017-03-05 18:30:59 +0000 UTC
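Since the original symptom was docker info listing fewer nodes than were commissioned, the check can be scripted against the text output. This is a minimal sketch that assumes the classic Swarm layout shown above (a Nodes: line, node names indented by one space, details prefixed with └); the function name `swarm_nodes` is hypothetical, and the sample is a shortened excerpt of the output in this thread.

```python
import re

def swarm_nodes(docker_info_output):
    """Return (declared_count, {node_name: status}) from Swarm `docker info` text."""
    declared = None
    nodes = {}
    current = None
    in_nodes = False
    for line in docker_info_output.splitlines():
        if not in_nodes:
            match = re.match(r"^Nodes:\s*(\d+)\s*$", line)
            if match:
                declared = int(match.group(1))
                in_nodes = True
            continue
        stripped = line.strip()
        if not stripped:
            continue
        if not line[0].isspace():
            break  # the next top-level key ends the Nodes section
        if stripped.startswith("└"):
            # Detail line such as "└ Status: Healthy".
            key, _, value = stripped[1:].partition(":")
            if key.strip() == "Status" and current is not None:
                nodes[current] = value.strip()
        else:
            # Node header such as "contiv-b1: 10.106.240.108:12376".
            current = stripped.partition(":")[0]
            nodes[current] = None
    return declared, nodes

sample = """Strategy: spread
Nodes: 2
 contiv-b1: 10.106.240.108:12376
  └ ID: ZTKV:22OG:WGLB:X646:EJLO:4CFZ:UPUU:E3A2:LKBY:KGWN:IJ5C:GYJH
  └ Status: Healthy
 contiv-b2: 10.106.240.111:12376
  └ Status: Healthy
Cluster Managers: 3
 10.106.240.108: Healthy
"""

declared, nodes = swarm_nodes(sample)
unhealthy = sorted(name for name, status in nodes.items() if status != "Healthy")
print("declared:", declared, "listed:", len(nodes), "unhealthy:", unhealthy)
```

Comparing the listed node names against the commissioned host names would flag a node that never joined, which is the situation reported at the start of this issue.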
gaurav-dalvi commented 8 years ago

Thanks @pradvara. Closing this issue now.