aokabin / rancher-practice


Turn a clean-installed CentOS 7 machine into a Kubernetes node #3

Closed aokabin closed 5 years ago

aokabin commented 5 years ago

Back up one of the lab machines and do a clean install of CentOS.

aokabin commented 5 years ago

sudo yum install docker-ce-18.09.1-3.el7

sudo yum update -y
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum-config-manager --disable docker-ce-edge
sudo yum makecache fast
yum list docker-ce --showduplicates | sort -r
sudo yum install docker-ce-3:18.09.1-3.el7
sudo yum install docker-ce-18.09.1-3.el7
sudo systemctl enable docker
sudo systemctl start docker

Set up by following this reference.
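
Just as a sanity check, a couple of quick ways to confirm the install actually took (nothing Rancher-specific, just generic Docker checks):

sudo systemctl status docker     # expect "active (running)"
sudo docker version              # server version should report 18.09.1
sudo docker run hello-world      # end-to-end smoke test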

aokabin commented 5 years ago

It works!!

The problem is kumanomi. I'll try setting this one up with all roles as well. firewalld complained that the ports couldn't be reached, so open them up:

# firewall-cmd --zone=public --add-port=2379/tcp --permanent
success
# firewall-cmd --zone=public --add-port=2380/tcp --permanent
success
# firewall-cmd --reload
success
# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eno1
  sources:
  services: ssh dhcpv6-client
  ports: 2379/tcp 2380/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
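
As a hedge, a quick check that the newly opened ports are actually reachable from outside, run from one of the other machines (assuming nc/ncat is installed there; <this-host> is just a placeholder for this machine's address):

nc -zv <this-host> 2379    # etcd client port
nc -zv <this-host> 2380    # etcd peer port
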
aokabin commented 5 years ago

That didn't work, so verifying on the department's VMs instead.

Prepared three CentOS and three Ubuntu machines each.

First, setting up CentOS by following this reference.

aokabin commented 5 years ago
sudo yum -y update
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
sudo yum makecache fast
yum list docker-ce --showduplicates | sort -r
sudo yum -y install docker-ce-18.09.1-3.el7
sudo systemctl start docker
sudo systemctl enable docker
sudo docker run hello-world

Working on Ubuntu in parallel.

Apparently the OS version and CPU architecture are needed, so check them as below:

cat /etc/os-release
uname -a

This time it looked like amd64.

sudo apt-get -y update
sudo apt-get -y install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88 # confirm it prints 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get -y update
apt-cache madison docker-ce
sudo apt-get -y install docker-ce=5:18.09.1~3-0~ubuntu-xenial

Enabling autostart seems to work a bit differently here. Also, unless the locale is switched to Japanese, perl prints warnings.

sudo apt install language-pack-ja-base language-pack-ja ibus-mozc
sudo localectl set-locale LANG=ja_JP.UTF-8 LANGUAGE="ja_JP:ja"
exit
ssh # log back in via ssh
sudo apt-get -y install sysv-rc-conf
sudo sysv-rc-conf docker on
sudo docker run hello-world
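
As an aside, this is xenial (per the docker-ce package string) and therefore systemd-based, so the same enable approach as on CentOS should also work in place of sysv-rc-conf (not tested here):

sudo systemctl enable docker
sudo systemctl is-enabled docker   # should print "enabled"
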
aokabin commented 5 years ago

Added both as Workers! ...but they can't connect as Workers.

Checked the node agent's error log:

time="2019-01-27T12:20:40Z" level=error msg="Failed to connect to proxy" error="read tcp 10.0.3.24:46055->10.0.2.228:443: i/o timeout"
time="2019-01-27T12:20:50Z" level=info msg="Connecting to wss://kabilab.st.ie.u-ryukyu.ac.jp/v3/connect with token pmdd4h2chk8cb6g8629c4nsbdt284zk4jx8vkcknqhds5k5gchl978"
time="2019-01-27T12:20:50Z" level=info msg="Connecting to proxy" url="wss://kabilab.st.ie.u-ryukyu.ac.jp/v3/connect"
time="2019-01-27T12:21:00Z" level=info msg="Starting plan monitor"

That's the error. The 10.0.2.228 here is rancher; per the error, the https connection to it seems to be timing out. Tried connecting to it directly to check:

curl https://10.0.2.228 -k

{"type":"collection","links":{"self":"https://10.0.2.228/"},"actions":{},"pagination":{"limit":1000,"total":4},"sort":{"order":"asc","reverse":"https://10.0.2.228/?order=desc"},"resourceType":"apiRoot","data":[{"apiVersion":{"group":"meta.cattle.io","path":"/meta","version":"v1"},"baseType":"apiRoot","links":{"apiRoots":"https://10.0.2.228/meta/apiroots","root":"https://10.0.2.228/meta","schemas":"https://10.0.2.228/meta/schemas","self":"https://10.0.2.228/meta","subscribe":"https://10.0.2.228/meta/subscribe"},"type":"apiRoot"},{"apiVersion":{"group":"management.cattle.io","path":"/v3","version":"v3"},"baseType":"apiRoot","links":{"authConfigs":"https://10.0.2.228/v3/authconfigs","catalogs":"https://10.0.2.228/v3/catalogs","clusterAlerts":"https://10.0.2.228/v3/clusteralerts","clusterEvents":"https://10.0.2.228/v3/clusterevents","clusterLoggings":"https://10.0.2.228/v3/clusterloggings","clusterRegistrationTokens":"https://10.0.2.228/v3/clusterregistrationtokens","clusterRoleTemplateBindings":"https://10.0.2.228/v3/clusterroletemplatebindings","clusters":"https://10.0.2.228/v3/clusters","composeConfigs":"https://10.0.2.228/v3/composeconfigs","dynamicSchemas":"https://10.0.2.228/v3/dynamicschemas","globalRoleBindings":"https://10.0.2.228/v3/globalrolebindings","globalRoles":"https://10.0.2.228/v3/globalroles","groupMembers":"https://10.0.2.228/v3/groupmembers","groups":"https://10.0.2.228/v3/groups","ldapConfigs":"https://10.0.2.228/v3/ldapconfigs","listenConfigs":"https://10.0.2.228/v3/listenconfigs","nodeDrivers":"https://10.0.2.228/v3/nodedrivers","nodePools":"https://10.0.2.228/v3/nodepools","nodeTemplates":"https://10.0.2.228/v3/nodetemplates","nodes":"https://10.0.2.228/v3/nodes","notifiers":"https://10.0.2.228/v3/notifiers","podSecurityPolicyTemplateProjectBindings":"https://10.0.2.228/v3/podsecuritypolicytemplateprojectbindings","podSecurityPolicyTemplates":"https://10.0.2.228/v3/podsecuritypolicytemplates","preferences":"https://10.0.2.228/v3/preferences","principals":"https://10.0.2.228/v3/principals","projectAlerts":"https://10.0.2.228/v3/projectalerts","projectLoggings":"https://10.0.2.228/v3/projectloggings","projectNetworkPolicies":"https://10.0.2.228/v3/projectnetworkpolicies","projectRoleTemplateBindings":"https://10.0.2.228/v3/projectroletemplatebindings","projects":"https://10.0.2.228/v3/projects","roleTemplates":"https://10.0.2.228/v3/roletemplates","root":"https://10.0.2.228/v3","self":"https://10.0.2.228/v3","settings":"https://10.0.2.228/v3/settings","subscribe":"https://10.0.2.228/v3/subscribe","templateContents":"https://10.0.2.228/v3/templatecontents","templateVersions":"https://10.0.2.228/v3/templateversions","templates":"https://10.0.2.228/v3/templates","tokens":"https://10.0.2.228/v3/tokens","users":"https://10.0.2.228/v3/users"},"type":"apiRoot"},{"apiVersion":{"group":"cluster.cattle.io","path":"/v3/cluster","version":"v3"},"baseType":"apiRoot","links":{"namespaces":"https://10.0.2.228/v3/cluster/namespaces","persistentVolumes":"https://10.0.2.228/v3/cluster/persistentvolumes","root":"https://10.0.2.228/v3/cluster","self":"https://10.0.2.228/v3/cluster","storageClasses":"https://10.0.2.228/v3/cluster/storageclasses","subscribe":"https://10.0.2.228/v3/cluster/subscribe"},"type":"apiRoot"},{"apiVersion":{"group":"project.cattle.io","path":"/v3/project","version":"v3"},"baseType":"apiRoot","links":{"appRevisions":"https://10.0.2.228/v3/project/apprevisions","apps":"https://10.0.2.228/v3/project/apps","basicAuths":"https://10.0.2.228/v3/project/basicauths","certifica
tes":"https://10.0.2.228/v3/project/certificates","configMaps":"https://10.0.2.228/v3/project/configmaps","cronJobs":"https://10.0.2.228/v3/project/cronjobs","daemonSets":"https://10.0.2.228/v3/project/daemonsets","deployments":"https://10.0.2.228/v3/project/deployments","dnsRecords":"https://10.0.2.228/v3/project/dnsrecords","dockerCredentials":"https://10.0.2.228/v3/project/dockercredentials","ingresses":"https://10.0.2.228/v3/project/ingresses","jobs":"https://10.0.2.228/v3/project/jobs","namespacedBasicAuths":"https://10.0.2.228/v3/project/namespacedbasicauths","namespacedCertificates":"https://10.0.2.228/v3/project/namespacedcertificates","namespacedDockerCredentials":"https://10.0.2.228/v3/project/namespaceddockercredentials","namespacedSecrets":"https://10.0.2.228/v3/project/namespacedsecrets","namespacedServiceAccountTokens":"https://10.0.2.228/v3/project/namespacedserviceaccounttokens","namespacedSshAuths":"https://10.0.2.228/v3/project/namespacedsshauths","persistentVolumeClaims":"https://10.0.2.228/v3/project/persistentvolumeclaims","pipelineExecutions":"https://10.0.2.228/v3/project/pipelineexecutions","pipelineSettings":"https://10.0.2.228/v3/project/pipelinesettings","pipelines":"https://10.0.2.228/v3/project/pipelines","pods":"https://10.0.2.228/v3/project/pods","replicaSets":"https://10.0.2.228/v3/project/replicasets","replicationControllers":"https://10.0.2.228/v3/project/replicationcontrollers","root":"https://10.0.2.228/v3/project","secrets":"https://10.0.2.228/v3/project/secrets","self":"https://10.0.2.228/v3/project","serviceAccountTokens":"https://10.0.2.228/v3/project/serviceaccounttokens","services":"https://10.0.2.228/v3/project/services","sourceCodeCredentials":"https://10.0.2.228/v3/project/sourcecodecredentials","sourceCodeProviderConfigs":"https://10.0.2.228/v3/project/sourcecodeproviderconfigs","sourceCodeProviders":"https://10.0.2.228/v3/project/sourcecodeproviders","sourceCodeRepositories":"https://10.0.2.228/v3/project/sourcecoderepositories","sshAuths":"https://10.0.2.228/v3/project/sshauths","statefulSets":"https://10.0.2.228/v3/project/statefulsets","subscribe":"https://10.0.2.228/v3/project/subscribe","workloads":"https://10.0.2.228/v3/project/workloads"},"type":"apiRoot"}]}

curl https://10.0.2.228/v3/connect -k
failed authentication

Hmm... the connection itself goes through... so why the timeout...
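
For next time, it would be worth running the same request from the node that actually logs the timeout (10.0.3.24), with an explicit time limit, to separate a node-side firewall problem from a Rancher-side one:

curl -kv --max-time 15 https://10.0.2.228/v3/connect   # run on 10.0.3.24; -v shows where it stalls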

aokabin commented 5 years ago

Noting down the IP addresses at this point:

10.0.2.228: rancher
10.0.3.65:  anago
10.0.3.5:   centos
10.0.3.8:   ubuntu
10.0.3.24:  centos2
10.0.3.26:  ubuntu2
10.0.3.25:  centos3
10.0.3.28:  ubuntu3
aokabin commented 5 years ago

Looking at rancher, a ton of these are coming in:

E0126 09:48:35.310407       5 reflector.go:205] github.com/rancher/rancher/vendor/github.com/rancher/norman/controller/generic_controller.go:144: Failed to list *v1.Pod: Get https://10.0.3.65:6443/api/v1/pods?limit=500&resourceVersion=0&timeout=30s: read tcp 10.42.0.5:50946->10.0.3.65:6443: read: connection reset by peer

This 10.42.0.5 is probably a flannel address:

35: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether da:94:03:5e:f6:81 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::d894:3ff:fe5e:f681/64 scope link
       valid_lft forever preferred_lft forever

Well, that's probably unrelated... Maybe it will work if the node is brought up with etcd included as well?
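
For reference, the VXLAN settings behind that interface can be dumped with iproute2; this shows the vxlan id, the local address, and the UDP port flannel tunnels over:

ip -d link show flannel.1
ip -4 addr show flannel.1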

aokabin commented 5 years ago

Something that looks network-related turned up.

First, anago was also missing some ports, so reconfigured it on top of the settings above.

First, the etcd ports (screenshot: 2019-01-27 21:44:02):

firewall-cmd --zone=public --add-port=2376/tcp --permanent
firewall-cmd --zone=public --add-port=2379/tcp --permanent # already set above
firewall-cmd --zone=public --add-port=2380/tcp --permanent # already set above
firewall-cmd --zone=public --add-port=8472/tcp --permanent
firewall-cmd --zone=public --add-port=9099/tcp --permanent
firewall-cmd --zone=public --add-port=10250/tcp --permanent

Next, the controlplane ports (screenshot: 2019-01-27 21:49:40):

firewall-cmd --zone=public --add-port=80/tcp --permanent
firewall-cmd --zone=public --add-port=443/tcp --permanent
firewall-cmd --zone=public --add-port=6443/tcp --permanent
firewall-cmd --zone=public --add-port=8472/tcp --permanent
firewall-cmd --zone=public --add-port=10254/tcp --permanent
firewall-cmd --zone=public --add-port=30000-32767/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
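
One thing I'm not completely sure about: the Rancher node-requirements table lists 8472 as UDP (flannel VXLAN), so the UDP rule may be needed in addition to the TCP one above:

firewall-cmd --zone=public --add-port=8472/udp --permanent
firewall-cmd --reload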

Let's try it.

aokabin commented 5 years ago

For now, it looks like it succeeded.

Just in case, open the ports properly on the Rancher VM as well:

firewall-cmd --zone=public --add-port=80/tcp --permanent
firewall-cmd --zone=public --add-port=443/tcp --permanent
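
(plus, presumably, a reload so the permanent rules actually take effect, same as on the other hosts:)

firewall-cmd --reload
firewall-cmd --list-ports   # quick check that 80/tcp and 443/tcp show up
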
aokabin commented 5 years ago

Next, open up the ports on the worker side:

firewall-cmd --zone=public --add-port=80/tcp --permanent
firewall-cmd --zone=public --add-port=443/tcp --permanent
firewall-cmd --zone=public --add-port=2376/tcp --permanent
firewall-cmd --zone=public --add-port=8472/tcp --permanent
firewall-cmd --zone=public --add-port=9099/tcp --permanent
firewall-cmd --zone=public --add-port=10250/tcp --permanent
firewall-cmd --zone=public --add-port=10254/tcp --permanent
firewall-cmd --zone=public --add-port=30000-32767/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all

Opened them; now trying to register the node again.
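
To keep an eye on the agent while the node re-registers, something along these lines should work (the container ID is whatever docker assigned; substitute the real one):

sudo docker ps | grep rancher-agent    # find the agent container
sudo docker logs -f <container-id>     # follow its registration attempts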

aokabin commented 5 years ago

Got this error:

Failed to connect to proxy" error="websocket: bad handshake

After a lot of digging, Rancher's SELinux issues keep coming up here and there. I'll try following that advice for now.

Shut down rancher on the VM for the moment.
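
Before that, a quick way to check whether SELinux is actually the culprit on each host; setenforce 0 only lasts until reboot, so it's a low-risk way to test the theory:

getenforce            # Enforcing / Permissive / Disabled
sudo setenforce 0     # switch to Permissive temporarily (until reboot)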

aokabin commented 5 years ago

I'm pretty sure SELinux was already turned off on anago, so proceeding as-is.

[controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [10.0.3.65]: Get https://localhost:6443/healthz: read tcp [::1]:50724->[::1]:6443: read: connection reset by peer, log: I0127 13:34:29.039201 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.

Huh... what even is this...
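
If this comes back, it might help to hit the same health check from the host itself; if I remember right, RKE names the container literally kube-apiserver:

sudo docker ps -a | grep kube-apiserver       # is the container restart-looping?
sudo docker logs --tail 50 kube-apiserver     # its own view of the crash
curl -k https://localhost:6443/healthz        # what the RKE health check hits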

aokabin commented 5 years ago

Now anago also has the problem of kube-apiserver crashing over and over...

Is it telling me to wipe things and start over?

Following this as a reference, try deleting various things from anago.

What I deleted is roughly what is listed in the next comment.

Oh, it's fixed!

aokabin commented 5 years ago

What I did:

# remove every container, image, and volume
docker rm -f $(docker ps -qa)
docker rmi -f $(docker images -q)
docker volume rm $(docker volume ls -q)

# unmount the kubelet-related tmpfs mounts plus /var/lib/kubelet and /var/lib/rancher
for mount in $(mount | grep tmpfs | grep '/var/lib/kubelet' | awk '{ print $3 }') /var/lib/kubelet /var/lib/rancher; do umount $mount; done

# wipe the Kubernetes / Rancher / CNI state directories
rm -rf /etc/ceph \
       /etc/cni \
       /etc/kubernetes \
       /opt/cni \
       /opt/rke \
       /run/secrets/kubernetes.io \
       /run/calico \
       /run/flannel \
       /var/lib/calico \
       /var/lib/etcd \
       /var/lib/cni \
       /var/lib/kubelet \
       /var/lib/rancher/rke/log \
       /var/log/containers \
       /var/log/pods \
       /var/run/calico

That's about it.

aokabin commented 5 years ago

Try the above on the CentOS machines too, and run the agent again with :Z added for SELinux. (I had mistakenly been doing this on the Ubuntu machines, and it was working there!)

aokabin commented 5 years ago

CentOS works now too! For the record:

Ubuntu, with :Z:

sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes:Z -v /var/run:/var/run:Z rancher/rancher-agent:v2.1.5 --server https://kabilab.st.ie.u-ryukyu.ac.jp --token lzcs6mnz9xvlvp58s8k8zwkczgqrdhdlrtw5ktg5stqb9jtx6rcrd6 --ca-checksum 1a680117bf605da4e6624861a6c682dde00b0ee78878520c55735160a11aea1b --worker

CentOS, without :Z:

sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.1.5 --server https://kabilab.st.ie.u-ryukyu.ac.jp --token lzcs6mnz9xvlvp58s8k8zwkczgqrdhdlrtw5ktg5stqb9jtx6rcrd6 --ca-checksum 1a680117bf605da4e6624861a6c682dde00b0ee78878520c55735160a11aea1b --worker

That was it. (I suspect either way would have been fine.)
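
As a final check, the node list can also be pulled with kubectl, assuming a kubeconfig downloaded from the Rancher UI for this cluster (./kubeconfig.yaml is just a placeholder path):

kubectl --kubeconfig ./kubeconfig.yaml get nodes -o wide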

My hunch is that the reset on anago is what did the trick. kumanomi should work now too, right? Let's try it.

aokabin commented 5 years ago

The restart step felt a bit scary, so I'll do it tomorrow.

aokabin commented 5 years ago

For kumanomi I also want to get the firewall configured properly, and then run the reset procedure.

aokabin commented 5 years ago

kumanomi is connected too! Now I realize just how important the reset procedure is...!