gluster / gcs

Check github.com/heketi, github.com/gluster/gluster-containers, or github.com/kadalu/kadalu as active alternatives

Problem faced while deploying GCS on OpenShift 3.11.* #92

Open cloudbehl opened 5 years ago

cloudbehl commented 5 years ago

GCS deployment on OpenShift

Prerequisites: deployed on OpenShift 3.11, 4-node setup (1 master, 3 compute, 1 infra)

Inventory file used for GCS:

```ini
node1 ansible_host=10.10.10.0

## List all the kube nodes that will form the GCS cluster
## Ensure that their hostnames are correct
node2 gcs_disks='["/dev/sdc"]'
node3 gcs_disks='["/dev/sdc"]'
node4 gcs_disks='["/dev/sdc"]'

[kube-master]
node1

[gcs-node]
node2
node3
node4
```
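For reference, a minimal sketch of what a complete inventory could look like with a reachable address on every GCS node (the IPs below are placeholders, not values from this report):

```ini
node1 ansible_host=10.10.10.0

[kube-master]
node1

## List all the kube nodes that will form the GCS cluster
## Ensure that their hostnames are correct
[gcs-node]
node2 ansible_host=10.10.10.1 gcs_disks='["/dev/sdc"]'
node3 ansible_host=10.10.10.2 gcs_disks='["/dev/sdc"]'
node4 ansible_host=10.10.10.3 gcs_disks='["/dev/sdc"]'
```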
1. The path to the kubectl command in the deploy-gcs.yml playbook is incorrect:

```diff
-   kubectl: /usr/local/bin/kubectl
+   kubectl: kubectl
```
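Before overriding the variable it can help to confirm which client binary is actually on the PATH of the deploy host (a quick check; depending on the install, only oc may be present):

```sh
# Prints the resolved path of whichever client exists on this host
command -v kubectl || command -v oc
```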
2. Use gcs-node instead of kube-node in the deploy-gcs playbook when deploying the gd2 pods, because the examples/inventory-gcs-only.example file defines no kube-node group:

```diff
-          until: peers_resp.status is defined and (peers_resp.status == 200 and peers_resp.json|length == groups['kube-node']|length)
+          until: peers_resp.status is defined and (peers_resp.status == 200 and peers_resp.json|length == groups['gcs-node']|length)
```
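To confirm which groups an inventory really defines before running the playbook, Ansible can dump its parsed view (assuming the inventory file is simply named inventory; gcs-node should appear and kube-node should not):

```sh
# Show all groups and hosts Ansible resolves from the inventory file
ansible-inventory -i inventory --list
```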
3. [GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready] gets stuck.

I checked `oc get events -n gcs` and it shows the errors below.

[GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready] fails because of:

```
gluster-node2-0 in StatefulSet gluster-node2 failed error: pods "gluster-node2-0" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]
10s       5m        26        gluster-node3.15741a9c47da5b39                    StatefulSet                                    Warning   FailedCreate        statefulset-controller           create Pod gluster-node3-0 in StatefulSet gluster-node3 failed error: pods "gluster-node3-0" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]
10s       5m        26        gluster-node4.15741a9c6a1e22bd                    StatefulSet                                    Warning   FailedCreate        statefulset-controller           create Pod gluster-node4-0 in StatefulSet gluster-node4 failed error: pods "gluster-node4-0" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]
```

Reason: this OpenShift cluster enforces security context constraint (SCC) policies that forbid scheduling any pod whose service account has not been explicitly granted the required policy. That is why the same deployment works on plain Kubernetes but not on OCP.

Solution:

```sh
oc create serviceaccount -n gcs gd2
oc adm policy add-scc-to-user privileged -n gcs -z gd2
```

Then add `serviceAccountName: gd2` to the pod spec in templates/gcs-manifests/gcs-gd2.yml.j2:

Example:

```diff
--- a/deploy/templates/gcs-manifests/gcs-gd2.yml.j2
+++ b/deploy/templates/gcs-manifests/gcs-gd2.yml.j2
@@ -22,6 +22,7 @@ spec:
         app.kubernetes.io/component: glusterfs
         app.kubernetes.io/name: glusterd2
     spec:
+      serviceAccountName: gd2
       affinity:
```
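To verify the grant took effect, the SCC's user list can be inspected (a sketch; the exact output shape depends on the oc version):

```sh
# The privileged SCC should now include system:serviceaccount:gcs:gd2
oc get scc privileged -o jsonpath='{.users}'
```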
4. TASK [GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready] is failing.

Error:

```
0s        47s       24        gluster-node2-0.15741cf057ee0f6e   Pod                 Warning   FailedScheduling   default-scheduler   0/4 nodes are available: 4 node(s) didn't match node selector.
```

```
[root@node1 ~]# oc get nodes
NAME      STATUS    ROLES     AGE       VERSION
node1     Ready     master    14d       v1.11.0+d4cacc0
node2     Ready     infra     14d       v1.11.0+d4cacc0
node3     Ready     compute   14d       v1.11.0+d4cacc0
node4     Ready     compute   14d       v1.11.0+d4cacc0
```

The gd2 pods expect nodes to be compute or master; they do not deploy on infra nodes. Initially I had 1 master, 1 infra, and 2 compute nodes, so the GCS nodes need to be compute or master.

Solution:

`oc edit node node2` and add the node-role.kubernetes.io/compute label (marked below):

```yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.openshift.io/md5sum: 5bf8b0f9773b36b55af3da0e40ec43d5
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: 2018-12-12T20:00:48Z
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    glusterfs: storage-host
    kubernetes.io/hostname: node2
    node-role.kubernetes.io/infra: "true"
    node-role.kubernetes.io/compute: "true"   # <-- added
  name: node2
  resourceVersion: "2188038"
  selfLink: /api/v1/nodes/node2
  uid: 9e7fc910-fe48-11e8-ae81-801844e013dc
```
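The same label can also be applied without opening an editor:

```sh
# Adds the compute role label that the edit above introduces
oc label node node2 node-role.kubernetes.io/compute=true
```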

Result:

```
[root@node1 deploy]# oc get nodes
NAME      STATUS    ROLES           AGE       VERSION
node1     Ready     master          14d       v1.11.0+d4cacc0
node2     Ready     compute,infra   14d       v1.11.0+d4cacc0
node3     Ready     compute         14d       v1.11.0+d4cacc0
node4     Ready     compute         14d       v1.11.0+d4cacc0
```

Result after the change is done:

```
0s 0s 1 gluster-node2-0.15741d62c298d738 Pod spec.containers{glusterd2} Normal Pulled kubelet, node2 Successfully pulled image "docker.io/gluster/glusterd2-nightly"
```

5. TASK [GCS | CSI Driver | Wait for csi-provisioner to become available]

Error:

```
fatal: [node1]: FAILED! => { "msg": "The conditional check 'result.stdout|int == groups['kube-node']|length' failed. The error was: error while evaluating conditional (result.stdout|int == groups['kube-node']|length): 'dict object' has no attribute 'kube-node'"
```

Solution: edit deploy-gcs.yml:

```diff
@@ -185,7 +185,7 @@
         - name: GCS | CSI Driver | Wait for csi-nodeplugin to become available
           command: "{{ kubectl }} -n{{ gcs_namespace }} -ojsonpath={.status.numberAvailable} get daemonset csi-nodeplugin-glusterfsplugin"
           register: result
-          until: result.stdout|int == groups['kube-node']|length
+          until: result.stdout|int == groups['gcs-node']|length
           delay: 10
           retries: 50
```
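The condition that task waits on can also be checked by hand; this simply runs the same query the playbook uses (gcs is the namespace used throughout this report):

```sh
# Number of available csi-nodeplugin pods; should match the size of
# the gcs-node group once all pods are scheduled
oc -n gcs get daemonset csi-nodeplugin-glusterfsplugin \
  -o 'jsonpath={.status.numberAvailable}'
```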

6. TASK [GCS | CSI Driver | Wait for csi-nodeplugin to become available]

Error (from `oc get events -n gcs`):

```
0s        21s       13        csi-nodeplugin-glusterfsplugin.15741eb52fa45844   DaemonSet             Warning   FailedCreate   daemonset-controller   Error creating: pods "csi-nodeplugin-glusterfsplugin-" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed capabilities.add: Invalid value: "SYS_ADMIN": capability may not be added]
0s        41s       14        csi-nodeplugin-glusterfsplugin.15741eb52fa45844   DaemonSet             Warning   FailedCreate   daemonset-controller   Error creating: pods "csi-nodeplugin-glusterfsplugin-" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed capabilities.add: Invalid value: "SYS_ADMIN": capability may not be added]
0s        1m        15        csi-nodeplugin-glusterfsplugin.15741eb52fa45844   DaemonSet             Warning   FailedCreate   daemonset-controller   Error creating: pods "csi-nodeplugin-glusterfsplugin-" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed capabilities.add: Invalid value: "SYS_ADMIN": capability may not be added]
0s        2m        16        csi-nodeplugin-glusterfsplugin.15741eb52fa45844   DaemonSet             Warning   FailedCreate   daemonset-controller   Error creating: pods "csi-nodeplugin-glusterfsplugin-" is forbidden: unable to validate against any security context constraint: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed capabilities.add: Invalid value: "SYS_ADMIN": capability may not be added]
```

Reason: the deployment creates the csi-nodeplugin service account but never grants it the privileged SCC.

Solution:

```sh
oc adm policy add-scc-to-user privileged -n gcs -z csi-nodeplugin
```
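After the grant, the DaemonSet controller retries by itself, so no restart is needed; the events can be streamed to confirm the FailedCreate messages stop:

```sh
# -w streams events as they arrive; the csi-nodeplugin pods should now
# pass SCC validation and get created
oc get events -n gcs -w
```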

7. TASK [GCS | Prometheus Operator | Wait for the Prometheus Operator to become ready]

Error:

```
12s 1m 15 prometheus-operator-c4b75f7cd.15741ff38e3a2530 ReplicaSet Warning FailedCreate replicaset-controller Error creating: pods "prometheus-operator-c4b75f7cd-" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 65534: must be in the ranges: [1000340000, 1000349999]]
```

Solution:

```sh
oc adm policy add-scc-to-user privileged -n monitoring -z prometheus-operator
```
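Since the only violation here is the fixed non-root UID 65534, a narrower alternative to privileged would be the stock nonroot SCC (an assumption on my part; the privileged grant above also works):

```sh
# nonroot allows any explicitly set non-root UID, which covers 65534
oc adm policy add-scc-to-user nonroot -n monitoring -z prometheus-operator
```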

cloudbehl commented 5 years ago

reference: https://github.com/gluster/gcs/issues/46