gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

deploy fails when heketi adds devices to cluster #238

Closed · Scukerman closed this issue 7 years ago

Scukerman commented 7 years ago
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

gluster-kubernetes: I tried both the latest release and the master branch; I don't see any difference between them.

stdout log:

~/gluster-kubernetes/deploy$ ./gk-deploy -g -v -n glusterfs -l log.txt
Welcome to the deployment tool for GlusterFS on Kubernetes and OpenShift.

Before getting started, this script has some requirements of the execution
environment and of the container platform that you should verify.

The client machine that will run this script must have:
 * Administrative access to an existing Kubernetes or OpenShift cluster
 * Access to a python interpreter 'python'
 * Access to the heketi client 'heketi-cli'

Each of the nodes that will host GlusterFS must also have appropriate firewall
rules for the required GlusterFS ports:
 * 2222  - sshd (if running GlusterFS in a pod)
 * 24007 - GlusterFS Daemon
 * 24008 - GlusterFS Management
 * 49152 to 49251 - Each brick for every volume on the host requires its own
   port. For every new brick, one new port will be used starting at 49152. We
   recommend a default range of 49152-49251 on each host, though you can adjust
   this to fit your needs.

In addition, for an OpenShift deployment you must:
 * Have 'cluster_admin' role on the administrative account doing the deployment
 * Add the 'default' and 'router' Service Accounts to the 'privileged' SCC
 * Have a router deployed that is configured to allow apps to access services
   running in the cluster

Do you wish to proceed with deployment?

[Y]es, [N]o? [Default: Y]: 
Using Kubernetes CLI.
NAME        STATUS    AGE
glusterfs   Active    18h
Using namespace "glusterfs".
Checking that heketi pod is not running ... 
Checking status of pods matching 'glusterfs=heketi-pod':
No resources found.
Timed out waiting for pods matching 'glusterfs=heketi-pod'.
OK
serviceaccount "heketi-service-account" created
Marking 'black1' as a GlusterFS node.
node "black1" labeled
Marking 'black2' as a GlusterFS node.
node "black2" labeled
Marking 'black3' as a GlusterFS node.
node "black3" labeled
Deploying GlusterFS pods.
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ... 
Checking status of pods matching 'glusterfs-node=pod':
glusterfs-4bbfm   1/1       Running   0         1m
glusterfs-g23cs   1/1       Running   0         1m
glusterfs-h6sd7   1/1       Running   0         1m
OK
service "deploy-heketi" created
deployment "deploy-heketi" created
Waiting for deploy-heketi pod to start ... 
Checking status of pods matching 'glusterfs=heketi-pod':
deploy-heketi-3389103616-fl82k   1/1       Running   0         11s
OK
Determining heketi service URL ... OK
Creating cluster ... ID: e3f6fdc1f1d0660e98d1fcf18c304fc2
Creating node black1 ... ID: 88c8d8513d33747efcb677ae7e24703a
Adding device /dev/sdd ... Unable to add device: Failed to get list of pods
Creating node black2 ... Unable to create node: Failed to get list of pods
Creating node black3 ... Unable to create node: Failed to get list of pods
Error loading the cluster topology.
Please check the failed node or device and rerun this script using the --load option.

My topology.json:

{
    "clusters": [{
        "nodes": [
            {
                "node": {"hostnames": {"manage": ["black1"], "storage": ["172.100.100.1"]}, "zone": 1},
                "devices": ["/dev/sdd"]
            },
            {
                "node": {"hostnames": {"manage": ["black2"], "storage": ["172.100.100.2"]}, "zone": 1},
                "devices": ["/dev/sdd"]
            },
            {
                "node": {"hostnames": {"manage": ["black3"], "storage": ["172.100.100.3"]}, "zone": 1},
                "devices": ["/dev/sdd"]
            }
        ]
    }]
}
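
(For reference, when the deploy aborts like this the topology can also be re-applied against the running heketi pod by hand. A minimal sketch, assuming heketi-cli is available on the client and pointed at the deploy-heketi service created above; the service name, namespace, and port 8080 are taken from this thread, so adjust them to your environment:

# Look up the deploy-heketi service created by gk-deploy
$ kubectl get svc deploy-heketi -n glusterfs

# Point heketi-cli at that service and re-load the topology
$ export HEKETI_CLI_SERVER=http://<deploy-heketi-cluster-ip>:8080
$ heketi-cli topology load --json=topology.json

# Verify that nodes and devices were registered
$ heketi-cli topology info
)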
obnoxxx commented 7 years ago

Right, this is due to changes in kube 1.6.

PR #236 fixes this.
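
For context: Kubernetes 1.6 enables RBAC by default, so the heketi-service-account created by the script can no longer list pods unless it is explicitly granted a role. A rough sketch of the kind of binding such a fix needs (the binding name below is illustrative; see PR #236 for the actual change):

# Grant the heketi service account edit rights, so it can list pods
# and exec into the GlusterFS containers in the glusterfs namespace
$ kubectl create clusterrolebinding heketi-sa-view \
    --clusterrole=edit \
    --serviceaccount=glusterfs:heketi-service-account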

Scukerman commented 7 years ago

@obnoxxx Oh, thanks. I checked the issues and PRs yesterday and didn't see this one.

Update: I'll check it and close the issue if it works for me.

obnoxxx commented 7 years ago

@Scukerman I figured it out last night ... :-)

There is also an updated vagrant-based test environment in PR #227 (update to kube 1.6.1).

PR #225 adds tests to run in the vagrant env (essentially implementing the quickstart guide and the dynamic provisioning example). gk-deploy succeeds with the addition of #236, but I am still having problems with PVC creation: the PVC remains in Pending state and the request never reaches heketi. Any insights appreciated...
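
(A quick way to check whether such a request ever reaches heketi is to look at the events the provisioner records on the claim; a sketch, where the claim name is a placeholder and the heketi pod name is the one from the log above:

# Provisioning attempts show up as events on the claim
$ kubectl describe pvc <claim-name>

# The heketi side can be checked in the deploy-heketi pod's logs
$ kubectl logs deploy-heketi-3389103616-fl82k -n glusterfs
)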

Scukerman commented 7 years ago

It worked like a charm! Thanks a lot, @obnoxxx. I was thinking about a clusterrolebinding myself, but I'm not as experienced with Kubernetes as you are.

P.S. I used kubeadm to build the cluster.

P.P.S. This was the master branch (commit 3c154c608135f9c0878ade246acd5b32053da7e0) plus PR #236.

Scukerman commented 7 years ago

@obnoxxx After deploying I ran into what you described: heketi-cli topology info outputs nothing. I loaded the topology one more time and now I can see the cluster, but the devices weren't added because of:

Adding device /dev/sdd ... Unable to add device: Unable to execute command on glusterfs-6jdcf:   Can't open /dev/sdd exclusively.  Mounted filesystem?

I can't remove the VGs because they are mounted and in use inside the gluster pods.

Scukerman commented 7 years ago

Yeah! I made it! I wiped the devices and reloaded the topology.

$ kubectl get pvc
NAME          STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
hello-world   Bound     pvc-b4875952-1ad6-11e7-b35d-0cc47ac5569a   5Gi        RWO           glusterfs      11s
obnoxxx commented 7 years ago

@Scukerman, exactly, you need to wipe the devices (or the whole VMs...)

Er ... how did you get the hello-world claim into Bound state? That is the thing I am currently struggling with. Could you paste your PVC YAML file, please?

Scukerman commented 7 years ago

@obnoxxx I don't know. I got my topology info working and just deployed this manifest:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hello-world
  annotations:
    volume.beta.kubernetes.io/storage-class: glusterfs
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
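
For completeness, applying a claim like this and watching it bind might look as follows; the file name here is just an example:

# Create the claim and watch until it reaches the Bound state
$ kubectl create -f hello-world-pvc.yaml
$ kubectl get pvc hello-world -w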
obnoxxx commented 7 years ago

@Scukerman thanks. This does not look special... I will test again. It gives me hope ;-)

obnoxxx commented 7 years ago

Oh, and could you also show your storage class (glusterfs)? @Scukerman

Scukerman commented 7 years ago

@obnoxxx, it looks pretty standard.

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: glusterfs
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://10.105.47.226:8080"

Update: removed the extra info. Never mind, it all works now.
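
(One way to fill in the resturl value is to read it off the heketi service that the deployment created; a sketch, assuming the service is still named deploy-heketi in the glusterfs namespace as in the log above, since a finished deployment may expose it under a different service name:

# Build the resturl from the heketi service's cluster IP and port
$ kubectl get svc deploy-heketi -n glusterfs \
    -o jsonpath='http://{.spec.clusterIP}:{.spec.ports[0].port}'
)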

obnoxxx commented 7 years ago

@Scukerman Indeed, my StorageClass had an error: it specified endpoint alongside resturl, which is wrong. In kube versions < 1.6 this was not a problem... After I removed endpoint, it worked like a charm...

Scukerman commented 7 years ago

@obnoxxx According to https://github.com/gluster/gluster-kubernetes/blob/master/docs/examples/hello_world/README.md, the endpoint parameter is not necessary anymore. And, like you said, it is now a problem on 1.6.

P.S. I didn't know that; I've been using gluster since k8s 1.5, I think.

P.P.S. Anyway, you're welcome :)

Liangming666 commented 6 years ago

@Scukerman Hi, I am hitting the same error you had before:

Adding device /dev/sdd ... Unable to add device: Unable to execute command on glusterfs-6jdcf: Can't open /dev/sdd exclusively. Mounted filesystem?

I am using a USB drive as the glusterfs device. How did you clear the error?

jarrpa commented 6 years ago

@Liangming666 The block device must be bare, meaning that it is not mounted, has no filesystem, no partitions, and no LVM volume data. The easiest way to ensure this is to unmount any volumes from the device and then do wipefs -a /dev/sdd (or whatever name your device has).
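
A minimal sketch of that cleanup, run on each storage node; the device name comes from this thread and the volume group name is a placeholder, so double-check with lsblk before wiping anything:

# Inspect what currently sits on the device
$ lsblk /dev/sdd

# Unmount anything still mounted from it, then remove any old LVM metadata
$ umount /dev/sdd* 2>/dev/null || true
$ vgremove -y <vg-name>        # only if an old heketi VG is still present
$ pvremove /dev/sdd

# Finally clear all remaining filesystem/partition signatures
$ wipefs -a /dev/sdd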

simisoz commented 6 years ago

If you can't wipefs the device, remove it from the SCSI subsystem and trigger a rescan, for example:

$ lsscsi
[2:2:0:0]    disk    DELL    PERC H700    2.10    /dev/sda
[2:2:1:0]    disk    DELL    PERC H700    2.10    /dev/sdc
$ echo 1 > /sys/class/scsi_device/2\:2\:1\:0/device/delete
$ echo "- - -" > /sys/class/scsi_host/hostX/scan

Note: where X is 0, 1, 2, 3, 4, ...