gluster / gluster-kubernetes

GlusterFS Native Storage Service for Kubernetes
Apache License 2.0

Unable to deploy on Ubuntu 18.04 -> pods not found. #635

Open ghost opened 4 years ago

ghost commented 4 years ago

Hello,

I'm quite new to K8s and Rancher, but I need some kind of globally available storage, so I tried out this project. However, every time I try to deploy GlusterFS I get the following error in the dashboard:

Readiness probe failed: /usr/local/bin/status-probe.sh failed check: systemctl -q is-active glusterd.service

and the deployment never goes through successfully.

The log output of the pod looks like this:

23.1.2020 19:15:20 maximum number of pids configured in cgroups: max
23.1.2020 19:15:20 maximum number of pids configured in cgroups (reconfigured): max
23.1.2020 19:15:20 env variable is set. Update in gluster-blockd.service

and that's basically it...

This is what my topology looks like:

{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [
                "worker1"
              ],
              "storage": [
                "192.168.40.151"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/vdb"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "worker2"
              ],
              "storage": [
                "192.168.40.152"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/vdb"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "worker3"
              ],
              "storage": [
                "192.168.40.153"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/vdb"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "worker4"
              ],
              "storage": [
                "192.168.40.154"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/vdb"
          ]
        }
      ]
    }
  ]
}
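
As a side note, heketi expects each device listed in the topology (here /dev/vdb) to be an empty, unpartitioned block device with no leftover filesystem or LVM signatures. A purely illustrative sanity check on each worker might look like this (the wipefs call is destructive and only an example, not something from the original report):

# run on every worker listed in topology.json
lsblk /dev/vdb            # should show a bare disk: no partitions, no mountpoints
blkid /dev/vdb            # should print nothing if the device carries no old signatures
# wipefs --all /dev/vdb   # DESTRUCTIVE: clears stale signatures; only if the disk is expendable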

This is the command I execute on the Rancher node where kubectl is set up and working:

./gk-deploy -g --user-key MyUserKey --admin-key MyAdminKey --ssh-keyfile /root/.ssh/id_rsa -l /tmp/heketi_deployment.log -v topology.json
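
(Side note: the gluster-kubernetes setup guide also lists a few device-mapper kernel modules as prerequisites on every storage node; an illustrative check, not taken from the original report:)

# run on each worker node
lsmod | grep -E 'dm_snapshot|dm_mirror|dm_thin_pool'   # all three modules should be listed
# load any that are missing (persist them via /etc/modules if needed)
modprobe dm_snapshot dm_mirror dm_thin_pool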

This is the complete script output:

Do you wish to proceed with deployment?

Using Kubernetes CLI.

Checking status of namespace matching 'default':
default   Active   142m
Using namespace "default".
Checking glusterd status on 'worker1'.
Checking for pre-existing resources...
  GlusterFS pods ...
Checking status of pods matching '--selector=glusterfs=pod':
Timed out waiting for pods matching '--selector=glusterfs=pod'.
not found.
  deploy-heketi pod ...
Checking status of pods matching '--selector=deploy-heketi=pod':
Timed out waiting for pods matching '--selector=deploy-heketi=pod'.
not found.
  heketi pod ...
Checking status of pods matching '--selector=heketi=pod':
Timed out waiting for pods matching '--selector=heketi=pod'.
not found.
  gluster-s3 pod ...
Checking status of pods matching '--selector=glusterfs=s3-pod':
Timed out waiting for pods matching '--selector=glusterfs=s3-pod'.
not found.
Creating initial resources ...
/bin/kubectl -n default create -f /root/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml 2>&1
serviceaccount/heketi-service-account created
/bin/kubectl -n default create clusterrolebinding heketi-sa-view --clusterrole=edit --serviceaccount=default:heketi-service-account 2>&1
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view created
/bin/kubectl -n default label --overwrite clusterrolebinding heketi-sa-view glusterfs=heketi-sa-view heketi=sa-view
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view labeled
OK
Marking 'worker1' as a GlusterFS node.
/bin/kubectl -n default label nodes worker1 storagenode=glusterfs --overwrite 2>&1
node/worker1 not labeled
Marking 'worker2' as a GlusterFS node.
/bin/kubectl -n default label nodes worker2 storagenode=glusterfs --overwrite 2>&1
node/worker2 not labeled
Marking 'worker3' as a GlusterFS node.
/bin/kubectl -n default label nodes worker3 storagenode=glusterfs --overwrite 2>&1
node/worker3 not labeled
Marking 'worker4' as a GlusterFS node.
/bin/kubectl -n default label nodes worker4 storagenode=glusterfs --overwrite 2>&1
node/worker4 not labeled
Deploying GlusterFS pods.
sed -e 's/storagenode\: glusterfs/storagenode\: 'glusterfs'/g' /root/gluster-kubernetes/deploy/kube-templates/glusterfs-daemonset.yaml | /bin/kubectl -n default create -f - 2>&1
daemonset.extensions/glusterfs created
Waiting for GlusterFS pods to start ...
Checking status of pods matching '--selector=glusterfs=pod':
glusterfs-cnjm5   0/1   Running   0   5m11s
glusterfs-nfs6z   0/1   Running   0   5m11s
glusterfs-rvtrf   0/1   Running   0   5m11s
glusterfs-sw2bd   0/1   Running   0   5m11s
Timed out waiting for pods matching '--selector=glusterfs=pod'.
pods not found.
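
At this point the usual next step is to find out why the pods never become Ready. Illustrative commands only (the pod name is copied from the output above):

kubectl -n default describe pod glusterfs-cnjm5     # readiness probe failures show up under Events
kubectl -n default logs glusterfs-cnjm5 --tail=50   # container log of the GlusterFS pod
kubectl -n default exec glusterfs-cnjm5 -- systemctl is-active glusterd.service   # same check the probe runs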

ghost commented 4 years ago

If I remove the glusterfs-server package, I get "Can't access glusterd on 'worker1'".

ghost commented 4 years ago

If I take a look at glusterd.log, I always get the message that the port is already in use. That makes sense if the glusterd service is running on the worker node itself, outside of k8s. The problem is that I'm not able to use the deployment script if the host's glusterd service is down; that results in the following error:

Checking status of namespace matching 'default':
default   Active   6m37s
Using namespace "default".
Checking glusterd status on 'worker1'.
Can't access glusterd on 'worker1'

If glusterd is on, I get this in glusterd.log:

[2020-01-24 00:41:20.188487] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:20.188499] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use
[2020-01-24 00:41:21.188619] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:21.188660] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use
[2020-01-24 00:41:22.188774] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:22.188804] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use
[2020-01-24 00:41:23.188923] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:23.188953] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use
[2020-01-24 00:41:24.189076] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:24.189115] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use
[2020-01-24 00:41:25.189235] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:25.189268] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use
[2020-01-24 00:41:26.189376] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:26.189413] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use
[2020-01-24 00:41:27.189523] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:27.189585] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use
[2020-01-24 00:41:28.189720] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:28.189751] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use
[2020-01-24 00:41:29.189856] E [socket.c:976:socket_server_bind] 0-socket.management: binding to failed: Address already in use
[2020-01-24 00:41:29.189889] E [socket.c:978:socket_server_bind] 0-socket.management: Port is already in use

Any idea?
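
(For reference, glusterd's management traffic uses port 24007/tcp, so these errors suggest the host's own glusterd and the containerized one are competing for the same port. An illustrative way to confirm and, assuming the containerized glusterd is the one that should win, to resolve it on a worker:)

ss -tlnp | grep 24007          # show which process currently owns the glusterd port
systemctl stop glusterd        # stop the host's glusterd so it releases the port
systemctl disable glusterd     # keep it from coming back on reboot; client tools can stay installed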

minhnnhat commented 4 years ago

Hi @venomone, did you solve this? I met the same problem when creating the GlusterFS pods. Thanks.

ghost commented 4 years ago

The simple answer is no... I tried many scenarios, but none of them worked as expected on Ubuntu. Kindly check and fix this.

minhnnhat commented 4 years ago

The simple answer is no... I tried many scenarios, but none of them worked as expected on Ubuntu. Kindly check and fix this.

Thank you, I moved to Rook. BTW, I found a new project that is an alternative to heketi; you can try it here: http://github.com/kadalu/kadalu

ghettosamson commented 4 years ago

I'm running into this same issue on Ubuntu 18.04. Has anyone looked into this? I attempted to use Kadalu, but it doesn't serve my needs: I need the ability to define StatefulSets and have the storage service create a new PVC for each pod without having to predefine or pre-create the PVCs.
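
(For context, that per-pod pattern is what a StatefulSet's volumeClaimTemplates provide when they reference a StorageClass backed by a dynamic provisioner; a minimal sketch, in which the StorageClass name glusterfs-storage and the container image are placeholders, not anything from this thread:)

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.17                  # placeholder image
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:                    # one PVC per replica, provisioned on demand
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: glusterfs-storage  # assumed StorageClass backed by the storage service
      resources:
        requests:
          storage: 1Gi
EOF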