apache-spark-on-k8s / kubernetes-HDFS

Repository holding configuration files for running an HDFS cluster in Kubernetes
Apache License 2.0

Pods, Stateful sets and Daemon sets not running #72

Open SimoneStarace opened 5 years ago

SimoneStarace commented 5 years ago

Introduction

I was simply trying to run all the charts as explained in the readme file, but every time I try I get some errors.

Tools

Before I show what errors I get, I want to let you all know what tools I'm using:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
$ minikube version
minikube version: v1.2.0

I have everything installed on my computer, which runs Ubuntu 18.04.2 LTS on an HDD with more than 500 GB of storage.

Execution

This section lists the commands I execute to run the charts.

First I create a new virtual machine with minikube.

$ minikube start
πŸ˜„  minikube v1.2.0 on linux (amd64)
πŸ”₯  Creating virtualbox VM (CPUs=4, Memory=4096MB, Disk=375000MB) ...
🐳  Configuring environment for Kubernetes v1.15.0 on Docker 18.09.6
🚜  Pulling images ...
πŸš€  Launching Kubernetes ... 
βŒ›  Verifying: apiserver proxy etcd scheduler controller dns
πŸ„  Done! kubectl is now configured to use "Name of the profile"

I didn't include the first two helm commands explained in the readme file, because neither of them gave me any errors.

When I execute this command, I get the first error:

$ helm install -n my-hdfs charts/hdfs-k8s
Error: could not find tiller

To solve this problem I had to run the following command and then wait a few minutes before I could run the previous command again:

$ helm init
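(If it helps anyone else hitting the same tiller error: instead of just waiting, the tiller deployment can be watched until it is ready. This is a generic kubectl command, not something from the readme:)

$ kubectl -n kube-system rollout status deployment/tiller-deploy   # blocks until tiller is rolled out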

Now here is where I'm always stuck. Every time there are some pods that won't run.

$ kubectl get pod -l release=my-hdfs
NAME                              READY   STATUS             RESTARTS   AGE
my-hdfs-client-544d894fc7-gp4zl   1/1     Running            0          15m
my-hdfs-datanode-b9wbx            0/1     Running            5          15m
my-hdfs-journalnode-0             1/1     Running            0          15m
my-hdfs-journalnode-1             0/1     Pending            0          5m8s
my-hdfs-namenode-0                0/1     CrashLoopBackOff   5          15m
my-hdfs-namenode-1                0/1     Pending            0          5m12s
my-hdfs-zookeeper-0               1/1     Running            0          15m
my-hdfs-zookeeper-1               1/1     Running            0          2m19s
my-hdfs-zookeeper-2               1/1     Running            0          108s

$ kubectl get statefulset -l release=my-hdfs
NAME                  READY   AGE
my-hdfs-journalnode   1/3     49m
my-hdfs-namenode      0/2     49m
my-hdfs-zookeeper     3/3     49m

$ kubectl get daemonset -l release=my-hdfs
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
my-hdfs-datanode   1         1         0       1            0           <none>          49m

Here are the links where you can see the errors I get: Daemon Set errors, Pods errors, Stateful sets errors.

Guessing one error

I think one of those errors happens because no storage class is specified in the stateful sets.

$ kubectl describe statefulsets my-hdfs-namenode
Name:               my-hdfs-namenode
Namespace:          default
CreationTimestamp:  Thu, 27 Jun 2019 11:38:19 +0200
Selector:           app=hdfs-namenode,release=my-hdfs
Labels:             app=hdfs-namenode
                    chart=hdfs-namenode-k8s-0.1.0
                    release=my-hdfs
Annotations:        <none>
Replicas:           2 desired | 2 total
Update Strategy:    OnDelete
Pods Status:        1 Running / 1 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=hdfs-namenode
           release=my-hdfs
  Containers:
   hdfs-namenode:
    Image:       uhopper/hadoop-namenode:2.7.2
    Ports:       8020/TCP, 50070/TCP
    Host Ports:  8020/TCP, 50070/TCP
    Command:
      /bin/sh
      -c
    Args:
      /entrypoint.sh "/nn-scripts/format-and-run.sh"
    Environment:
      HADOOP_CUSTOM_CONF_DIR:  /etc/hadoop-custom-conf
      MULTIHOMED_NETWORK:      0
      MY_POD:                   (v1:metadata.name)
      NAMENODE_POD_0:          my-hdfs-namenode-0
      NAMENODE_POD_1:          my-hdfs-namenode-1
    Mounts:
      /etc/hadoop-custom-conf from hdfs-config (ro)
      /hadoop/dfs/name from metadatadir (rw,path="name")
      /nn-scripts from nn-scripts (ro)
  Volumes:
   nn-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-hdfs-namenode-scripts
    Optional:  false
   hdfs-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-hdfs-config
    Optional:  false
Volume Claims:
  Name:          metadatadir
  StorageClass:  
  Labels:        <none>
  Annotations:   <none>
  Capacity:      100Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type    Reason            Age   From                    Message
  ----    ------            ----  ----                    -------
  Normal  SuccessfulCreate  59m   statefulset-controller  create Claim metadatadir-my-hdfs-namenode-0 Pod my-hdfs-namenode-0 in StatefulSet my-hdfs-namenode success
  Normal  SuccessfulCreate  59m   statefulset-controller  create Pod my-hdfs-namenode-0 in StatefulSet my-hdfs-namenode successful
  Normal  SuccessfulCreate  49m   statefulset-controller  create Claim metadatadir-my-hdfs-namenode-1 Pod my-hdfs-namenode-1 in StatefulSet my-hdfs-namenode success
  Normal  SuccessfulCreate  49m   statefulset-controller  create Pod my-hdfs-namenode-1 in StatefulSet my-hdfs-namenode successful
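If it helps with diagnosing: as far as I understand, an empty StorageClass in the claim template simply falls back to the cluster's default storage class (minikube ships a default "standard" provisioner), so whether the claims are actually the problem can be checked directly:

$ kubectl get storageclass                              # is there a default class?
$ kubectl get pvc                                       # are the metadatadir claims Bound or Pending?
$ kubectl describe pvc metadatadir-my-hdfs-namenode-0   # events show provisioning errors, if any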

How can I solve those errors?

Have a nice day everyone.

SimoneStarace commented 5 years ago

Update 1

So after doing some tests I got a different output, but still not every pod is running correctly. I thought the pods weren't running because no storage class is specified in the stateful set, but that's not the problem. I simply deleted the chart with this command:

$ helm delete --purge my-hdfs
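(Note: as far as I know, helm delete --purge does not remove the PersistentVolumeClaims that the StatefulSets created from their volumeClaimTemplates, so leftover claims from the previous install can be checked and, if needed, deleted by name before reinstalling:)

$ kubectl get pvc
$ kubectl delete pvc metadatadir-my-hdfs-namenode-0 metadatadir-my-hdfs-namenode-1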

After this I simply ran the installation again, and this time I got different errors: Pods errors

These are the descriptions of namenode 0 and 1:

$ kubectl describe pod my-hdfs-namenode-0
Name:           my-hdfs-namenode-0
Namespace:      default
Priority:       0
Node:           minikube/10.0.2.15
Start Time:     Fri, 28 Jun 2019 14:37:13 +0200
Labels:         app=hdfs-namenode
                controller-revision-hash=my-hdfs-namenode-65b74c4cfc
                release=my-hdfs
                statefulset.kubernetes.io/pod-name=my-hdfs-namenode-0
Annotations:    <none>
Status:         Running
IP:             10.0.2.15
Controlled By:  StatefulSet/my-hdfs-namenode
Containers:
  hdfs-namenode:
    Container ID:  docker://7f9f08a34c86333d48eeb6b81bf457fcf25c79956b624c7f1f88ed432473b996
    Image:         uhopper/hadoop-namenode:2.7.2
    Image ID:      docker-pullable://uhopper/hadoop-namenode@sha256:c78c6b3e97a01ce09dd4b0bc23e9885dee9658982c5d358554cad7657be06686
    Ports:         8020/TCP, 50070/TCP
    Host Ports:    8020/TCP, 50070/TCP
    Command:
      /bin/sh
      -c
    Args:
      /entrypoint.sh "/nn-scripts/format-and-run.sh"
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 28 Jun 2019 14:48:07 +0200
      Finished:     Fri, 28 Jun 2019 14:48:09 +0200
    Ready:          False
    Restart Count:  7
    Environment:
      HADOOP_CUSTOM_CONF_DIR:  /etc/hadoop-custom-conf
      MULTIHOMED_NETWORK:      0
      MY_POD:                  my-hdfs-namenode-0 (v1:metadata.name)
      NAMENODE_POD_0:          my-hdfs-namenode-0
      NAMENODE_POD_1:          my-hdfs-namenode-1
    Mounts:
      /etc/hadoop-custom-conf from hdfs-config (ro)
      /hadoop/dfs/name from metadatadir (rw,path="name")
      /nn-scripts from nn-scripts (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-v8fxs (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  metadatadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  metadatadir-my-hdfs-namenode-0
    ReadOnly:   false
  nn-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-hdfs-namenode-scripts
    Optional:  false
  hdfs-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-hdfs-config
    Optional:  false
  default-token-v8fxs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-v8fxs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  14m                   default-scheduler  Successfully assigned default/my-hdfs-namenode-0 to minikube
  Normal   Pulled     12m (x5 over 14m)     kubelet, minikube  Container image "uhopper/hadoop-namenode:2.7.2" already present on machine
  Normal   Created    12m (x5 over 14m)     kubelet, minikube  Created container hdfs-namenode
  Normal   Started    12m (x5 over 14m)     kubelet, minikube  Started container hdfs-namenode
  Warning  BackOff    4m16s (x46 over 14m)  kubelet, minikube  Back-off restarting failed container
$ kubectl describe pod my-hdfs-namenode-1
Name:           my-hdfs-namenode-1
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=hdfs-namenode
                controller-revision-hash=my-hdfs-namenode-65b74c4cfc
                release=my-hdfs
                statefulset.kubernetes.io/pod-name=my-hdfs-namenode-1
Annotations:    <none>
Status:         Pending
IP:             
Controlled By:  StatefulSet/my-hdfs-namenode
Containers:
  hdfs-namenode:
    Image:       uhopper/hadoop-namenode:2.7.2
    Ports:       8020/TCP, 50070/TCP
    Host Ports:  8020/TCP, 50070/TCP
    Command:
      /bin/sh
      -c
    Args:
      /entrypoint.sh "/nn-scripts/format-and-run.sh"
    Environment:
      HADOOP_CUSTOM_CONF_DIR:  /etc/hadoop-custom-conf
      MULTIHOMED_NETWORK:      0
      MY_POD:                  my-hdfs-namenode-1 (v1:metadata.name)
      NAMENODE_POD_0:          my-hdfs-namenode-0
      NAMENODE_POD_1:          my-hdfs-namenode-1
    Mounts:
      /etc/hadoop-custom-conf from hdfs-config (ro)
      /hadoop/dfs/name from metadatadir (rw,path="name")
      /nn-scripts from nn-scripts (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-v8fxs (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  metadatadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  metadatadir-my-hdfs-namenode-1
    ReadOnly:   false
  nn-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-hdfs-namenode-scripts
    Optional:  false
  hdfs-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-hdfs-config
    Optional:  false
  default-token-v8fxs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-v8fxs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  80s (x24 over 15m)  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
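From these two descriptions the namenodes seem to fail for different reasons: namenode-0 starts and then exits (CrashLoopBackOff, exit code 1), while namenode-1 cannot even be scheduled, because the pod template requests host ports 8020 and 50070 and the only minikube node already has them taken by namenode-0. The crash itself should be visible in the logs of the previous attempt, e.g.:

$ kubectl logs my-hdfs-namenode-0 --previous   # logs from the last failed container run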

I really don't know how to solve this problem.

Can someone help me about this?

drametoid commented 4 years ago

I'm on Kubernetes v1.15.3 and also facing a similar issue.

I am doing the same setup as described in the readme, except that I reduce the storage requested by the namenodes and a few other components to a smaller value.

Here is the log for my namenode-0 pod, which is the first one to go into the Error state right after all the pods reach the Running state.

Any solutions so far?

SimoneStarace commented 4 years ago

I'm on Kubernetes v1.15.3 and also facing a similar issue.

I am doing the same setup as described in the readme, except that I reduce the storage requested by the namenodes and a few other components to a smaller value.

Here is the log for my namenode-0 pod, which is the first one to go into the Error state right after all the pods reach the Running state.

Any solutions so far?

Hi. I'm sorry, but I didn't solve this problem and I'm not looking into it anymore because I'm working on a different project right now. I was thinking of closing this issue, but since I never solved the problem I left it open.

Laziz-data commented 3 years ago

Still facing the same problem here; anyone able to help? πŸ˜”