Bio-OS / bioos


bug: Fail to start jupyterhub during deployment #33

Open yuanminhui opened 9 months ago

yuanminhui commented 9 months ago

Describe the bug

JupyterHub fails to start during deployment.

To Reproduce

Follow the deployment guide at https://bio-os.gitbook.io/userguide/bu-shu/getting-set-up/bu-shu-bioos or https://github.com/Bio-OS/helm-charts/blob/main/README.md.

$ helm install jupyterhub bioos/jupyterhub \
        --namespace bioos \
        --create-namespace \
        --set hub.db.url=mysql+pymysql://root:Bytedance2023@mysql.bioos.svc.cluster.local:3306/bioos \
        --set hub.db.password=Bytedance2023
"bioos" has been added to your repositories
NAME: jupyterhub
LAST DEPLOYED: Sun Nov 19 10:37:46 2023
NAMESPACE: bioos
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
.      __                          __                  __  __          __
      / / __  __  ____    __  __  / /_  ___    _____  / / / / __  __  / /_
 __  / / / / / / / __ \  / / / / / __/ / _ \  / ___/ / /_/ / / / / / / __ \
/ /_/ / / /_/ / / /_/ / / /_/ / / /_  /  __/ / /    / __  / / /_/ / / /_/ /
\____/  \__,_/ / .___/  \__, /  \__/  \___/ /_/    /_/ /_/  \__,_/ /_.___/
              /_/      /____/

       You have successfully installed the official JupyterHub Helm chart!

### Installation info

  - Kubernetes namespace: bioos
  - Helm release name:    jupyterhub
  - Helm chart version:   2.0.0
  - JupyterHub version:   3.0.0
  - Hub pod packages:     See https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/2.0.0/images/hub/requirements.txt

Nothing seems wrong here, but then:

$ kubectl -n bioos port-forward --address 0.0.0.0 service/hub 8081:8081
error: unable to forward port because pod is not running. Current status=Pending

The jupyterhub pod is not running, so a long debugging session began.


$ kubectl -n bioos get pods -o wide
NAME                   READY   STATUS             RESTARTS     AGE   IP           NODE           NOMINATED NODE   READINESS GATES
hub-5f57d5bd65-wlw6r   0/1     CrashLoopBackOff   6 (4m ago)   10m   10.244.1.3   minikube-m02   <none>           <none>
mysql-0                1/1     Running            0            47m   10.244.3.3   minikube-m04   <none>           <none>
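
(The hub pod is stuck in CrashLoopBackOff. For anyone reproducing this, its crash reason could be inspected with something like the commands below, using the pod name from the listing above; `--previous` shows the logs of the last failed container. This is just a reference sketch, not output I am including here.)

$ kubectl -n bioos describe pod hub-5f57d5bd65-wlw6r
$ kubectl -n bioos logs hub-5f57d5bd65-wlw6r --previous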

$ kubectl -n bioos get svc -o wide
NAME                             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE   SELECTOR
hub                              NodePort       10.255.159.107   <none>        8081:32450/TCP   34m   app=jupyterhub,component=hub,release=jupyterhub
jupyter--2fjupyterhub-2f-route   ExternalName   <none>           hub.bioos     8081/TCP         67m   <none>
mysql                            ClusterIP      10.255.80.216    <none>        3306/TCP         72m   app.kubernetes.io/component=primary,app.kubernetes.io/instance=mysql,app.kubernetes.io/name=mysql
mysql-headless                   ClusterIP      None             <none>        3306/TCP         72m   app.kubernetes.io/component=primary,app.kubernetes.io/instance=mysql,app.kubernetes.io/name=mysql

$ kubectl -n kube-system get pods,svc
NAME                                       READY   STATUS    RESTARTS         AGE
pod/coredns-7f74c56694-z87cm               1/1     Running   6 (3h40m ago)    25h
pod/csi-nfs-controller-74f4f8484-jmqnw     4/4     Running   0                3h25m
pod/csi-nfs-node-h97cs                     3/3     Running   0                3h25m
pod/csi-nfs-node-kg7j8                     3/3     Running   0                3h25m
pod/csi-nfs-node-rwbs4                     3/3     Running   0                3h25m
pod/csi-nfs-node-z4d89                     3/3     Running   0                3h25m
pod/etcd-minikube                          1/1     Running   2 (3h40m ago)    25h
pod/kube-apiserver-minikube                1/1     Running   2 (3h40m ago)    25h
pod/kube-controller-manager-minikube       1/1     Running   2 (3h40m ago)    25h
pod/kube-proxy-8dl6v                       1/1     Running   2 (3h33m ago)    25h
pod/kube-proxy-cx85g                       1/1     Running   2 (3h33m ago)    25h
pod/kube-proxy-nbvf4                       1/1     Running   2 (3h33m ago)    25h
pod/kube-proxy-sqcwc                       1/1     Running   2 (3h40m ago)    25h
pod/kube-scheduler-minikube                1/1     Running   2 (3h40m ago)    25h
pod/snapshot-controller-66746ffc86-r4w6k   1/1     Running   0                3h25m
pod/storage-provisioner                    1/1     Running   17 (3h40m ago)   25h

NAME               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.255.0.10   <none>        53/UDP,53/TCP,9153/TCP   25h

$ kubectl -n ingress-nginx get pods,svc  -o wide
NAME                                            READY   STATUS              RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
pod/ingress-nginx-admission-create-z48gm        0/1     ImagePullBackOff    0          25h   10.244.0.8   minikube   <none>           <none>
pod/ingress-nginx-admission-patch-2mdgb         0/1     ImagePullBackOff    0          25h   10.244.0.9   minikube   <none>           <none>
pod/ingress-nginx-controller-684c54767f-gwtk9   0/1     ContainerCreating   0          25h   <none>       minikube   <none>           <none>

NAME                                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
service/ingress-nginx-controller             NodePort    10.255.48.183   <none>        80:30710/TCP,443:30659/TCP   25h   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-admission   ClusterIP   10.255.61.48    <none>        443/TCP                      25h   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

$ kubectl get pods -n ingress-nginx
NAME                                        READY   STATUS              RESTARTS   AGE
ingress-nginx-admission-create-z48gm        0/1     ImagePullBackOff    0          26h
ingress-nginx-admission-patch-2mdgb         0/1     ErrImagePull        0          26h
ingress-nginx-controller-684c54767f-gwtk9   0/1     ContainerCreating   0          26h

Looking deeper into the nginx pod:

$ kubectl describe pod ingress-nginx-admission-create-z48gm -n ingress-nginx
Name:         ingress-nginx-admission-create-z48gm
Namespace:    ingress-nginx
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Sat, 18 Nov 2023 09:51:46 +0800
Labels:       app.kubernetes.io/component=admission-webhook
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              controller-uid=e2b83263-5092-4c77-8296-b777ed5d9705
              job-name=ingress-nginx-admission-create
Annotations:  <none>
Status:       Pending
IP:           10.244.0.8
IPs:
  IP:           10.244.0.8
Controlled By:  Job/ingress-nginx-admission-create
Containers:
  create:
    Container ID:
    Image:         registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v20231011-8b53cabe0@sha256:a7943503b45d552785aa3b5e457f169a5661fb94d82b8a3373bcd9ebaf9aac80
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      create
      --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc
      --namespace=$(POD_NAMESPACE)
      --secret-name=ingress-nginx-admission
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wgpk6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-wgpk6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
                             minikube.k8s.io/primary=true
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age                    From     Message
  ----    ------   ----                   ----     -------
  Normal  BackOff  4s (x1095 over 4h10m)  kubelet  Back-off pulling image "registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v20231011-8b53cabe0@sha256:a7943503b45d552785aa3b5e457f169a5661fb94d82b8a3373bcd9ebaf9aac80"
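
(For reference, the digest-pinned image reference comes from the pod template of the owning Job shown in the `Controlled By` field above, and can be read directly with a jsonpath query, e.g.:)

$ kubectl -n ingress-nginx get job ingress-nginx-admission-create \
    -o jsonpath='{.spec.template.spec.containers[0].image}'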

It seems the image cannot be pulled successfully. I tried pulling it manually to validate:

$ docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v20231011-8b53cabe0@sha256:a7943503b45d552785aa3b5e457f169a5661fb94d82b8a3373bcd9ebaf9aac80
Error response from daemon: manifest for registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen@sha256:a7943503b45d552785aa3b5e457f169a5661fb94d82b8a3373bcd9ebaf9aac80 not found: manifest unknown: manifest unknown

The pull by digest fails: the Aliyun mirror registry has no manifest for that sha256 digest. When I removed the digest and pulled by tag alone:

$ docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v20231011-8b53cabe0
v20231011-8b53cabe0: Pulling from google_containers/kube-webhook-certgen
07a64a71e011: Pulling fs layer
fe5ca62666f0: Pulling fs layer
b02a7525f878: Pulling fs layer
fcb6f6d2c998: Waiting
e8c73c638ae9: Waiting
1e3d9b7d1452: Waiting
4aa0ea1413d3: Waiting
7c881f9ab25e: Waiting
5627a970d25e: Waiting
2c4dd5b46232: Waiting

It works.

So the problem lies in how the image is referenced: the manifest pins a sha256 digest that the mirror registry does not serve, while the tag alone pulls fine. A procedure is needed to manually recover the deployment. I will provide the fixed procedure in a PR so others can install Bio-OS successfully; please check and merge it.
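
(As a rough sketch of one possible manual recovery, not necessarily the exact procedure in the PR: assuming ingress-nginx was installed from a single manifest file — the file name below is a placeholder — the pinned digests can be stripped from the image references and the resources re-created, which also avoids the immutability of the admission Jobs' pod templates:)

$ sed -i 's/@sha256:[0-9a-f]*//g' ingress-nginx-deploy.yaml
$ kubectl delete -f ingress-nginx-deploy.yaml
$ kubectl apply -f ingress-nginx-deploy.yaml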

Expected behavior

Successful deployment of jupyterhub & bioos.

Screenshots

None.