ceph / ceph-helm

Curated applications for Kubernetes
Apache License 2.0

Ceph helm not running #68

Closed ygqygq2 closed 5 years ago

ygqygq2 commented 6 years ago

My environment: CentOS 7, Kubernetes 1.11.1

My ceph-overrides.yaml:

network:
  public: 192.168.105.0/24
  cluster: 192.168.105.0/24

osd_devices:
  - name: dev-sdb
    device: /dev/sdb
    zap: "1"

storageclass:
  name: ceph-rbd
  pool: rbd
  #user_id: admin
  user_id: k8s

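For context, this overrides file is consumed at install time. Per the ceph-helm README, the chart is served from a local Helm repo and installed with something like the following (the `local/ceph` chart reference and `ceph` release name are the README defaults, not something from this report; adjust to your setup):

```shell
# Install the chart into the ceph namespace, applying the overrides above.
# Helm 2 syntax (--name) matches the era of this issue; `local/ceph` assumes
# the chart was packaged into a local helm-serve repo as in the ceph-helm docs.
helm install --name=ceph local/ceph --namespace=ceph -f ceph-overrides.yaml
```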
My modifications to ceph/values.yaml:

deployment:
  ceph: true
  storage_secrets: true
  client_secrets: true
  rbd_provisioner: true
  rgw_keystone_user_and_endpoints: false

images:
  ks_user: docker.io/kolla/centos-source-heat-engine:3.0.3
  ks_service: docker.io/kolla/centos-source-heat-engine:3.0.3
  ks_endpoints: docker.io/kolla/centos-source-heat-engine:3.0.3
  bootstrap: docker.io/ceph/daemon:v3.0.5-stable-3.0-luminous-centos-7
  dep_check: docker.io/kolla/centos-source-kubernetes-entrypoint:4.0.0
  daemon: docker.io/ceph/daemon:v3.0.5-stable-3.0-luminous-centos-7
  ceph_config_helper: docker.io/port/ceph-config-helper:v1.7.5
  rbd_provisioner: quay.io/external_storage/rbd-provisioner:v0.1.1
  minimal: docker.io/alpine:latest
  pull_policy: "IfNotPresent"

kubectl get pod -n ceph

NAME                                        READY     STATUS                  RESTARTS   AGE
ceph-mds-c5c856bb8-rw2vq                    0/1       Pending                 0          13m
ceph-mds-keyring-generator-llhcl            0/1       Completed               0          13m
ceph-mgr-566969ff9f-bhnsz                   0/1       CrashLoopBackOff        6          7m
ceph-mgr-keyring-generator-gplx2            0/1       Completed               0          13m
ceph-mon-check-9fd5797bc-nb5l6              1/1       Running                 0          11m
ceph-mon-fpd6w                              3/3       Running                 0          13m
ceph-mon-keyring-generator-kvsgc            0/1       Completed               0          13m
ceph-namespace-client-key-generator-fg9nv   0/1       Completed               0          7m
ceph-osd-dev-sdb-4qnd9                      0/1       Init:CrashLoopBackOff   6          13m
ceph-osd-dev-sdb-glk52                      0/1       Init:CrashLoopBackOff   6          13m
ceph-osd-keyring-generator-9ztc7            0/1       Completed               0          13m
ceph-rbd-provisioner-5bc57f5f64-pmnr6       1/1       Running                 0          13m
ceph-rbd-provisioner-5bc57f5f64-sllbc       1/1       Running                 0          13m
ceph-rgw-597dcb57f7-9nzrz                   0/1       Pending                 0          13m
ceph-rgw-keyring-generator-5j844            0/1       Completed               0          13m
ceph-storage-keys-generator-t22q7           0/1       Completed               0          13m

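The pods in `CrashLoopBackOff` above are the ones worth inspecting first. A generic way to pull their logs (pod names are taken from the listing above; the OSD init-container name is looked up rather than assumed):

```shell
# Logs from the previous (crashed) attempt of the mgr container
kubectl -n ceph logs ceph-mgr-566969ff9f-bhnsz --previous

# The OSD pods fail in an init container; list the init-container names first,
# then fetch the logs of the failing one with -c
kubectl -n ceph get pod ceph-osd-dev-sdb-4qnd9 \
  -o jsonpath='{.spec.initContainers[*].name}'
kubectl -n ceph logs ceph-osd-dev-sdb-4qnd9 -c <init-container-name>
```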
kubectl describe pod/ceph-mon-fpd6w -n ceph

Events:
  Type     Reason       Age                From           Message
  ----     ------       ----               ----           -------
  Warning  FailedMount  16m (x5 over 16m)  kubelet, lab1  MountVolume.SetUp failed for volume "ceph-mon-keyring" : secrets "ceph-mon-keyring" not found
  Warning  FailedMount  16m (x5 over 16m)  kubelet, lab1  MountVolume.SetUp failed for volume "ceph-bootstrap-mds-keyring" : secrets "ceph-bootstrap-mds-keyring" not found
  Warning  FailedMount  16m (x5 over 16m)  kubelet, lab1  MountVolume.SetUp failed for volume "ceph-bootstrap-rgw-keyring" : secrets "ceph-bootstrap-rgw-keyring" not found
  Warning  FailedMount  16m (x5 over 16m)  kubelet, lab1  MountVolume.SetUp failed for volume "ceph-bootstrap-osd-keyring" : secrets "ceph-bootstrap-osd-keyring" not found
  Warning  FailedMount  16m (x5 over 16m)  kubelet, lab1  MountVolume.SetUp failed for volume "ceph-client-admin-keyring" : secrets "ceph-client-admin-keyring" not found

But I can see the secrets:

# kubectl get secret -n ceph
NAME                                  TYPE                                  DATA      AGE
ceph-bootstrap-mds-keyring            Opaque                                1         16m
ceph-bootstrap-mgr-keyring            Opaque                                1         16m
ceph-bootstrap-osd-keyring            Opaque                                1         16m
ceph-bootstrap-rgw-keyring            Opaque                                1         16m
ceph-client-admin-keyring             Opaque                                1         16m
ceph-keystone-user-rgw                Opaque                                7         16m
ceph-mon-keyring                      Opaque                                1         16m
default-token-htx2q                   kubernetes.io/service-account-token   3         16m
pvc-ceph-client-key                   kubernetes.io/rbd                     1         10m
pvc-ceph-conf-combined-storageclass   kubernetes.io/rbd                     1         16m

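Since the secrets clearly exist, the `FailedMount` warnings were likely transient (the keyring-generator jobs create the secrets shortly after the pods are first scheduled). A more telling check is whether the admin key stored in the secret matches the one the monitors actually accept; the keyring path and `ceph-mon` container name below are assumptions, so verify them with `kubectl describe pod` first:

```shell
# Decode the admin keyring held in the Kubernetes secret
kubectl -n ceph get secret ceph-client-admin-keyring \
  -o jsonpath='{.data.*}' | base64 -d

# Compare it against the keyring inside the running mon pod
# (container name ceph-mon is an assumption; check the pod spec)
kubectl -n ceph exec ceph-mon-fpd6w -c ceph-mon -- \
  cat /etc/ceph/ceph.client.admin.keyring
```

If the two keys differ, the monitors are holding state from an earlier install while the secrets were freshly generated.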
kubectl logs -f ceph-mon-check-9fd5797bc-nb5l6 -n ceph

+ echo '2018-08-16 03:43:15  /watch_mon_health.sh: sleep 30 sec'
+ return 0
+ sleep 30
+ '[' true ']'
+ log 'checking for zombie mons'
+ '[' -z 'checking for zombie mons' ']'
++ date '+%F %T'
2018-08-16 03:43:45  /watch_mon_health.sh: checking for zombie mons
+ TIMESTAMP='2018-08-16 03:43:45'
+ echo '2018-08-16 03:43:45  /watch_mon_health.sh: checking for zombie mons'
+ return 0
+ CLUSTER=ceph
+ /check_zombie_mons.py
2018-08-16 03:43:46.122705 7fb2f3994700  0 librados: client.admin authentication error (1) Operation not permitted
[errno 1] error connecting to the cluster
Traceback (most recent call last):
  File "/check_zombie_mons.py", line 30, in <module>
    current_mons = extract_mons_from_monmap()
  File "/check_zombie_mons.py", line 18, in extract_mons_from_monmap
    monmap = subprocess.check_output(monmap_command, shell=True)
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'ceph --cluster=${CLUSTER} mon getmap > /tmp/monmap && monmaptool -f /tmp/monmap --print' returned non-zero exit status 1
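The `client.admin authentication error (1) Operation not permitted` line is the real failure: the key the client presents does not match what the monitor has, which typically means the Kubernetes secrets were regenerated while the mons kept their old store from a previous install. A quick way to confirm (the `ceph-mon` container name is an assumption):

```shell
# If this fails with an auth error while the mon process itself is up,
# the keys in the secrets are stale relative to the mon's key database
kubectl -n ceph exec ceph-mon-fpd6w -c ceph-mon -- ceph -s
```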
frankruizhi commented 6 years ago

I encountered the same issue

alvindaiyan commented 5 years ago

I believe this is a bug caused by leftovers from a previous install; I hit the same problem after deleting a previous install. You have to delete all the files with rm -rf /var/lib/ceph-helm/ceph/mon, then reinstall by running helm install ...

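Expanding on that suggestion, a full teardown before reinstalling would look roughly like this (Helm 2 syntax; the mon data path is the one named above and must be removed on every node that ran a mon; the `local/ceph` chart reference is the ceph-helm README default):

```shell
# Remove the release and its namespace-scoped leftovers
helm delete ceph --purge
kubectl delete namespace ceph

# On EACH node that hosted a mon: wipe the persisted mon store
rm -rf /var/lib/ceph-helm/ceph/mon

# Then reinstall; with zap: "1" in ceph-overrides.yaml the OSD disks
# are re-initialized on the fresh install
helm install --name=ceph local/ceph --namespace=ceph -f ceph-overrides.yaml
```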
miahwk commented 5 years ago

@alvindaiyan still not working.

medzeus2 commented 5 years ago

Same issue. Any idea?

alvindaiyan commented 5 years ago

@miahwk still the same error? I stopped ceph-helm, deleted all of Ceph's cached state, then reinstalled ceph-helm. Try to show us a more detailed error log.

dsielert commented 5 years ago

I get this same problem. There is no /var/lib/ceph-helm directory to delete, so this chart does not work for me.