zhangdaolong opened this issue 6 years ago
What do you see in the ceph-mon pod logs? kubectl logs -n ceph ceph-mon-xxxx
Can you check why the ceph-mon service has no cluster IP?
==> v1/Service
NAME      TYPE       CLUSTER-IP  EXTERNAL-IP  PORT(S)   AGE
ceph-mon  ClusterIP  None        <none>       6789/TCP  1s
Check the /etc/resolv.conf of the pod that couldn't resolve ceph-mon and make sure a line "nameserver x.x.x.x" exists, where x.x.x.x is the IP of kube-dns. You can look up that IP with "kubectl get svc -n kube-system" and make sure it matches. Also make sure that the kube-dns pod is running.
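For example, something like this (the mon pod name suffix is a placeholder; the kube-dns label below is the stock one):

# Resolver config inside the pod that fails to resolve ceph-mon
kubectl -n ceph exec ceph-mon-xxxx -c ceph-mon -- cat /etc/resolv.conf

# The kube-dns service IP; the pod's "nameserver" line should match it
kubectl -n kube-system get svc kube-dns

# And the kube-dns pod itself should be Running
kubectl -n kube-system get pods -l k8s-app=kube-dns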
I seem to have the same or a similar problem. I can't resolve the service name from any of my pods, for example here from inside ceph-mon:
~# kubectl exec -n ceph -ti ceph-mon-cqwzq -c ceph-mon -- ceph -s
server name not found: ceph-mon.ceph.svc.cluster.local (Temporary failure in name resolution)
unable to parse addrs in 'ceph-mon.ceph.svc.cluster.local'
InvalidArgumentError does not take keyword arguments
command terminated with exit code 1
I guess this is the reason why my OSDs are failing during init:
kubectl -n ceph logs ceph-osd-dev-sda-xtrm6 -c osd-prepare-pod
+ export LC_ALL=C
+ LC_ALL=C
+ source variables_entrypoint.sh
++ ALL_SCENARIOS='osd osd_directory osd_directory_single osd_ceph_disk osd_ceph_disk_prepare osd_ceph_disk_activate osd_ceph_activate_journal mgr'
++ : ceph
++ : ceph-config/ceph
++ :
++ : osd_ceph_disk_prepare
++ : 1
++ : hive-02
++ : hive-02
++ : /etc/ceph/monmap-ceph
++ : /var/lib/ceph/mon/ceph-hive-02
++ : 0
++ : 0
++ : mds-hive-02
++ : 0
++ : 100
++ : 0
++ : 0
+++ uuidgen
++ : eaddd16b-3a95-4f4c-ba8a-161be9306f42
+++ uuidgen
++ : 5c1c63f8-caa9-4c79-8158-df86aa87df4b
++ : root=default host=hive-02
++ : 0
++ : cephfs
++ : cephfs_data
++ : 8
++ : cephfs_metadata
++ : 8
++ : hive-02
++ :
++ :
++ : 8080
++ : 0
++ : 9000
++ : 0.0.0.0
++ : cephnfs
++ : hive-02
++ : 0.0.0.0
++ CLI_OPTS='--cluster ceph'
++ DAEMON_OPTS='--cluster ceph --setuser ceph --setgroup ceph -d'
++ MOUNT_OPTS='-t xfs -o noatime,inode64'
++ MDS_KEYRING=/var/lib/ceph/mds/ceph-mds-hive-02/keyring
++ ADMIN_KEYRING=/etc/ceph/ceph.client.admin.keyring
++ MON_KEYRING=/etc/ceph/ceph.mon.keyring
++ RGW_KEYRING=/var/lib/ceph/radosgw/hive-02/keyring
++ MGR_KEYRING=/var/lib/ceph/mgr/ceph-hive-02/keyring
++ MDS_BOOTSTRAP_KEYRING=/var/lib/ceph/bootstrap-mds/ceph.keyring
++ RGW_BOOTSTRAP_KEYRING=/var/lib/ceph/bootstrap-rgw/ceph.keyring
++ OSD_BOOTSTRAP_KEYRING=/var/lib/ceph/bootstrap-osd/ceph.keyring
++ OSD_PATH_BASE=/var/lib/ceph/osd/ceph
+ source common_functions.sh
++ set -ex
+ is_available rpm
+ command -v rpm
+ is_available dpkg
+ command -v dpkg
+ OS_VENDOR=ubuntu
+ source /etc/default/ceph
++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
+ case "$CEPH_DAEMON" in
+ OSD_TYPE=prepare
+ start_osd
+ [[ ! -e /etc/ceph/ceph.conf ]]
+ '[' 1 -eq 1 ']'
+ [[ ! -e /etc/ceph/ceph.client.admin.keyring ]]
+ case "$OSD_TYPE" in
+ source osd_disk_prepare.sh
++ set -ex
+ osd_disk_prepare
+ [[ -z /dev/sda ]]
+ [[ ! -e /dev/sda ]]
+ '[' '!' -e /var/lib/ceph/bootstrap-osd/ceph.keyring ']'
+ timeout 10 ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring health
+ exit 1
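The prepare script dies at that final health check. To confirm that name resolution, and not the keyring, is what breaks it, the same check can be re-run by hand while the container is still up; a rough sketch with my pod name substituted:

# Re-run the exact command the prepare script times out on
kubectl -n ceph exec ceph-osd-dev-sda-xtrm6 -c osd-prepare-pod -- \
  timeout 10 ceph --cluster ceph --name client.bootstrap-osd \
  --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring health

# If the mon name doesn't resolve either, DNS is the real blocker
kubectl -n ceph exec ceph-osd-dev-sda-xtrm6 -c osd-prepare-pod -- \
  getent hosts ceph-mon.ceph.svc.cluster.local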
@feresberbeche Here's the resolv.conf of my ceph-mon pod. The nameserver matches the kube-dns IP:
nameserver 10.96.0.10
search ceph.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
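Querying that nameserver directly from the pod shows whether kube-dns itself answers; a quick check (same pod as above, assuming nslookup is available in the image):

# Ask kube-dns explicitly, bypassing the search-domain logic
kubectl -n ceph exec ceph-mon-cqwzq -c ceph-mon -- \
  nslookup ceph-mon.ceph.svc.cluster.local 10.96.0.10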
@rootfs The ceph-mon service has no cluster IP in my case either. How can I check why that is? At least here is the description of the service:
~# kubectl describe service ceph-mon -n ceph
Name: ceph-mon
Namespace: ceph
Labels: <none>
Annotations: service.alpha.kubernetes.io/tolerate-unready-endpoints=true
Selector: application=ceph,component=mon,release_group=ceph
Type: ClusterIP
IP: None
Port: <unset> 6789/TCP
TargetPort: 6789/TCP
Endpoints: <redacted public ip>:6789,<redacted public ip>:6789,<redacted public ip>:6789
Session Affinity: None
Events: <none>
The log of the ceph-mon pod looks fine to me. I uploaded it here: https://gist.github.com/Silberschleier/1baad5d4853c48abeff3b1326b5cc7db
The same problem happened here. Everything looks OK, but I can't ping the DNS IP or other services.
Hackish solution: on the nodes running & mounting Ceph, add ceph-mon-discovery.ceph.svc.cluster.local to /etc/hosts, e.g.:
kubectl describe service ceph-mon -n ceph
Name: ceph-mon
Namespace: ceph
Labels: <none>
Annotations: service.alpha.kubernetes.io/tolerate-unready-endpoints: true
Selector: application=ceph,component=mon,release_group=ceph
Type: ClusterIP
IP: None
Port: <unset> 6789/TCP
TargetPort: 6789/TCP
Endpoints: 172.20.1.60:6789
Session Affinity: None
Events: <none>
echo '172.20.1.60 ceph-mon.ceph.svc.cluster.local' >> /etc/hosts
A better way would be to add kube-dns to the host's name resolution (see kubectl -n kube-system get svc/kube-dns).
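A sketch of that variant, assuming the stock kube-dns service name (run on each node that mounts Ceph):

# Look up the kube-dns ClusterIP
DNS_IP=$(kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}')

# Prepend it to the node's resolv.conf so the node itself can
# resolve *.svc.cluster.local names, keeping existing resolvers
printf 'nameserver %s\n' "$DNS_IP" | cat - /etc/resolv.conf > /tmp/resolv.conf
cp /tmp/resolv.conf /etc/resolv.conf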
Hi,
your "hacky solution" saved my life for the moment. After some digging I discovered that there is NO IP on the service:
kind: Service
apiVersion: v1
metadata:
  name: {{ tuple "ceph_mon" "internal" . | include "helm-toolkit.endpoints.hostname_short_endpoint_lookup" }}
spec:
  ports:
  - port: {{ tuple "ceph_mon" "internal" "mon" $envAll | include "helm-toolkit.endpoints.endpoint_port_lookup" }}
    protocol: TCP
    targetPort: {{ tuple "ceph_mon" "internal" "mon" $envAll | include "helm-toolkit.endpoints.endpoint_port_lookup" }}
  selector:
{{ tuple $envAll "ceph" "mon" | include "helm-toolkit.snippets.kubernetes_metadata_labels" | indent 4 }}
  clusterIP: None
{{- end }}
So another "hacky" solution is to delete the line:
clusterIP: None
Then you can ping/nslookup it by name: ceph-mon.ceph.svc.cluster.local. This was tested in a lab.
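Keep in mind that clusterIP is immutable on an existing Service, so after editing the template the service has to be deleted and re-created (or the release re-installed). A quick verification afterwards (mon pod name is a placeholder):

# The service should now show a real ClusterIP instead of None
kubectl -n ceph get svc ceph-mon

# And the name should resolve from inside a pod
kubectl -n ceph exec ceph-mon-xxxx -c ceph-mon -- \
  getent hosts ceph-mon.ceph.svc.cluster.local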
UPDATE
This can be marked as solved, I suppose. I just happened to solve it.
Use another network plugin: instead of weave I used calico, together with a modification of the nodes' resolv.conf:
nameserver 10.233.0.3
nameserver 8.8.8.8
And it magically started working.
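A quick way to verify cluster DNS end to end, assuming a busybox image can be pulled:

# One-off pod that just does the lookup and goes away
kubectl run dns-test -it --rm --restart=Never --image=busybox -- \
  nslookup ceph-mon.ceph.svc.cluster.local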
Is this a request for help?: yes
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
The container always shows "1 dns.go:555] Could not find endpoints for service "ceph-mon" in namespace "ceph". DNS records will be created once endpoints show up." in pod kube-dns-85bc874cc5-mdzhb.
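That message means kube-dns sees the ceph-mon service but no endpoints behind it yet; whether endpoints ever show up can be checked directly (labels taken from the mon service selector shown earlier in this thread):

# A headless service only gets DNS records once endpoints exist
kubectl -n ceph get endpoints ceph-mon

# Endpoints appear only when the selected pods pass readiness
kubectl -n ceph get pods -l application=ceph,component=mon -o wide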
[root@master ceph]# helm install --name=ceph local/ceph --namespace=ceph
NAME:   ceph
LAST DEPLOYED: Tue Jun 12 09:53:41 2018
NAMESPACE: ceph
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/DaemonSet
NAME              DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR                                     AGE
ceph-mon          1        1        0      1           0          ceph-mon=enabled                                  1s
ceph-osd-dev-sda  1        1        0      1           0          ceph-osd-device-dev-sda=enabled,ceph-osd=enabled  1s

==> v1beta1/Deployment
NAME                  DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
ceph-mds              1        1        1           0          1s
ceph-mgr              1        1        1           0          1s
ceph-mon-check        1        1        1           0          1s
ceph-rbd-provisioner  2        2        2           0          1s
ceph-rgw              1        1        1           0          1s

==> v1/Job
NAME                                 DESIRED  SUCCESSFUL  AGE
ceph-mon-keyring-generator           1        0           1s
ceph-mds-keyring-generator           1        0           1s
ceph-osd-keyring-generator           1        0           1s
ceph-mgr-keyring-generator           1        0           1s
ceph-rgw-keyring-generator           1        0           1s
ceph-namespace-client-key-generator  1        0           1s
ceph-storage-keys-generator          1        0           1s

==> v1/Pod(related)
NAME                                       READY  STATUS             RESTARTS  AGE
ceph-mon-rsjkn                             0/3    Init:0/2           0         1s
ceph-osd-dev-sda-jb8s7                     0/1    Init:0/3           0         1s
ceph-mds-696bd98bdb-92tj2                  0/1    Init:0/2           0         1s
ceph-mgr-56f45bb99c-pmpfm                  0/1    Pending            0         1s
ceph-mon-check-74d98c5b95-k5xc5            0/1    Pending            0         1s
ceph-rbd-provisioner-b58659dc9-llllj       0/1    Pending            0         1s
ceph-rbd-provisioner-b58659dc9-rh4zd       0/1    ContainerCreating  0         1s
ceph-rgw-5bd9dd66c5-q5vzp                  0/1    Pending            0         1s
ceph-mon-keyring-generator-nzg2l           0/1    Pending            0         1s
ceph-mds-keyring-generator-cr8ql           0/1    Pending            0         1s
ceph-osd-keyring-generator-z5jrq           0/1    Pending            0         1s
ceph-mgr-keyring-generator-kw2wj           0/1    Pending            0         1s
ceph-rgw-keyring-generator-6kghm           0/1    Pending            0         1s
ceph-namespace-client-key-generator-dk968  0/1    Pending            0         1s
ceph-storage-keys-generator-4mhhk          0/1    Pending            0         1s

==> v1/Secret
NAME                    TYPE    DATA  AGE
ceph-keystone-user-rgw  Opaque  7     1s

==> v1/ConfigMap
NAME              DATA  AGE
ceph-bin-clients  2     1s
ceph-bin          26    1s
ceph-etc          1     1s
ceph-templates    5     1s

==> v1/StorageClass
NAME      PROVISIONER   AGE
ceph-rbd  ceph.com/rbd  1s

==> v1/Service
NAME      TYPE       CLUSTER-IP     EXTERNAL-IP  PORT(S)   AGE
ceph-mon  ClusterIP  None           <none>       6789/TCP  1s
ceph-rgw  ClusterIP  10.109.46.173  <none>       8088/TCP  1s
[root@master ceph]# kubectl exec kube-dns-85bc874cc5-mdzhb -ti -n kube-system -c kubedns -- sh
/ # ps
PID   USER     TIME   COMMAND
    1 root     3:19   /kube-dns --domain=172.16.34.88. --dns-port=10053 --config-dir=/kube-dns-config --v=2
   26 root     0:35   ping ceph-mon.ceph.svc.cluster.local
  157 root     0:00   sh
  161 root     0:00   sh
  165 root     0:00   sh
/ #
/ # ping ceph-mon.ceph.svc.cluster.local
ping: bad address 'ceph-mon.ceph.svc.cluster.local'
/ #
[root@master ceph]# kubectl get pod --all-namespaces
NAMESPACE     NAME                                        READY  STATUS                 RESTARTS  AGE
ceph          ceph-mds-696bd98bdb-rvq42                   0/1    CrashLoopBackOff       6         11m
ceph          ceph-mds-keyring-generator-6nrct            0/1    Completed              0         11m
ceph          ceph-mgr-56f45bb99c-smqqj                   0/1    CrashLoopBackOff       6         11m
ceph          ceph-mgr-keyring-generator-kdjd4            0/1    Completed              0         11m
ceph          ceph-mon-check-74d98c5b95-nqhmg             1/1    Running                0         11m
ceph          ceph-mon-keyring-generator-7xmd8            0/1    Completed              0         11m
ceph          ceph-mon-m72hp                              3/3    Running                0         11m
ceph          ceph-namespace-client-key-generator-cvnpw   0/1    Completed              0         11m
ceph          ceph-osd-dev-sda-kzn65                      0/1    Init:CrashLoopBackOff  6         11m
ceph          ceph-osd-keyring-generator-48gb6            0/1    Completed              0         11m
ceph          ceph-rbd-provisioner-b58659dc9-7jsnk        1/1    Running                0         11m
ceph          ceph-rbd-provisioner-b58659dc9-sf6hr        1/1    Running                0         11m
ceph          ceph-rgw-5bd9dd66c5-n25bn                   0/1    CrashLoopBackOff       6         11m
ceph          ceph-rgw-keyring-generator-vs8th            0/1    Completed              0         11m
ceph          ceph-storage-keys-generator-ww7hn           0/1    Completed              0         11m
default       busybox                                     1/1    Running                113       4d
kube-system   etcd-master                                 1/1    Running                8         24d
kube-system   heapster-69b5d4974d-9g96p                   1/1    Running                10        24d
kube-system   kube-apiserver-master                       1/1    Running                8         24d
kube-system   kube-controller-manager-master              1/1    Running                8         24d
kube-system   kube-dns-85bc874cc5-mdzhb                   3/3    Running                27        24d
kube-system   kube-flannel-ds-b94c4                       1/1    Running                12        24d
kube-system   kube-flannel-ds-sqzwv                       1/1    Running                10        24d
kube-system   kube-proxy-9j6sq                            1/1    Running                10        24d
kube-system   kube-proxy-znkxj                            1/1    Running                7         24d
kube-system   kube-scheduler-master                       1/1    Running                8         24d
kube-system   kubernetes-dashboard-7d5dcdb6d9-c2sz6       1/1    Running                10        24d
kube-system   monitoring-grafana-69df66f668-fpgn5         1/1    Running                10        24d
kube-system   monitoring-influxdb-78d4c6f5b6-hnjg2        1/1    Running                50        24d
kube-system   tiller-deploy-f9b8476d-trtml                1/1    Running                0         4d
Version of Helm and Kubernetes:
[root@master ceph]# helm version
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
[root@master ceph]# kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Which chart:
What happened: DNS pod can not resolve ceph-mon.ceph.svc.cluster.local
What you expected to happen: DNS pod can resolve ceph-mon.ceph.svc.cluster.local
How to reproduce it (as minimally and precisely as possible): always
Anything else we need to know: None