ceph / ceph-helm

Curated applications for Kubernetes
Apache License 2.0

Ceph OSD's readiness probe fails. #42

Open saitejar opened 6 years ago

saitejar commented 6 years ago

Is this a request for help?: Yes

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Version of Helm and Kubernetes: Latest Master Branch.

Which chart: Ceph-Helm

What happened: I have deployed the ceph-helm chart onto a Kubernetes cluster with the following overrides (I am running MDS too):

```yaml
network:
  public: 10.244.0.0/16
  cluster: 10.244.0.0/16

ceph_mgr_modules_config:
  dashboard:
    port: 7000

osd_directory:
  enabled: true

manifests:
  deployment_rgw: false
  service_rgw: false
  daemonset_osd: true

storageclass:
  name: ceph-rbd
  pool: rbd
  user_id: k8s
```

All the services run correctly except the OSDs. I am using osd_directory: enabled. The OSDs fail the readiness probe:

```
Readiness probe failed: dial tcp 10.211.55.186:6800: getsockopt: connection refused
Back-off restarting failed container
Error syncing pod
```

What you expected to happen: I expected all the OSDs to be running too, along with mon, mgr, and mds.

How to reproduce it (as minimally and precisely as possible): My cluster setup is 1 master node and 2 worker nodes.

  1. Use the following overrides file:

```yaml
network:
  public: 10.244.0.0/16
  cluster: 10.244.0.0/16

ceph_mgr_modules_config:
  dashboard:
    port: 7000

osd_directory:
  enabled: true

manifests:
  deployment_rgw: false
  service_rgw: false
  daemonset_osd: true

storageclass:
  name: ceph-rbd
  pool: rbd
  user_id: k8s
```

  2. Replace the public/cluster network with whatever is applicable in your Kubernetes cluster.

  3. Add ceph-mon=enabled,ceph-mds=enabled,ceph-mgr=enabled to the master node and ceph-osd=enabled to the two worker nodes.
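For example, the labeling step could look like this (node names are placeholders; use the names from `kubectl get nodes`):

```shell
# Label the master for the mon/mds/mgr daemons and the workers for OSDs.
kubectl label node master0 ceph-mon=enabled ceph-mds=enabled ceph-mgr=enabled
kubectl label node worker0 ceph-osd=enabled
kubectl label node worker1 ceph-osd=enabled
```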

Follow the deploy instructions at http://docs.ceph.com/docs/master/start/kube-helm/ with the above changes.

Anything else we need to know:

alram commented 6 years ago

I don't think we extensively tested OSD directories. Can you provide the logs for the Pod that fails?

Aside: the probe for ceph-osd is somewhat flimsy in that it only checks port 6800. This is fixed in openstack-helm (https://review.openstack.org/#/c/457754/) but we're waiting for the patchset to be merged before backporting the fixes.
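For context, the probe in question is essentially a tcpSocket check against the OSD's first port. A minimal sketch of that shape (values assumed, not the chart's exact template):

```yaml
# Sketch only: a tcpSocket probe like this only passes if the daemon happens
# to be listening on 6800, which is not guaranteed for an OSD.
readinessProbe:
  tcpSocket:
    port: 6800
  timeoutSeconds: 5
livenessProbe:
  tcpSocket:
    port: 6800
  initialDelaySeconds: 60
  timeoutSeconds: 5
```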

saitejar commented 6 years ago

@alram It is the OSD pod that fails the readiness check. Here is the log for it: osd-log.txt

Based on the logs, do you think just disabling the readiness/liveness checks could be a quick fix for now?

alram commented 6 years ago

Sadly, no. The readiness/liveness probes fail because the OSD isn't able to start. I'll try to reproduce with some debugging, more likely tomorrow than today.

alram commented 6 years ago

Actually, I misread the logs. The OSD does get started; I got confused by the FileJournal::_open message. If you list your Pods, do you see the OSD Pod as 'Running'? If so, you shouldn't worry too much about the readiness probe failing only once. The probe executes as soon as the OSD Pod starts, when it should really have an initial delay, since the OSD daemon may not be fully up yet. It has no real consequence, since you need 3 probe failures for the Pod to get rescheduled.

saitejar commented 6 years ago

@alram The OSD actually keeps restarting, I suppose because the readiness probe fails 3 times, as you said.

Here is the log; the pod gets a terminate signal:

+ log SUCCESS
+ '[' -z SUCCESS ']'
++ date '+%F %T'
2017-12-07 02:09:52  /start_osd.sh: SUCCESS
+ TIMESTAMP='2017-12-07 02:09:52'
+ echo '2017-12-07 02:09:52  /start_osd.sh: SUCCESS'
+ return 0
+ start_forego
+ exec /usr/local/bin/forego start -f /etc/forego/ceph/Procfile
forego   | starting ceph-1.1 on port 5000
ceph-1.1 | starting osd.1 at - osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1//journal
ceph-1.1 | 2017-12-07 02:09:52.190450 7f1a8c351e00 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
ceph-1.1 | 2017-12-07 02:09:52.233089 7f1a8c351e00 -1 osd.1 298 log_to_monitors {default=true}
forego   | sending SIGTERM to ceph-1.1
ceph-1.1 | 2017-12-07 02:11:08.427715 7f1a648af700 -1 received  signal: Terminated from  PID: 1 task name: /usr/local/bin/forego start -f /etc/forego/ceph/Procfile  UID: 0
ceph-1.1 | 2017-12-07 02:11:08.427764 7f1a648af700 -1 osd.1 302 *** Got signal Terminated ***
ceph-1.1 | 2017-12-07 02:11:08.755213 7f1a648af700 -1 osd.1 302 shutdown
alram commented 6 years ago

I'm not able to reproduce. I mounted a local XFS to /var/lib/ceph/ceph-helm/osd and deployed an OSD in a directory successfully.

Can you check if the OSD was added to the cluster? (ceph -s from a mon Pod) and if so can you run a ceph osd dump?
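Something along these lines should work, assuming the chart was installed into the `ceph` namespace (namespace and label selector are assumptions; adjust to what `kubectl -n ceph get pods --show-labels` reports):

```shell
# Run the cluster status and OSD map dump from inside a mon pod.
MON_POD=$(kubectl -n ceph get pods -l application=ceph,component=mon \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n ceph exec "$MON_POD" -- ceph -s
kubectl -n ceph exec "$MON_POD" -- ceph osd dump
```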

saitejar commented 6 years ago

@alram Looks like the OSDs are down. This is what I get from ceph -s:

  cluster:
    id:     3d691e73-570c-477a-a3ac-a7a574faff6b
    health: HEALTH_WARN
            2 osds down
            2 hosts (2 osds) down
            1 root (2 osds) down
            Reduced data availability: 16 pgs inactive, 16 pgs stale
            Degraded data redundancy: 42/63 objects degraded (66.667%), 16 pgs unclean, 16 pgs degraded, 16 pgs undersized
            too few PGs per OSD (8 < min 30)

  services:
    mon: 1 daemons, quorum petuumos-master0
    mgr: petuumos-master0(active)
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-7bb8b6f9c8-gs8kd=up:active}
    osd: 2 osds: 0 up, 2 in

On running ceph osd dump:

epoch 797
fsid 3d691e73-570c-477a-a3ac-a7a574faff6b
created 2017-12-06 22:13:50.706771
modified 2017-12-07 16:48:50.937702
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 3
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 8 flags hashpspool stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 8 flags hashpspool stripe_width 0 application cephfs
max_osd 2
osd.0 down in  weight 1 up_from 793 up_thru 795 down_at 797 last_clean_interval [785,790) 10.244.1.0:6800/33 10.244.1.0:6801/33 10.244.1.0:6802/33 10.244.1.0:6803/33 exists 03616c7e-91ce-47c9-b43e-4a124bf9b3b5
osd.1 down in  weight 1 up_from 789 up_thru 793 down_at 795 last_clean_interval [783,786) 10.244.2.0:6800/31 10.244.2.0:6801/31 10.244.2.0:6802/31 10.244.2.0:6803/31 exists 534f15f7-d9c9-40d5-adab-a009e72caaec
alram commented 6 years ago

Thanks! The OSDs were added to the cluster. It is strange that the probe failed since they used port 6800. Can you try to remove the readiness/liveness probes from the manifest to make sure the OSDs stay up?
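One way to drop the probes from a running deployment without re-rendering the chart is a JSON patch (namespace, daemonset name, and container index are assumptions; check them with `kubectl -n ceph get daemonset` first):

```shell
# Remove the readiness/liveness probes from the first container of the OSD daemonset.
kubectl -n ceph patch daemonset ceph-osd --type json -p '[
  {"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}
]'
```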

saitejar commented 6 years ago

Sure, will try it and get back soon.

rootfs commented 6 years ago

@saitejar can you post the ceph osd, mon, and mgr logs too?

saitejar commented 6 years ago

@rootfs Sure. Attached the logs of osd, mgr, mon. logs-from-ceph-mgr-in-ceph-mgr.txt logs-from-osd-pod-in-ceph-osd.txt logs-from-ceph-mon-in-ceph-mon.txt

saitejar commented 6 years ago

@rootfs @alram
After removing the readiness and liveness probes, the OSDs register successfully. I guess something is wrong with the readiness/liveness checks.

rootfs commented 6 years ago

Good news. Are you able to, e.g., create an rbd image and use Ceph after deployment? Maybe a longer timeout is the fix we need.
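A quick end-to-end smoke test, assuming the `ceph-rbd` StorageClass from the overrides above was created (sketch; names are placeholders):

```yaml
# If this PVC binds, RBD provisioning works; mounting it in a pod then
# exercises the full data path.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-rbd-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-rbd
```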

SacDin commented 6 years ago

Hello, I'm having the same issue.

Is the current workaround to ignore the readiness state?

dmick commented 6 years ago

Do the logs from the live/ready checks show them failing for sure? I'd try increasing the times for the checks first.
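For example, a sketch of relaxed probe timings for the OSD container (the field names are standard Kubernetes probe settings; the values are guesses, not tested defaults):

```yaml
readinessProbe:
  tcpSocket:
    port: 6800
  initialDelaySeconds: 120   # give the OSD time to join the cluster first
  periodSeconds: 30
  failureThreshold: 5        # tolerate more consecutive failures
```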

saitejar commented 6 years ago

How can I get the logs for the live/ready checks? Are they part of the pod's log?

dmick commented 6 years ago

Good question. Experimenting

dmick commented 6 years ago

Perhaps "kubectl describe" on the pod shows it

saitejar commented 6 years ago

In the describe, I just see this:

```
Readiness probe failed: dial tcp 10.211.55.186:6800: getsockopt: connection refused
Back-off restarting failed container
Error syncing pod
```

dmick commented 6 years ago

Well, at least that's solid confirmation that the probe failed. "connection refused" seems weird, though; I wouldn't expect that if the daemon were actually still alive but slow.

saitejar commented 6 years ago

So, you think it is not a timeout issue? Let me check if I am able to use ceph without ready/live probes.

dmick commented 6 years ago

I thought you already had removed the probes and that kept the OSDs up? That doesn't resolve anything, however, because we know from the logs that it was the probe that was failing.

saitejar commented 6 years ago

Yes, I already did that, but I never actually checked mounting cephfs and writing to it.

saitejar commented 6 years ago

@rootfs @dmick I am unable to mount cephfs in a pod. #45

SacDin commented 6 years ago

I actually gave up on it because I found no reason to deploy OpenStack on Kubernetes. Why shouldn't I just use Kubernetes to run containers? Maybe my use case is simpler compared to yours.

However, Ceph in a pod did not work.


saitejar commented 6 years ago

@rootfs By removing the ready/live checks, I was able to mount cephfs and play around with it; it looks good so far. I am not sure whether I will hit some issue later, though. I can provide any logs needed to resolve the issue.

Cyclic3 commented 6 years ago

This is still a problem. I can submit a pull request with this disabled if anyone wants it.

ghost commented 6 years ago

Hi, same problem here. In my case, on the worker that runs the ceph-mgr pod, the readiness and liveness checks of the ceph-osd-dev-sd pods are OK. On the worker without the ceph-mgr pod, however, the readiness and liveness checks of the ceph-osd-dev-sd pods fail, although the pods keep running until the checks restart them.

Some more details. The error from the pod is:

```
Readiness probe failed: dial tcp 10.4.62.110:6800: getsockopt: connection refused
```

With netstat I see: on worker 10.4.62.109, `tcp 0 0 0.0.0.0:6800 0.0.0.0:* LISTEN 111835/ceph-mgr`; on worker 10.4.62.110 there is no listener on 0.0.0.0:6800.

thanks

alram commented 6 years ago

Just checked the code (https://github.com/ceph/ceph-helm/blob/master/ceph/ceph/templates/daemonset-osd-devices.yaml#L266-L274): this is indeed wrong.

6800 is just the first port in the 6800-7300 range (ms_bind_port_{min,max}) that an OSD or mgr binds to. So the daemon the probe reaches on 6800 may not even be the OSD (as the netstat output above shows, it can be a ceph-mgr), and the OSD itself may be listening on a different port in that range.

The probe should rely on the asok instead. I'm not working on this project anymore, but the folks at openstack-helm fixed that a while back.
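For reference, a sketch of an admin-socket based check along those lines, run inside the OSD container (the socket path depends on the OSD id and on where the container keeps its asok files):

```shell
# Reports the daemon's state regardless of which TCP port it bound.
ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok status
```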

ghost commented 6 years ago

The workaround of removing the readinessProbe and livenessProbe from daemonset-osd-devices.yaml works for me.

It's not the definitive solution, but for a test environment it's a start. Thanks

erichorwitz commented 5 years ago

I am seeing a similar issue.... Is there a fix for this?

```
Events:
  Type     Reason     Age                     From             Message
  ----     ------     ----                    ----             -------
  Warning  Unhealthy  21m (x40 over 64m)      kubelet, kube02  Liveness probe failed: dial tcp 10.69.0.12:6800: connect: connection refused
  Warning  Unhealthy  6m43s (x151 over 65m)   kubelet, kube02  Readiness probe failed: dial tcp 10.69.0.12:6800: connect: connection refused
  Warning  BackOff    101s (x183 over 57m)    kubelet, kube02  Back-off restarting failed container
```

chinglinwen commented 5 years ago

Checking the listening address may help (this tool can show listening info: https://github.com/drael/GOnetstat/blob/master/Examples/tcp.go).

I hit this liveness probe failure because the Helm network public address range was wrong.

Changing public to the host network range fixed it.
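For anyone hitting the same thing, a sketch of that override (the CIDR below is a placeholder for whatever range your hosts actually use):

```yaml
network:
  public: 192.168.1.0/24    # host network the mon/OSD addresses live on
  cluster: 192.168.1.0/24
```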