Open abhishek6590 opened 9 years ago
That sounds like there aren't any OSD processes running and connected to the cluster. If you check the output of `ceph osd tree`, does it show that the cluster expects to have an OSD? If not, this means that the ceph-disk-prepare script didn't run, which comes from the ceph::osd recipe. If so, this means that the ceph::osd recipe ran and initialized an OSD, but for some reason that OSD didn't connect to the cluster. Check the OSD server to make sure the process is running, then look at the logs in /var/log/ceph/ceph-osd* to see why the OSD isn't connecting.
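As a quick way to run those checks, here is an illustrative sketch; the `ceph osd tree` sample output is hard-coded (modeled on the output shown below) so the parsing can be tried anywhere, and the commented-out commands are the standard CLI you would use on a real node:

```shell
#!/bin/sh
# Illustrative only: count how many OSDs `ceph osd tree` reports, and how
# many of them are up. On a real cluster you would pipe `ceph osd tree`
# into the same awk program, and check the daemon itself with:
#   pgrep -a ceph-osd
#   tail -n 50 /var/log/ceph/ceph-osd.0.log
sample='-1 0.09 root default
-2 0.09 host server3
 0 0.09 osd.0 up 1'

printf '%s\n' "$sample" | awk '
  $3 ~ /^osd\./ { total++; if ($4 == "up") up++ }
  END { printf "osds=%d up=%d\n", total, up }'
# prints: osds=1 up=1
```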
Hi, `ceph osd tree` is showing this output:

```
-1 0.09 root default
-2 0.09     host server3
 0 0.09         osd.0    up  1
```

and the log shows:

```
tail -f ceph-osd.0.log
2015-02-03 12:50:44.115354 7f0d0d1b7900 0
```

Please advise.
Thanks,
Ah yes, you'll need at least 3 OSDs for Ceph to be happy and healthy. Depending on how your CRUSH map is configured (I forget the defaults), these OSDs will have to be on separate hosts.
Hi
I am a bit confused by this statement: "you'll need at least 3 OSDs to be happy and healthy". I followed the instructions here (http://docs.ceph.com/docs/hammer/start/quick-ceph-deploy/), and once I get to the command `ceph health`, the response is: `health HEALTH_ERR 64 pgs incomplete; 64 pgs stuck inactive; 64 pgs stuck unclean`. That is what I get when I install it...
The Ceph documentation clearly states: "Change the default number of replicas in the Ceph configuration file from 3 to 2 so that Ceph can achieve an active + clean state with just two Ceph OSDs. Add the following line under the [global] section: osd pool default size = 2"
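For readers following that quick-start step, the change is a single line in the ceph.conf that ceph-deploy generates in your working directory (my-cluster). A minimal sketch of the [global] section might look like this; the fsid and mon values are placeholders for a hypothetical cluster, not values from this thread:

```ini
[global]
fsid = 00000000-0000-0000-0000-000000000000   ; placeholder
mon_initial_members = node1                   ; placeholder
mon_host = 192.168.56.101                     ; placeholder
; replicate each object to 2 OSDs instead of the default 3,
; so a two-OSD cluster can reach active+clean
osd pool default size = 2
```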
I have attempted this install at least 3 times now, and the response is the same every time. I am running 1 admin node, 1 monitor, and 2 OSDs on 4 VirtualBox Ubuntu 14.04 LTS VMs within Ubuntu 16 (the previous attempt was within Ubuntu 14).
The debug information is not very helpful at all. Ceph is also not writing to /var/log/ceph/ at all, even after I set ownership with `sudo chown ceph:root /var/log/ceph`.
`ceph-deploy osd activate` tells me that the OSDs are active, but `ceph osd tree` shows otherwise (down).
The config is always read from /etc/ceph/ceph.conf (even though I install everything from the my-cluster directory), which is incorrect. When I ran the install, the config was created in /home/user/my-cluster/ceph.conf, yet it is read from /etc/ceph/ceph.conf.
So I will attempt 3 OSDs now, even though the site states otherwise...
Any suggestions would be very helpful.
Thanks,
zd
Hi, I have the same problem, and I have reinstalled Ceph more than 3 times now. I'm really frustrated. Have you figured it out? I'd appreciate any suggestions.
Hi
If you are using the ext4 file system, you need to place this in the [global] section of your config:
filestore xattr use omap
Restart and see if HEALTH_OK is achieved.
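For completeness, that setting is usually written with an explicit value; a sketch of the ceph.conf fragment, assuming `= true` is what was intended:

```ini
[global]
; ext4 xattrs are too small for filestore metadata, so spill them into omap
filestore xattr use omap = true
```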
Cheers
Hi
First, thank you so much for your suggestion! My file system is ext4, and I tried what you suggested, but it seems to make no difference.
I reviewed the OSD's log thoroughly and found the following:

```
osd.0 0 backend (filestore) is unable to support max object name[space] len
osd.0 0    osd max object name len = 2048
osd.0 0    osd max object namespace len = 256
osd.0 0 (36) File name too long
journal close /var/lib/ceph/osd/ceph-0/journal
** ERROR: osd init failed: (36) File name too long
```
Then I found this page: http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/
I reinstalled Ceph once more and placed the following in the [global] section of the config:
osd_max_object_name_len = 256
osd_max_object_namespace_len = 64
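To spell out what that fix looks like in ceph.conf for later readers (values copied from the comment above; the linked filesystem-recommendations page explains that ext4's short xattr limits are the reason these need to be lowered):

```ini
[global]
; shrink filestore's object name limits to fit within ext4's constraints
osd_max_object_name_len = 256
osd_max_object_namespace_len = 64
```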
It works!!! I'm so happy, and I very much appreciate your reply!!!
Thanks again! Best wishes~
Hi
You are welcome.
I am glad you solved it.
Best Wishes
Zayne
If you are using the ext4 file system, you need to place this in the [global] section of your config:

osd_max_object_name_len = 256
osd_max_object_namespace_len = 64

http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/
I'm having the same problem; however, I am using the preferred XFS filesystem. Any suggestions?
[From the monitor node I get the following:]

```
HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs stuck inactive; no osds
```

[From the OSD node:]

```
2017-01-27 07:55:28.000882 7fde7846d700 0 -- :/429908835 >> ipaddress:6789/0 pipe(0x7fde74063f30 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fde7405c5a0).fault
```

[From the monitor node, out of /var/log/ceph/ceph.log:]

```
2017-01-27 06:47:11.121804 mon.0 ipaddress:6789/0 1 : cluster [INF] mon.oso-node1@0 won leader election with quorum 0
2017-01-27 06:47:11.121931 mon.0 ipaddress:6789/0 2 : cluster [INF] monmap e1: 1 mons at {oso-node1=ipaddress:6789/0}
2017-01-27 06:47:11.122008 mon.0 ipaddress:6789/0 3 : cluster [INF] pgmap v2: 64 pgs: 64 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2017-01-27 06:47:11.122090 mon.0 ipaddress:6789/0 4 : cluster [INF] fsmap e1:
2017-01-27 06:47:11.122203 mon.0 ipaddress:6789/0 5 : cluster [INF] osdmap e1: 0 osds: 0 up, 0 in
2017-01-27 06:54:50.687322 mon.0 ipaddress:6789/0 1 : cluster [INF] mon.oso-node1@0 won leader election with quorum 0
2017-01-27 06:54:50.687415 mon.0 ipaddress:6789/0 2 : cluster [INF] monmap e1: 1 mons at {oso-node1=ipaddress:6789/0}
2017-01-27 06:54:50.687497 mon.0 ipaddress:6789/0 3 : cluster [INF] pgmap v2: 64 pgs: 64 creating; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
2017-01-27 06:54:50.687577 mon.0 ipaddress:6789/0 4 : cluster [INF] fsmap e1:
2017-01-27 06:54:50.687716 mon.0 ipaddress:6789/0 5 : cluster [INF] osdmap e1: 0 osds: 0 up, 0 in
```
```
f_redirected e754) currently waiting for peered
2017-03-02 10:58:39.952422 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 324.251003 secs
2017-03-02 10:58:39.952444 osd.25 [WRN] slow request 240.250943 seconds old, received at 2017-03-02 10:54:39.701431: osd_op(client.512724.0:135407 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:40.091373 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 324.389960 secs
2017-03-02 10:58:40.091378 osd.27 [WRN] slow request 240.389941 seconds old, received at 2017-03-02 10:54:39.701397: osd_op(client.512724.0:135408 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:40.952740 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 325.251301 secs
2017-03-02 10:58:40.952791 osd.25 [WRN] slow request 240.243998 seconds old, received at 2017-03-02 10:54:40.708674: osd_op(client.36294.0:8895939 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:41.091613 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 325.390198 secs
2017-03-02 10:58:41.091619 osd.27 [WRN] slow request 240.382847 seconds old, received at 2017-03-02 10:54:40.708729: osd_op(client.36294.0:8895940 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:43.953496 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 328.252086 secs
2017-03-02 10:58:43.953517 osd.25 [WRN] slow request 240.022847 seconds old, received at 2017-03-02 10:54:43.930609: osd_op(client.36291.0:8893352 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:44.092310 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 328.390885 secs
2017-03-02 10:58:44.092315 osd.27 [WRN] slow request 240.161657 seconds old, received at 2017-03-02 10:54:43.930605: osd_op(client.36291.0:8893353 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:44.953818 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 329.252386 secs
2017-03-02 10:58:44.953827 osd.25 [WRN] slow request 240.251734 seconds old, received at 2017-03-02 10:54:44.702023: osd_op(client.512724.0:135415 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:45.092587 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 329.391155 secs
2017-03-02 10:58:45.092597 osd.27 [WRN] slow request 240.390484 seconds old, received at 2017-03-02 10:54:44.702049: osd_op(client.512724.0:135416 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:45.954085 osd.25 [WRN] 100 slow requests, 1 included below; oldest blocked for > 330.252673 secs
2017-03-02 10:58:45.954103 osd.25 [WRN] slow request 240.244915 seconds old, received at 2017-03-02 10:54:45.709129: osd_op(client.36294.0:8895947 97.84ada7c9 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
2017-03-02 10:58:46.092838 osd.27 [WRN] 100 slow requests, 1 included below; oldest blocked for > 330.391422 secs
2017-03-02 10:58:46.092850 osd.27 [WRN] slow request 240.383640 seconds old, received at 2017-03-02 10:54:45.709160: osd_op(client.36294.0:8895948 97.31099063 (undecoded) ondisk+write+known_if_redirected e754) currently waiting for peered
```
After adding the following lines to /etc/ceph/ceph.conf and rebooting the system, the issue somehow still exists:

osd_max_object_name_len = 256
osd_max_object_namespace_len = 64
```
cluster b3609cba-0b6d-4311-8aa3-6968c0e66f5e
 health HEALTH_WARN
        64 pgs degraded
        64 pgs stuck degraded
        64 pgs stuck unclean
        64 pgs stuck undersized
        64 pgs undersized
 monmap e1: 1 mons at {0=10.11.108.188:6789/0}
        election epoch 3, quorum 0 0
 osdmap e15: 2 osds: 2 up, 2 in
        flags sortbitwise,require_jewel_osds
  pgmap v36: 64 pgs, 1 pools, 0 bytes data, 0 objects
        69172 kB used, 3338 GB / 3338 GB avail
              64 active+undersized+degraded
```
I ran into those ext4 file system issues before. I tried the settings below in ceph.conf but finally gave up:

osd_max_object_name_len = 256
osd_max_object_namespace_len = 64
osd check max object name len on startup = false

However, I then followed this helpful document to deploy Ceph Jewel 10.2.9 on Ubuntu 16.04: log in to all OSD nodes and format the /dev/sdb partition with the XFS file system. After that, I followed the official document to deploy Ceph on my Ubuntu 16.04 servers. Everything works fine now.
I have exactly the same problem on 14.04 LTS with ext4. I tried almost everything, including all the suggestions above, but I'm still getting the following from `ceph -s`, and the next output from `ceph osd tree`:

```
health HEALTH_ERR
       64 pgs are stuck inactive for more than 300 seconds
       64 pgs stuck inactive
       64 pgs stuck unclean
```

```
ID WEIGHT TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1      0 root default
 0      0     osd.0      down          0          1.00000
```
After appending those lines to the admin node's ceph.conf:

osd max object name len = 256
osd max object namespace len = 64

I think you should then run `ceph-deploy --overwrite-conf admin osd1 osd2` to push the changes to the OSD nodes. You should also make sure the user `ceph` has read permission on /etc/ceph/ceph.client.admin.keyring on the OSD nodes.
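That keyring permission check can be sketched safely like this; the demo runs against a scratch file so it can be tried anywhere, while on a real OSD node the path would be /etc/ceph/ceph.client.admin.keyring, owned by the `ceph` user:

```shell
#!/bin/sh
# Demonstrate the mode the admin keyring needs: readable by owner and group.
KEYRING=$(mktemp)        # stand-in for /etc/ceph/ceph.client.admin.keyring
chmod 0640 "$KEYRING"    # rw for owner, r for group, nothing for others
stat -c '%a' "$KEYRING"  # prints: 640
rm -f "$KEYRING"
```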
When my server reboots, I see errors that OSDs are down and PGs are inactive. Please help me; how can I solve this? This storage is used for CloudStack primary storage.
Thanks.
Please help me anyone.
Does this look like your error?
https://tracker.ceph.com/issues/17722
I've seen it, but I can't find a solution for this.
After a server reboot, the OSD service can't start. Please help me, anyone.
Hi,
I am having an issue with ceph health: `health HEALTH_WARN 64 pgs incomplete; 64 pgs stuck inactive; 64 pgs stuck unclean`. Please suggest what I should check.
Thanks, Abhishek