ceph / ceph-cookbook

Chef cookbooks for Ceph

Ceph not assigning an OSD to PGs #230

Closed m-no-2017 closed 7 years ago

m-no-2017 commented 7 years ago

Hello there Ceph-Experts,

I'm struggling with a newbie problem, and since I'm new to Ceph I just can't figure out how to get out of my misery.

Here's the thing:

Exact hint from "ceph health detail":

Info:

Things I already tried:

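For context, the usual way to inspect this state from a monitor node is roughly the following (just a sketch; the exact pool and PG names will of course differ per cluster):

# overall state and the concrete health warnings
ceph -s
ceph health detail
# PGs that never left the creating/inactive state
ceph pg dump_stuck inactive
# how many OSDs the cluster counts as up and in
ceph osd stat
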
my ceph.conf

[global]
fsid = 064819d2-2700-4572-8487-7047d9a342f5
mon_initial_members = srvelldckrtest00, srvelldckrtest01, srvelldckrtest02
mon_host = 172.27.2.190,172.27.2.191,172.27.2.192
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 172.27.2.0/24

[mon]
mon_osd_allow_primary_affinity = true

[osd]
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_pg_num = 100
osd_pool_default_pgp_num = 100
osd_data = /var/lib/ceph/osd/$cluster-$id
osd_journal_size = 6144
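
A note on the [osd] section: the osd_pool_default_* options are only consulted when a pool is created (in the upstream examples they usually sit under [global]), so an existing pool keeps whatever values it was created with. A quick way to check what a pool actually uses, taking the default "rbd" pool as an example:

ceph osd pool get rbd size
ceph osd pool get rbd pg_num
ceph osd pool get rbd pgp_num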

my Crushmap

#begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

#devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

#types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

#buckets
root default {
    id -1       # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0  # rjenkins1
}
host srvelldckrtest01 {
    id -2       # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0  # rjenkins1
    item osd.0 weight 1.000
}
host srvelldckrtest02 {
    id -3       # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0  # rjenkins1
    item osd.1 weight 1.000
}
host srvelldckrtest03 {
    id -4       # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0  # rjenkins1
    item osd.2 weight 1.000
}
host srvelldckrtest04 {
    id -5       # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0  # rjenkins1
    item osd.3 weight 1.000
}

#rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

#end crush map
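
One thing that stands out in this map: the root default bucket contains no items, so none of the hosts (and therefore none of the OSDs) are reachable from it, while replicated_ruleset starts with "step take default". If the hosts really are detached from the root and not just trimmed from the paste, a sketch of how they could be attached (host names taken from the map above) would be:

ceph osd crush move srvelldckrtest01 root=default
ceph osd crush move srvelldckrtest02 root=default
ceph osd crush move srvelldckrtest03 root=default
ceph osd crush move srvelldckrtest04 root=default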

Output ceph osd tree

ID WEIGHT  TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
-5 1.00000 host srvelldckrtest04
 3 1.00000     osd.3                  up  1.00000          1.00000
-4 1.00000 host srvelldckrtest03
 2 1.00000     osd.2                  up  1.00000                0
-3 1.00000 host srvelldckrtest02
 1 1.00000     osd.1                  up  1.00000                0
-2 1.00000 host srvelldckrtest01
 0 1.00000     osd.0                  up  1.00000          1.00000
-1       0 root default
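
The tree shows the same picture: root default has weight 0 and the hosts sit outside it. It also shows osd.1 and osd.2 with a primary affinity of 0; if that is not intentional, it can be reset (sketch, relying on the mon_osd_allow_primary_affinity setting from ceph.conf above):

ceph osd primary-affinity osd.1 1.0
ceph osd primary-affinity osd.2 1.0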

Output ceph mon dump (I desperately added a 4th mon)

dumped monmap epoch 2
epoch 2
fsid 064819d2-2700-4572-8487-7047d9a342f5
last_changed 2017-05-04 10:01:33.564153
created 2017-05-04 08:57:10.162878
0: 172.27.2.190:6789/0 mon.srvelldckrtest00
1: 172.27.2.191:6789/0 mon.srvelldckrtest01
2: 172.27.2.192:6789/0 mon.srvelldckrtest02
3: 172.27.2.193:6789/0 mon.srvelldckrtest03
m-no-2017 commented 7 years ago

[SOLVED]

OK, the issue is related to the docs for creating an OSD with the long form. That procedure didn't work for me, but I don't know why.

Observed behaviour was a non-connecting state somewhere in the Ceph backend, which left all the PGs in a creating state. All OSDs were registered in the crushmap and appeared in the "ceph osd tree" output, but there was no hint of an actually "successfully connected" OSD.

My fix was to replace the "long form" OSD creation with the short form from the docs.
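
For reference, a sketch of what the short form from the docs looks like on an OSD node (the device path /dev/sdb is only an example; the fsid is the one from ceph.conf above):

# prepare the disk: partition it and create the OSD data directory
ceph-disk prepare --cluster ceph --cluster-uuid 064819d2-2700-4572-8487-7047d9a342f5 /dev/sdb
# activate it: register the OSD with the monitors and start the daemon
ceph-disk activate /dev/sdb1

With osd_crush_update_on_start left at its default, the starting OSD also places itself under its host in the CRUSH map, which is the step that is easy to miss when doing the long form by hand.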