I ran into a Ceph cluster with the following state:
"""
root@juju-733f2f-3-lxd-1:~# ceph health detail
HEALTH_WARN client is using insecure global_id reclaim; mons are allowing insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM: client is using insecure global_id reclaim
client.cinder-ceph-hdd at 10.130.21.15:0/133926252 is using insecure global_id reclaim
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
mon.juju-733f2f-3-lxd-1 has auth_allow_insecure_global_id_reclaim set to true
mon.juju-733f2f-5-lxd-1 has auth_allow_insecure_global_id_reclaim set to true
mon.juju-733f2f-4-lxd-15 has auth_allow_insecure_global_id_reclaim set to true
"""
This is just an FYI from the Ceph cluster, not anything fatal. Even so, juju-verify would fail when verifying any Ceph OSD or Mon unit (i.e. when checking whether it could be safely rebooted or shut down).
We should explore whether a "warning" should be raised rather than failing to verify the environment. Please note that other, more serious issues, such as Ceph service degradation due to some OSDs being out of service, will also return HEALTH_WARN.
Alternatively, the "get-health" action in charm-ceph-mon could accept "details=true" so that the type of messages can be inspected further, and a list of known messages that can safely be ignored could be tracked (failing if any "unknown messages" appear). To start with, the scope would be to evaluate whether HEALTH_WARN could raise a "warning" rather than a "FAIL", if that is acceptable from an operations perspective.
The above issue is caused by a Ceph client (a Nova instance booting from a Cinder volume that uses an old version of the Ceph client), as described at: https://ceph.io/releases/v14-2-20-nautilus-released
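As a minimal sketch of the allowlist idea, the snippet below classifies a Ceph health report as OK, a benign warning, or a failure. It assumes the JSON shape produced by `ceph health detail --format json` (a `status` string plus a `checks` mapping keyed by check name); the allowlist contents and the `classify_health` helper name are hypothetical, and the exact JSON structure should be verified against the Ceph release in use.

```python
import json

# Hypothetical allowlist of known-benign health checks that should not
# block verification; in practice this would be maintained by operators.
KNOWN_BENIGN_CHECKS = {
    "AUTH_INSECURE_GLOBAL_ID_RECLAIM",
    "AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED",
}

def classify_health(health_json: str) -> str:
    """Return 'OK', 'WARN' (all checks benign), or 'FAIL'."""
    health = json.loads(health_json)
    status = health.get("status", "HEALTH_ERR")
    if status == "HEALTH_OK":
        return "OK"
    active_checks = set(health.get("checks", {}))
    if status == "HEALTH_WARN" and active_checks <= KNOWN_BENIGN_CHECKS:
        # Every active check is on the allowlist: raise a warning
        # instead of failing the verification.
        return "WARN"
    # HEALTH_ERR, or a HEALTH_WARN that includes any unknown check
    # (e.g. OSDs out of service), still fails verification.
    return "FAIL"
```

With this split, the insecure global_id reclaim state above would yield a "WARN", while a HEALTH_WARN containing an unlisted check such as OSD_DOWN would still yield a "FAIL".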
Imported from Launchpad using lp2gh.
date created: 2021-11-03T15:07:04Z
owner: aluria
assignee: None
the launchpad url