Closed by tserong 4 years ago
This should be right, but I'm currently halfway through an upgrade, so I will confirm shortly.
Wait a minute... I finally got through a manual test and ended up with this:
admin:~ # ceph-salt apply
Syncing minions with the master...
Checking if minions respond to ping...
Pinging 9 minions...
Checking if ceph-salt formula is available...
Checking if minions have functioning DNS...
Running DNS lookups on 9 minions...
Checking if there is an existing Ceph cluster...
Ceph cluster already exists
/time_server is disabled. Will check if minions have a time_sync service enabled and running...
Checking time sync service on 9 minions...
Time sync issues detected on host(s) admin.ceph
/time_server is disabled. In that case, a time sync service must be enabled and running on all minions. Please fix this issue and try again.
/var/log/ceph-salt.log says
2020-10-13 11:48:54,021 [INFO] [ceph_salt.execute] probe_time_sync returned: {'data3.ceph': True, 'mon2.ceph': True, 'data2.ceph': True, 'data4.ceph': True, 'data5.ceph': True, 'mon3.ceph': True, 'data1.ceph': True, 'mon1.ceph': True, 'admin.ceph': False}
2020-10-13 11:48:54,021 [INFO] [ceph_salt.execute] Time sync service is enabled and running on host data3.ceph
2020-10-13 11:48:54,021 [INFO] [ceph_salt.execute] Time sync service is enabled and running on host mon2.ceph
2020-10-13 11:48:54,021 [INFO] [ceph_salt.execute] Time sync service is enabled and running on host data2.ceph
2020-10-13 11:48:54,021 [INFO] [ceph_salt.execute] Time sync service is enabled and running on host data4.ceph
2020-10-13 11:48:54,022 [INFO] [ceph_salt.execute] Time sync service is enabled and running on host data5.ceph
2020-10-13 11:48:54,022 [INFO] [ceph_salt.execute] Time sync service is enabled and running on host mon3.ceph
2020-10-13 11:48:54,022 [INFO] [ceph_salt.execute] Time sync service is enabled and running on host data1.ceph
2020-10-13 11:48:54,022 [INFO] [ceph_salt.execute] Time sync service is enabled and running on host mon1.ceph
2020-10-13 11:48:54,022 [ERROR] [ceph_salt.execute] Time sync service is NOT enabled and running on host admin.ceph
...but maybe that makes sense? In my case, admin.ceph actually is the time server for this test cluster, so, um, does that mean the check in ceph-salt is incorrect? Or did DeepSea misconfigure admin.ceph? Or is there some other problem? @ricardoasmarques @swiftgist
@tserong Here is the Salt module that is run to determine whether time sync is enabled and running on a given minion:
def probe_time_sync():
    units = [
        'chrony.service',  # 18.04 (at least)
        'chronyd.service',  # el / opensuse
        'systemd-timesyncd.service',
        'ntpd.service',  # el7 (at least)
        'ntp.service',  # 18.04 (at least)
    ]
    if not _check_units(units):
        # The '{}' placeholder is needed so the unit list appears in the message.
        log_msg = ('No time sync service is running; checked for: {}'
                   .format(', '.join(units)))
        log.warning(log_msg)
        return False
    return True
What you saw indicates that this module returned "False" when it ran on admin.ceph. You should be able to see more details in the admin.ceph minion log.
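The _check_units helper is not shown in this thread. As a rough sketch only, assuming it passes when at least one of the listed units is both enabled and running, the logic could look something like the following. The names check_units and unit_state, and the injectable run and state parameters, are hypothetical and chosen for illustration; this is not the actual ceph-salt code.

```python
import subprocess


def unit_state(unit, run=subprocess.run):
    """Return (enabled, active) for a systemd unit by asking systemctl."""
    def ok(verb):
        # An exit status of 0 from 'systemctl is-enabled' / 'is-active'
        # is treated as success here; anything else counts as failure.
        result = run(['systemctl', verb, unit],
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode == 0
    return ok('is-enabled'), ok('is-active')


def check_units(units, state=unit_state):
    """Return True if at least one unit is both enabled and running."""
    for unit in units:
        enabled, active = state(unit)
        if enabled and active:
            return True
    return False
```

Under that assumption, a host like admin.ceph with chrony installed but not enabled or not started would fail the probe, which matches the "False" result in the log above.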
The motivation for having this check is that mgr/cephadm itself will refuse to manage a node that does not pass this check.
Whether the check is "correct" or not is a good question. If admin.ceph is NOT going to be managed by cephadm, then one could make an argument that this check does not apply to it. But right now the check is applied to all the minions that have the ceph-salt:member grain.
Here are two ways this problem could be averted:
(1) add options to ceph-salt apply to make it skip certain checks if we have reason to fear that they might be too strict.
(2) have ceph-salt apply skip the time sync check if it detects a running ceph cluster.
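To make the two options concrete, here is a minimal sketch of how the decision could be expressed, taking the dictionary returned by probe_time_sync across minions as input. The function name time_sync_failures and the parameters skip_checks, cluster_exists and skip_when_cluster_exists are all hypothetical; ceph-salt apply has no such options as far as this thread shows.

```python
def time_sync_failures(probe_results, skip_checks=(), cluster_exists=False,
                       skip_when_cluster_exists=False):
    """Return the sorted list of hosts whose time sync probe should block apply."""
    # Option (1): the operator explicitly asked to skip this check.
    if 'time_sync' in skip_checks:
        return []
    # Option (2): an existing cluster implies time sync was set up already.
    if skip_when_cluster_exists and cluster_exists:
        return []
    # Default behaviour: every host that failed the probe is reported.
    return sorted(host for host, ok in probe_results.items() if not ok)


results = {'mon1.ceph': True, 'data1.ceph': True, 'admin.ceph': False}
print(time_sync_failures(results))
print(time_sync_failures(results, skip_checks=('time_sync',)))
print(time_sync_failures(results, cluster_exists=True,
                         skip_when_cluster_exists=True))
```

With the sample results, the first call reports admin.ceph, while the other two return an empty list, so the apply would proceed.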
At any rate, I think the error @tserong encountered is not related to this PR. I figure we don't want ceph-salt to re-do the battle-tested production time sync setup from SES6 in any case.
Agreed. In my case, it turns out admin.ceph simply wasn't running chrony, even though it's meant to be the time server for my cluster. That was actually me misconfiguring things :-/ because the SES6 deployment guide pretty clearly states that one needs to "Verify that the time synchronization service is enabled on each system start-up" (step 11 under https://documentation.suse.com/ses/6/html/ses-all/ceph-install-saltstack.html#ceph-install-stack).
@susebot run teuthology
Commit c8b24b2867df0abcd2dcb504a8ab3d6c29f837da is OK for suite deepsea:tier2. Check tests results in the Jenkins job: https://storage-ci.suse.de/job/pr-deepsea/484/
This commit changes the exported ceph-salt config to set time_server disabled, which means ceph-salt apply will not make any changes to whatever existing time server configuration is in place.
Signed-off-by: Tim Serong <tserong@suse.com>
Fixes: https://bugzilla.suse.com/show_bug.cgi?id=1177607