Apologies for the weird html. I'm not always clear on what will trigger automatic html code. :(
I didn't really start over, but started adapting commands from:
SUSE "Deploy a Fully Functional SUSE Enterprise Storage Cluster Test Environment in About 30 minutes" - https://www.suse.com/c/deploy-a-suse-enterprise-storage-test-environment-in-about-30-minutes/
This guide uses a different command:

`deepsea stage run ceph.stage.N`

vs.

`salt-run state.orch ceph.stage.N`

It isn't clear to me what the difference between the two is; I thought they would be equivalent. I prefer the deepsea command because it provides a hint at what is going on.
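If I'm reading the deepsea-cli package right (and I may be wrong here), the deepsea command is essentially a wrapper around the same orchestration run that adds progress reporting, so something like this should give the same visibility while still using salt-run:

```
# run the stage the "plain" way in one terminal...
salt-run state.orch ceph.stage.2

# ...and watch progress from a second terminal (assumes the deepsea-cli package is installed)
deepsea monitor
```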
After altering policy.cfg to match the example, and running through this again, it now gets further but complains at stage 3:
```
fqdn : ['fqdn ses6-mon1.ceph does not match minion id ses6-mon1', 'fqdn ses6-osd3.ceph does not match minion id ses6-osd3', 'fqdn ses6-mds.ceph does not match minion id ses6-mds', 'fqdn ses6-osd2.ceph does not match minion id ses6-osd2', 'fqdn ses6-mon3.ceph does not match minion id ses6-mon3', 'fqdn ses6-mon2.ceph does not match minion id ses6-mon2', 'fqdn ses6-osd1.ceph does not match minion id ses6-osd1', 'fqdn ses6-admin.ceph does not match minion id ses6-admin']
...
Stage execution failed:
```
And there are still no disks found...
I'm missing what is going wrong here as name resolution seems to be working fine:
```
ses6-admin:~ # salt '*' cmd.run 'host `hostname`'
ses6-admin:
    ses6-admin has address 192.168.122.10
ses6-mon2:
    ses6-mon2 has address 192.168.122.21
ses6-mon1:
    ses6-mon1 has address 192.168.122.20
ses6-mds:
    ses6-mds has address 192.168.122.15
ses6-osd3:
    ses6-osd3 has address 192.168.122.32
ses6-mon3:
    ses6-mon3 has address 192.168.122.22
ses6-osd2:
    ses6-osd2 has address 192.168.122.31
ses6-osd1:
    ses6-osd1 has address 192.168.122.30
ses6-admin:~ # salt '*' cmd.run 'host `hostname`.ceph'
ses6-admin:
    ses6-admin.ceph has address 192.168.122.10
ses6-mon1:
    ses6-mon1.ceph has address 192.168.122.20
ses6-osd2:
    ses6-osd2.ceph has address 192.168.122.31
ses6-mon3:
    ses6-mon3.ceph has address 192.168.122.22
ses6-mds:
    ses6-mds.ceph has address 192.168.122.15
ses6-mon2:
    ses6-mon2.ceph has address 192.168.122.21
ses6-osd3:
    ses6-osd3.ceph has address 192.168.122.32
ses6-osd1:
    ses6-osd1.ceph has address 192.168.122.30
```
This is quite odd because name resolution works forward and backward. That is, I can resolve all hostnames to IPs, and all IPs back to hostnames, for all members of the cluster.
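In case it helps, the stage 3 validation presumably compares Salt's own grains rather than DNS, so dumping the relevant grains side by side might be a more direct check (just a guess on my part):

```
# compare the minion id with the fqdn/host grains on every node
salt '*' grains.item id fqdn host
```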
There are no partition tables on any of the disks. Because this is a test setup, I made them small: 1 GB each. Is this a problem? Apparently yes; from the SES documentation, section 2.1.2 "Minimum Disk Size":

> There are two types of disk space needed to run on OSD: the space for the disk journal (for FileStore) or WAL/DB device (for BlueStore), and the primary space for the stored data. The minimum (and default) value for the journal/WAL/DB is 6 GB. The minimum space for data is 5 GB, as partitions smaller than 5 GB are automatically assigned the weight of 0.
>
> So although the minimum disk space for an OSD is 11 GB, we do not recommend a disk smaller than 20 GB, even for testing purposes.
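Since the 192.168.122.x addresses suggest these are libvirt/KVM guests (my assumption), the data disks can probably just be grown from the host; the image path below is made up for illustration:

```
# shut the guest down first, then grow its backing image to the recommended 20 GB
virsh shutdown ses6-osd1
qemu-img resize /var/lib/libvirt/images/ses6-osd1-data1.qcow2 20G
```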
PS:
It would be nice if the deployment gave a better message about disk size. I can see that the size check is baked into the code in the `osd_list` function...
Description of Issue/Question
In order to learn Ceph (SES 6), I set up a cluster of VMs, all running SLES 15 SP1 and registered with SUSE. I can make it through most of the setup, but device discovery seems to be failing. I have 3 OSD nodes, each with 3 drives.
Salt says I don't have drives for data and db.
```
# salt-run disks.report
Found DriveGroup
Calling dg.report on compound target I@roles:storage
No valid json in ceph osd tree. Probably no cluster deployed yet.
ses6-osd1:
ses6-osd2:
ses6-osd3:
```
Executing ceph-volume on the osd nodes works and shows 3 available drives on each node:
```
ses6-admin:~ # salt 'ses6-osd*' cmd.run 'ceph-volume inventory'
ses6-osd1:
ses6-osd3:
ses6-osd2:
```
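To sanity-check the raw device sizes that ceph-volume is looking at, a plain lsblk across the OSD nodes works too:

```
salt 'ses6-osd*' cmd.run 'lsblk -o NAME,SIZE,TYPE,MOUNTPOINT'
```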
One thing that is strange is that I have no profile* directory in /srv anywhere. Some commands warn about this but I am not sure how to correct it:
```
# salt-run push.proposal
[WARNING ] profile-default/cluster/*.sls matched no files
[WARNING ] profile-defualt/stack/default/ceph/minions/*yml matched no files
[WARNING ] role-mon/stack/default/ceph/minions/ses6-mon*.yml matched no files
True
```
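For reference, the tree those policy.cfg lines are matched against should live under the proposals directory on the admin node (path assumed from a default DeepSea install), so listing it shows what stage 1 actually generated:

```
ls -R /srv/pillar/ceph/proposals/
```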
I've started over multiple times by doing:
```
salt-run disengage.safety
salt-run state.orch ceph.purge
```
I end up at the same spot every time - even after recreating the proposal.
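For completeness, the redeploy after the purge is just the standard stage sequence again:

```
salt-run state.orch ceph.stage.0
salt-run state.orch ceph.stage.1
salt-run state.orch ceph.stage.2
salt-run state.orch ceph.stage.3
```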
Setup
(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).)
My policy.cfg is:

```
cluster-ceph/cluster/*.sls
profile-default/cluster/*.sls
profile-defualt/stack/default/ceph/minions/*yml
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
# master and admin
role-master/cluster/ses6-admin.sls
role-admin/cluster/ses6-admin.sls
role-mds/cluster/ses6-mds*.sls
role-mon/stack/default/ceph/minions/ses6-mon*.yml
role-mon/cluster/ses6-mon*.sls
role-mgr/cluster/ses6-mon*.sls
role-storage/cluster/ses6-osd*.sls
```
And my global.yml is:
```
master_minion: ses6-admin
subvolume_init: disabled
```
My drive_groups.yml looks like this:
```
ses6-admin:~ # cat /srv/salt/ceph/configuration/files/drive_groups.yml
default:
  target: 'I@roles:storage'
  data_devices:
    all: true
  db_devices:
    all: true
```
Steps to Reproduce Issue
(Include debug logs if possible and relevant.)
Versions Report
Maybe this is the problem?

```
ses6-admin:~ # salt-run deepsea-version
'deepsea-version' is not available.
ses6-admin:~ # rpm -qa | grep deepsea
deepsea-0.9.27+git.0.93a84d2ea-3.9.1.noarch
deepsea-cli-0.9.27+git.0.93a84d2ea-3.9.1.noarch
```
But these are the only two deepsea RPMs from the repo:
```
ses6-admin:~ # zypper se deepsea*
Refreshing service 'Basesystem_Module_15_SP1_x86_64'.
Refreshing service 'SUSE_Enterprise_Storage_6_x86_64'.
Refreshing service 'SUSE_Linux_Enterprise_Server_15_SP1_x86_64'.
Refreshing service 'Server_Applications_Module_15_SP1_x86_64'.
Loading repository data...
Reading installed packages...

S  | Name        | Summary                                       | Type
---+-------------+-----------------------------------------------+-----------
i+ | deepsea     | Salt solution for deploying and managing Ceph | package
   | deepsea     | Salt solution for deploying and managing Ceph | srcpackage
i+ | deepsea-cli | DeepSea command line                          | package
```
```
ses6-admin:~ # rpm -qi salt-minion
Name        : salt-minion
Version     : 2019.2.0
Release     : 6.21.1
Architecture: x86_64
Install Date: Mon 20 Jan 2020 12:08:20 PM PST
Group       : System/Management
Size        : 41019
License     : Apache-2.0
Signature   : RSA/SHA256, Wed 04 Dec 2019 12:02:15 PM PST, Key ID 70af9e8139db7c82
Source RPM  : salt-2019.2.0-6.21.1.src.rpm
Build Date  : Wed 04 Dec 2019 11:57:57 AM PST
Build Host  : sheep70
Relocations : (not relocatable)
Packager    : https://www.suse.com/
Vendor      : SUSE LLC https://www.suse.com/
URL         : http://saltstack.org/
Summary     : The client component for Saltstack
Description :
Salt minion is queried and controlled from the master.
Listens to the salt master and execute the commands.
Distribution: SUSE Linux Enterprise 15

ses6-admin:~ # rpm -qi salt-master
Name        : salt-master
Version     : 2019.2.0
Release     : 6.21.1
Architecture: x86_64
Install Date: Mon 20 Jan 2020 12:08:21 PM PST
Group       : System/Management
Size        : 2936818
License     : Apache-2.0
Signature   : RSA/SHA256, Wed 04 Dec 2019 12:02:15 PM PST, Key ID 70af9e8139db7c82
Source RPM  : salt-2019.2.0-6.21.1.src.rpm
Build Date  : Wed 04 Dec 2019 11:57:57 AM PST
Build Host  : sheep70
Relocations : (not relocatable)
Packager    : https://www.suse.com/
Vendor      : SUSE LLC https://www.suse.com/
URL         : http://saltstack.org/
Summary     : The management component of Saltstack with zmq protocol supported
Description :
The Salt master is the central server to which all minions connect.
Enabled commands to remote systems to be called in parallel rather than serially.
Distribution: SUSE Linux Enterprise 15
```