Closed: cagriersen closed this issue 4 years ago
Additional progress:
If I use CentOS Linux release 7.7.1908 (Core) with kernel 3.10.0-1062.el7.x86_64, the command works fine and produces the same report on every run:
[root@cagri-test /]# ceph-volume --cluster ceph lvm batch --bluestore --yes --prepare /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg --report
Total OSDs: 6
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
[data] /dev/sdb 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdc 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdd 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sde 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdf 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdg 99.00 GB 100%
[root@cagri-test /]# ceph-volume --cluster ceph lvm batch --bluestore --yes --prepare /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg --report
Total OSDs: 6
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
[data] /dev/sdb 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdc 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdd 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sde 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdf 99.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdg 99.00 GB 100%
It seems ceph-volume has incorrectly selected some of the disks to create a logical volume for block.db. Since all of the disks are the same model of SSD, it should use each of them as a separate OSD without any separate DB devices.
This situation should only happen when there's a mix of HDD and SSD/NVMe devices when using ceph-volume batch; otherwise this is a ceph-volume issue. In a mixed setup, ceph-volume will place the BlueStore data on the HDDs and the BlueStore DB on the SSD/NVMe devices.
If you only have SSD devices, then CentOS 7.7 and 7.8 are probably reporting the rotational device flag differently.
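For context, the HDD/SSD distinction ultimately comes down to the kernel's sysfs rotational flag, the same value that `lsblk -o NAME,ROTA` prints. As a hedged sketch (ceph-volume's real detection goes through its own Device abstraction, not this exact function), the check is roughly:

```python
from pathlib import Path

def is_rotational(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True when the kernel flags the device as rotational (HDD).

    Illustrative sketch only: reads /sys/block/<dev>/queue/rotational,
    which holds "1" for spinning disks and "0" for SSD/NVMe devices.
    The sysfs_root parameter exists just to make the sketch testable.
    """
    flag = Path(sysfs_root) / device / "queue" / "rotational"
    return flag.read_text().strip() == "1"
```

If two OS releases populate this flag differently (or a race makes it read inconsistently), a batch planner keying off it would split devices into HDD-for-data and SSD-for-DB groups incorrectly.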
Could you run:
Would it also be possible to test with a more recent ceph container image (like v4.0.12, which is rebased on ceph nautilus 14.2.9)?
I'm sure that all disks are SSDs and are reported correctly by the OS; here are the outputs:
[root@computehci-10 /]# ceph-volume inventory
Device Path Size rotates available Model name
/dev/sdb 1.09 TB False True INTEL SSDSC2BX01
/dev/sdc 1.09 TB False True INTEL SSDSC2BX01
/dev/sdd 1.09 TB False True INTEL SSDSC2BX01
/dev/sde 1.09 TB False True INTEL SSDSC2BX01
/dev/sdf 1.09 TB False True INTEL SSDSC2BX01
/dev/sdg 1.09 TB False True INTEL SSDSC2BX01
/dev/sda 59.63 GB False False SATADOM-SL 3IE3
[root@computehci-10 /]# for i in {a..f}; do cat /sys/block/sd$i/queue/rotational ; done;
0
0
0
0
0
0
[root@computehci-10 /]# lsblk -o NAME,ROTA
NAME ROTA
sda 0
|-sda1 0
`-sda2 0
sdb 0
sdc 0
sdd 0
sde 0
sdf 0
sdg 0
But a newer version of the ceph daemon image (v4.0.12-stable-4.0-nautilus-centos-7) works fine:
[root@computehci-10 /]# time ceph-volume --cluster ceph lvm batch --bluestore --yes --prepare /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sdf /dev/sde --report
Total OSDs: 6
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
[data] /dev/sda 58.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdb 1.09 TB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdc 1.09 TB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdd 1.09 TB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdf 1.09 TB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sde 1.09 TB 100%
real 0m1.743s
user 0m0.889s
sys 0m0.483s
[root@computehci-10 /]# time ceph-volume --cluster ceph lvm batch --bluestore --yes --prepare /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sdf /dev/sde --report
Total OSDs: 6
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
[data] /dev/sda 58.00 GB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdb 1.09 TB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdc 1.09 TB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdd 1.09 TB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sdf 1.09 TB 100%
----------------------------------------------------------------------------------------------------
[data] /dev/sde 1.09 TB 100%
real 0m1.707s
user 0m0.937s
sys 0m0.472s
However, we don't have a chance to update the image to this version manually, since we have a running OpenStack cluster and the version is managed by TripleO.
Do you have any idea how we could fix this without an OpenStack upgrade?
Do you have any idea how we could fix this without an OpenStack upgrade?
No, there's nothing to do on the ceph-ansible side, since this seems to come from ceph-volume itself; the only solution is to update to the latest version.
I've resolved the issue by using an older CentOS generic cloud image. You can close the issue. Thanks for your attention.
@cagriersen-omma we've hit this issue too. Can you please elaborate: what was that old CentOS generic cloud image? What release was it? What kernel? Thank you!
Hi @minkimipt
Please read the bug report and my following comment; I mentioned the versions there.
@minkimipt @cagri-rf @cagriersen-omma it seems like a race condition on the ceph-volume side, see the issue [1].
Just linking it here to make this issue a little easier to find.
@mgariepy thanks for sharing. We were able to work around this by making ceph-volume treat all disks as non-rotational. This confirms that it was indeed a bug in ceph-volume.
@minkimipt how did you manage to make ceph-volume see the disks as non-rotational?
Hi @foysalkayum,
We've rebuilt the container using the script below. Basically, we replaced one line in the file ceph_volume/util/device.py inside the rhceph container image. Here's the script, which you will need to adapt to your environment:
#!/bin/bash
# usage: patch-image.sh <dst_image>
# e.g. dst_image = "172.18.2.1:8787/rhceph/rhceph-3-rhel7:3-42"
# the script needs to run as the root user on the undercloud VM
file=/var/lib/contrail_cloud/openstack-containers/contrail_cloud-openstack-containers.tgz
# this will take some time because it loads the container images into the local registry of the undercloud
docker load -i $file
export dst_image=$1
src_image=$(docker images --format "{{ .Repository }}:{{ .Tag }}" | grep ceph)
# run the source image once so an exited container exists to copy files out of
docker run $src_image
# assumption: there are no other exited ceph containers on the node
container=$(docker ps -a | grep Exited | grep ceph | awk '{print $1}')
docker cp $container:/usr/lib/python2.7/site-packages/ceph_volume/util/device.py ./
# replace the return value of the rotational function with False, keeping a backup in device.py.bak
sed -i.bak "s/ return rotational.*/ return False/g" device.py
# create a Dockerfile to build the patched image
cat << EOF > Dockerfile
FROM $src_image
COPY device.py /usr/lib/python2.7/site-packages/ceph_volume/util/device.py
EOF
# build the image, tag it with the tag of the current image and push it to the registry
docker build -t $dst_image .
docker push $dst_image
docker rm $container
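For reference, the net effect of the sed line on device.py is roughly the following; the class below is an illustrative stand-in, not the real ceph_volume.util.device.Device:

```python
class Device:
    """Illustrative stand-in for ceph_volume.util.device.Device;
    only the patched property is shown."""

    @property
    def rotational(self):
        # Before the patch this returned a flag derived from sysfs/lsblk
        # probing; after the sed edit it is hard-coded to False, so every
        # disk is treated as an SSD and `lvm batch` creates plain data
        # OSDs with no separate block.db volumes.
        return False
```

With every device reporting rotational == False, the batch planner sees a uniform all-SSD set, which sidesteps the misdetection.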
Bug Report
What happened: We use ceph-ansible through OpenStack TripleO to deploy and scale our OpenStack (Stein) cluster as an HCI environment. All of our hosts have 6 identical brand/model disk devices that are used as OSDs. Currently we have 10 ComputeHCI nodes in our environment and everything works fine: all 6 SSD disks are used as separate OSD disks, so every host has 6 OSD containers without problems. Since all of these disks are SSDs, there is no separate device for block.db. Here is our ceph custom config snippet:
and
ceph-volume lvm list
output from one of the current nodes. As expected, there are six OSDs and no separate logical volumes for DBs.
However, the other day we wanted to scale up our HCI nodes, so we added another 10 nodes to the environment. These nodes have the same hardware configuration as the others (same brand/model of server with the same number and brand/model of SSD disks).
After we initiated the re-deployment command, it ended up with an error in the ceph-ansible deployment phase for almost all of the disks attached to our newly added nodes:
stderr: '--> RuntimeError: Unable to use device, already a member of LVM: /dev/sdX'
Here's the task that failed:
As I tried to understand the issue, I noticed that the
ceph-volume --cluster ceph lvm batch --bluestore --yes --prepare /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sdf /dev/sde --report
command selects some of the disks for creating a logical volume for ceph-block-dbs.
When I check the
ceph-volume lvm list
output from a failed host, I see:
Also, the
lsblk
and
lvscan
command outputs from the same host show:
It seems ceph-volume has incorrectly selected some of the disks to create a logical volume for block.db. Since all of the disks are the same model of SSD, it should use each of them as a separate OSD without any separate DBs.
After I saw this weird situation, I logged in to one of the faulty nodes, wiped all of the disks (including the LVM metadata) and ran the same command that ceph-ansible did.
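The wipe step can be scripted; here is a hedged sketch of the idea (the exact tool choice is an assumption, and these commands are destructive, so adapt and double-check device paths before running):

```python
import subprocess

def wipe_commands(device):
    """Build the commands used to clear LVM metadata and partition
    structures from a disk before handing it back to ceph-volume.
    Sketch only; assumes wipefs and sgdisk are installed."""
    return [
        ["wipefs", "--all", device],      # drop filesystem/LVM signatures
        ["sgdisk", "--zap-all", device],  # clear GPT and MBR structures
    ]

def wipe(device):
    # DESTRUCTIVE: erases all data and metadata on the given device.
    for cmd in wipe_commands(device):
        subprocess.run(cmd, check=True)
```

Any active logical volumes on the disk should be deactivated first, or the signature wipe may fail while the device is in use.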
So I started a container from the same ceph docker image:
And run:
Even more oddly, this command returns different outputs on each run, like below:
As you can see, on the first try it decided to create 3 OSDs, and when I ran the same command again, it decided to create 2 OSDs.
Even worse, because of this we ended up with a Ceph cluster in HEALTH_WARN state, since the newly (and inconsistently) created OSDs were added to the cluster.
The only difference is that the faulty nodes run CentOS 7.8.2003, while our existing nodes run CentOS 7.7.1908. The issue might be related to this difference, so I'll try to downgrade the CentOS version on the faulty nodes.
What you expected to happen:
It should make the correct decision and create one OSD per disk, with the DB data kept on the same disk.
How to reproduce it (minimal and precise):
Share your group_vars files, inventory and full ceph-ansible log
The full output of ceph-volume's lvcreate action from /var/lib/mistral/overcloud/ceph-ansible/ceph_ansible_command.log
Also, I see some Python exceptions in /var/log/ceph/ceph-volume.log on the faulty nodes:
Environment:
OS (e.g. from /etc/os-release):
Faulty nodes:
Existing nodes:
Kernel (e.g.
uname -a
):
Faulty nodes:
Existing nodes:
Docker version if applicable (e.g.
docker version
):
Server Version: 1.13.1
Ansible version (e.g.
ansible-playbook --version
):
ceph-ansible version (e.g.
git head or tag or stable branch
):
ceph-ansible-4.0.10-1.el7.noarch
Ceph version (e.g.
ceph -v
):