ceph / ceph-ansible

Ansible playbooks to deploy Ceph, the distributed filesystem.
Apache License 2.0

[stable-5.0] Impossible to add an OSD on an existing cluster #5941

Closed. tlebars closed this issue 3 years ago

tlebars commented 4 years ago

Bug Report

What happened: Cannot add new OSDs after the first run.

What you expected to happen: New OSDs are created when new devices are added

How to reproduce it (minimal and precise):

  1. Run the playbook a first time to deploy the Ceph cluster with some OSDs.
  2. Add a new device (OSD) to the inventory (see the sketch after this list).
  3. Re-run the playbook and observe that the new device (OSD) is not integrated into the cluster.
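
For illustration, "adding a new device" in step 2 just means appending it to the host's devices list in the inventory; the fragment below is a minimal sketch and /dev/sdx is a hypothetical new disk:

# hypothetical inventory fragment: /dev/sdx is the disk added after the first run
osds:
  hosts:
    serv2:
      devices:
        - /dev/sda
        - /dev/sdb
        - /dev/sdx   # new data device appended before re-running the playbook
      dedicated_devices:
        - /dev/sdf
        - /dev/sdg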

Running the command that Ansible executes, directly on the server, returns the following error:

root@serv2 # ceph-volume lvm batch --bluestore --yes --block-db-size '33285996544' /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde --db-devices /dev/sdf /dev/sdg --report
-->  RuntimeError: 2 devices were filtered in non-interactive mode, bailing out
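
One way to see which devices ceph-volume is filtering and why (a suggested check; the inventory subcommand is available in Octopus) is to inspect the devices and the logical volumes left over from the first run:

# how ceph-volume classifies each db device: available or not, plus any reject reasons
root@serv2 # ceph-volume inventory /dev/sdf
root@serv2 # ceph-volume inventory /dev/sdg
# logical volumes already created during the first run
root@serv2 # ceph-volume lvm list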

Configuration Files

group_vars:

---
# all.yml
dummy:
ntp_service_enabled: false
upgrade_ceph_packages: True

ceph_origin: repository
ceph_repository: community
ceph_stable_release: octopus
public_network: "X.X.X.X/X"
cluster_network: "X.X.X.X/X"
monitor_interface: XXX

generate_fsid: true
block_db_size: 33285996544 
ceph_tcmalloc_max_total_thread_cache: 134217728
dashboard_enabled: False
---
# mons.yml
dummy:
monitor_secret: "{{ monitor_keyring.stdout }}"
admin_secret: 'XXXXX'
---
# osds.yml
dummy:
osd_auto_discovery: false

Inventory:

---
# hosts file
all:
  children:
    mons:
      hosts:
        serv1:
        serv2:
        serv3:
    osds:
      hosts:
        serv1:
          devices:
            - /dev/sda
            - /dev/sdb
            - /dev/sdc
            - /dev/sdd
            - /dev/sdg
          dedicated_devices:
            - /dev/sde
            - /dev/sdf
        serv2:
          devices:
            - /dev/sda
            - /dev/sdb
            - /dev/sdc
            - /dev/sdd
            - /dev/sde
          dedicated_devices:
            - /dev/sdf
            - /dev/sdg
        serv3:
          devices:
            - /dev/sda
            - /dev/sdb
            - /dev/sdc
            - /dev/sdd
            - /dev/sde
          dedicated_devices:
            - /dev/sdf
            - /dev/sdg
        serv4:
          devices:
            - /dev/sda
            - /dev/sdb
            - /dev/sdc
            - /dev/sdd
            - /dev/sde
          dedicated_devices:
            - /dev/sdf
            - /dev/sdg

Environment:

dsavineau commented 4 years ago

Could you share the capacity of the devices (both the bluestore data and db devices)?

Are you sure there is enough space left on the db devices (sdf and sdg) for another 31G db logical volume?

As per [1], if the db size can't be satisfied then the OSD isn't deployed.

[1] https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/#explicit-sizing
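
A quick way to answer the space question (a minimal sketch, assuming the block.db LVs from the first run live in LVM volume groups on sdf and sdg) is to check the free space that LVM reports:

# free space remaining in each volume group created by ceph-volume
root@serv2 # vgs -o vg_name,vg_size,vg_free
# existing block.db logical volumes and their sizes
root@serv2 # lvs -o lv_name,vg_name,lv_size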

tlebars commented 4 years ago

Hi,

Disk space does not seem to be the problem here; it only fails when adding a new OSD to an existing cluster. For example, if I include all the disks directly in the first run of ceph-ansible, it works without any problem.

root@serv2:~ (0) # lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0   7,3T  0 disk
sdb      8:16   0   7,3T  0 disk
sdc      8:32   0   7,3T  0 disk
sdd      8:48   0   7,3T  0 disk
sdf      8:64   0 447,1G  0 disk
sdg      8:80   0 447,1G  0 disk
sde      8:96   0   7,3T  0 disk
sdh      8:112  0 111,3G  0 disk
├─sdh1   8:113  0   512M  0 part /boot/efi
└─sdh2   8:114  0 110,8G  0 part /
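
As a rough check against the sizes above: block_db_size is 33285996544 bytes, i.e. exactly 31 GiB, and a 447.1 GiB db device has room for roughly 14 LVs of that size, which supports the point that raw capacity is not the limit here:

root@serv2:~ (0) # python3 -c 'print(33285996544 / 2**30)'          # 31.0 GiB per block.db LV
root@serv2:~ (0) # python3 -c 'print(447.1 * 2**30 / 33285996544)'  # ~14.4 LVs of 31 GiB per db device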
stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

aqader commented 10 months ago

Is it solved?