ceph / ceph-ansible

Ansible playbooks to deploy Ceph, the distributed filesystem.
Apache License 2.0

ceph-volume problem when updating octopus minor releases #5916

Closed. nikosmeds closed this issue 4 years ago.

nikosmeds commented 4 years ago

Bug Report

This appears very similar to https://github.com/ceph/ceph-ansible/issues/5362, but I have confirmed that the --force option is included.

What happened:

TASK [scan ceph-disk osds with ceph-volume if deploying nautilus] **************
Tuesday 06 October 2020  19:56:42 +0200 (0:00:00.098)       0:10:36.135 ******* 
fatal: [storage001-stg]: FAILED! => changed=true 
  cmd:
  - ceph-volume
  - --cluster=ceph
  - simple
  - scan
  - --force
  delta: '0:00:00.892080'
  end: '2020-10-06 19:56:43.691758'
  msg: non-zero return code
  rc: 1
  start: '2020-10-06 19:56:42.799678'
  stderr: |2-
     stderr: lsblk: /var/lib/ceph/osd/ceph-12: not a block device
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    Running command: /sbin/cryptsetup status tmpfs
     stderr: blkid: error: tmpfs: No such file or directory
     stderr: lsblk: tmpfs: not a block device
    Traceback (most recent call last):
      File "/usr/sbin/ceph-volume", line 11, in <module>
        load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
      File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 40, in __init__
        self.main(self.argv)
      File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
        return f(*a, **kw)
      File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 151, in main
        terminal.dispatch(self.mapper, subcommand_args)
      File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
        instance.main()
      File "/usr/lib/python3/dist-packages/ceph_volume/devices/simple/main.py", line 33, in main
        terminal.dispatch(self.mapper, self.argv)
      File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
        instance.main()
      File "/usr/lib/python3/dist-packages/ceph_volume/devices/simple/scan.py", line 378, in main
        device = Device(self.encryption_metadata['device'])
      File "/usr/lib/python3/dist-packages/ceph_volume/util/device.py", line 92, in __init__
        self._parse()
      File "/usr/lib/python3/dist-packages/ceph_volume/util/device.py", line 138, in _parse
        vgname, lvname = self.path.split('/')
    ValueError: not enough values to unpack (expected 2, got 1)
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
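
The ValueError at the bottom of the traceback comes from ceph-volume splitting a device path on '/'. A minimal sketch of that failure mode, assuming the encryption metadata "device" resolved to the literal string tmpfs (which is what the cryptsetup/blkid/lsblk calls in the stderr above suggest) rather than a vg/lv pair; this is an illustration, not the actual ceph-volume code:

# Hypothetical reproduction of the unpack error seen in ceph_volume/util/device.py
path = "tmpfs"                    # a tmpfs mount point, not a "vg/lv" pair or a block device
vgname, lvname = path.split('/')  # ValueError: not enough values to unpack (expected 2, got 1)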

What you expected to happen:

I expected all OSDs to be updated to the latest Ceph Octopus release. Instead, the cluster is left running a mix of versions:

$ sudo ceph tell osd.* version | grep version | awk '{print $2}' | sort | uniq -c
     12 "15.2.2",
      6 "15.2.5",

How to reproduce it (minimal and precise):

$ ansible-playbook -i ../staging.ini infrastructure-playbooks/rolling_update.yml -e ireallymeanit=yes

Share your group_vars files, inventory and full ceph-ansible log

Variables (spread across multiple files, some environment specific)

public_network: 10.106.0.0/16
cluster_network: 10.107.0.0/16
ceph_origin: repository
ceph_repository: community
ceph_stable_release: octopus

monitor_interface: br-storage
osd_objectstore: bluestore

devices:
  # NOTE: ceph-ansible automatically detects disk types (e.g. SSD, HDD) and
  # configures them accordingly:
  # /dev/sdb is an SSD, used for Ceph databases
  - /dev/sdb
  # the remaining devices are HDDs, used for OSD data
  - /dev/sdc
  - /dev/sdd
  - /dev/sde
  - /dev/sdf
  - /dev/sdg
  - /dev/sdh

ceph_conf_overrides:
  global:
    osd_pool_default_size: 3
    osd_pool_default_min_size: 2
    bluestore_block_db_size: "{{ 40 * 1024 * 1024 * 1024 }}"
    bluestore_block_wal_size: "{{ 1 * 1024 * 1024 * 1024 }}"
    mon_pg_warn_max_object_skew: 20
    # TODO: Properly test the impact of these changes. They resolve issues
    # with users exceeding their quotas and with broken-connection errors, but
    # the performance impact of moving from 600-second syncs to 0 seconds has
    # not been properly investigated by our team.
    rgw_bucket_quota_ttl: 0
    rgw_user_quota_bucket_sync_interval: 0
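
For reference, the bluestore_block_db_size and bluestore_block_wal_size values above are plain byte counts written as Jinja arithmetic. A quick sketch of what they evaluate to (my own calculation, not output taken from the cluster):

# BlueStore DB/WAL sizes from the config above, in bytes
db_size = 40 * 1024 * 1024 * 1024   # 42949672960 bytes (40 GiB)
wal_size = 1 * 1024 * 1024 * 1024   # 1073741824 bytes (1 GiB)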

osd_group_name: crush_map
create_crush_tree: true
crush_rule_config: true

rack_replication:
  name: rack_replication
  root: default # the CRUSH root this replication bucket belongs to
  type: rack    # bucket type to replicate across, e.g. 'datacenter', 'row', 'rack', 'host'
  default: true # default rule applied to newly created pools (there can be only one default rule)

crush_rules:
  - "{{ rack_replication }}"

ceph_mgr_modules: [ status, prometheus ]

dashboard_enabled: False

secure_cluster: false

openstack_config: true
openstack_glance_pool:
  name: images
  pg_num: 32
  pgp_num: 32
  application: rbd
  pg_autoscale_mode: True
  target_size_ratio: 0.1
openstack_cinder_pool:
  name: volumes
  pg_num: 256
  pgp_num: 256
  application: rbd
  pg_autoscale_mode: True
  target_size_ratio: 1
openstack_nova_pool:
  name: vms
  pg_num: 128
  pgp_num: 128
  application: rbd
  pg_autoscale_mode: True
  target_size_ratio: 0.5
openstack_cinder_backup_pool:
  name: backups
  pg_num: 128
  pgp_num: 128
  application: rbd
  pg_autoscale_mode: True
  target_size_ratio: 0.5
openstack_pools:
  - "{{ openstack_glance_pool }}"
  - "{{ openstack_cinder_pool }}"
  - "{{ openstack_nova_pool }}"
  - "{{ openstack_cinder_backup_pool }}"

Inventory

[mgrs]
controller002-stg
controller003-stg
controller004-stg

[mons]
controller002-stg
controller003-stg
controller004-stg

[osds]
storage001-stg
storage002-stg
storage003-stg

[rgws]
controller002-stg
controller003-stg
controller004-stg

[rgwloadbalancers]
controller002-stg
controller003-stg
controller004-stg

[crush_map]
storage001-stg osd_crush_location="{ 'root': 'default', 'datacenter': 'dc1', 'row': 'row2', 'rack': 'rack6', 'host': 'storage001-stg' }"
storage002-stg osd_crush_location="{ 'root': 'default', 'datacenter': 'dc1', 'row': 'row2', 'rack': 'rack1', 'host': 'storage002-stg' }"
storage003-stg osd_crush_location="{ 'root': 'default', 'datacenter': 'dc1', 'row': 'row2', 'rack': 'rack3', 'host': 'storage003-stg' }"

Log content

$ sudo less /var/log/ceph/ceph-volume.log
[2020-10-06 20:14:12,253][ceph_volume.main][INFO  ] Running command: ceph-volume  simple scan --force
[2020-10-06 20:14:12,255][ceph_volume.process][INFO  ] Running command: /bin/systemctl show --no-pager --property=Id --state=running ceph-osd@*
[2020-10-06 20:14:12,281][ceph_volume.process][INFO  ] stdout Id=ceph-osd@12.service
[2020-10-06 20:14:12,281][ceph_volume.process][INFO  ] stdout Id=ceph-osd@9.service
[2020-10-06 20:14:12,282][ceph_volume.process][INFO  ] stdout Id=ceph-osd@15.service
[2020-10-06 20:14:12,282][ceph_volume.process][INFO  ] stdout Id=ceph-osd@4.service
[2020-10-06 20:14:12,282][ceph_volume.process][INFO  ] stdout Id=ceph-osd@3.service
[2020-10-06 20:14:12,282][ceph_volume.process][INFO  ] stdout Id=ceph-osd@0.service
[2020-10-06 20:14:12,287][ceph_volume.process][INFO  ] Running command: /bin/lsblk -plno KNAME,NAME,TYPE
[2020-10-06 20:14:12,295][ceph_volume.process][INFO  ] stdout /dev/sda   /dev/sda                                                                                                                        disk
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sda1  /dev/sda1                                                                                                                       part
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sda2  /dev/sda2                                                                                                                       part
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sda3  /dev/sda3                                                                                                                       part
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sdb   /dev/sdb                                                                                                                        disk
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sdc   /dev/sdc                                                                                                                        disk
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sdd   /dev/sdd                                                                                                                        disk
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sde   /dev/sde                                                                                                                        disk
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sdf   /dev/sdf                                                                                                                        disk
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sdg   /dev/sdg                                                                                                                        disk
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/sdh   /dev/sdh                                                                                                                        disk
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-0  /dev/mapper/dom-swap                                                                                                            lvm
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-1  /dev/mapper/dom-root                                                                                                            lvm
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-2  /dev/mapper/ceph--block--dbs--c960da32--5fcd--47fb--9f91--509618b2bd57-osd--block--db--71ddf6f5--d073--477e--81c4--dbd79fb8a5e8 lvm
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-3  /dev/mapper/ceph--block--dbs--c960da32--5fcd--47fb--9f91--509618b2bd57-osd--block--db--521f1c0c--4b1c--4e78--949c--b2e902f440ba lvm
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-4  /dev/mapper/ceph--block--dbs--c960da32--5fcd--47fb--9f91--509618b2bd57-osd--block--db--4d518495--3c1c--4cf2--ae09--87771257ee3d lvm
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-5  /dev/mapper/ceph--block--dbs--c960da32--5fcd--47fb--9f91--509618b2bd57-osd--block--db--001214a1--d47f--4561--9861--e4f47440ea0d lvm
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-6  /dev/mapper/ceph--block--dbs--c960da32--5fcd--47fb--9f91--509618b2bd57-osd--block--db--2884c76d--fb3d--49d1--8c1f--5a30f9d4ef83 lvm
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-7  /dev/mapper/ceph--block--dbs--c960da32--5fcd--47fb--9f91--509618b2bd57-osd--block--db--e029bbad--2b01--43f5--9b41--02806b382a9f lvm
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-8  /dev/mapper/ceph--block--a4e69c14--cd05--494a--9f76--8c8269a8a883-osd--block--14ceb650--d334--4d38--b44b--8bcc469b81f8          lvm
[2020-10-06 20:14:12,296][ceph_volume.process][INFO  ] stdout /dev/dm-9  /dev/mapper/ceph--block--f5b2a1aa--c291--4a32--98e7--baf089264dce-osd--block--e41b0d33--c045--47b1--8a4d--6aee6ed69880          lvm
[2020-10-06 20:14:12,297][ceph_volume.process][INFO  ] stdout /dev/dm-10 /dev/mapper/ceph--block--896db7e4--b0b2--45ee--8493--bc8e73aed916-osd--block--63d23d30--868d--4b78--bf8a--9876e994ac4d          lvm
[2020-10-06 20:14:12,297][ceph_volume.process][INFO  ] stdout /dev/dm-11 /dev/mapper/ceph--block--5a4034dc--2942--42a1--ab09--8e27fe08aafe-osd--block--e8e88aac--0a0d--4cfd--8113--e86569b32a4c          lvm
[2020-10-06 20:14:12,297][ceph_volume.process][INFO  ] stdout /dev/dm-12 /dev/mapper/ceph--block--6c0d7d6d--f562--4571--800c--afe93907fb57-osd--block--019b03a4--4fbb--4e80--92c2--92fefa72faf2          lvm
[2020-10-06 20:14:12,297][ceph_volume.process][INFO  ] stdout /dev/dm-13 /dev/mapper/ceph--block--bd1fb3c1--bebd--4d1e--a980--e2fe0e64c269-osd--block--bc94fd4d--d1dd--47a6--88c8--208e05943efc          lvm
[2020-10-06 20:14:12,309][ceph_volume.process][INFO  ] Running command: /sbin/lvs --noheadings --readonly --separator=";" -a -S lv_path=/var/lib/ceph/osd/ceph-12 -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2020-10-06 20:14:12,733][ceph_volume.process][INFO  ] Running command: /bin/lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /var/lib/ceph/osd/ceph-12
[2020-10-06 20:14:12,740][ceph_volume.process][INFO  ] stderr lsblk: /var/lib/ceph/osd/ceph-12: not a block device
[2020-10-06 20:14:12,741][ceph_volume.process][INFO  ] Running command: /sbin/blkid -p /var/lib/ceph/osd/ceph-12
[2020-10-06 20:14:12,746][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-12
[2020-10-06 20:14:12,778][ceph_volume.process][INFO  ] stderr unable to read label for /var/lib/ceph/osd/ceph-12: (21) Is a directory
[2020-10-06 20:14:12,779][ceph_volume.process][INFO  ] stderr 2020-10-06T20:14:12.771+0200 7fea0f1c00c0 -1 bluestore(/var/lib/ceph/osd/ceph-12) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-12: (21) Is a directory
[2020-10-06 20:14:12,780][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-12
[2020-10-06 20:14:12,812][ceph_volume.process][INFO  ] stderr unable to read label for /var/lib/ceph/osd/ceph-12: (21) Is a directory
[2020-10-06 20:14:12,812][ceph_volume.process][INFO  ] stderr 2020-10-06T20:14:12.803+0200 7efd099540c0 -1 bluestore(/var/lib/ceph/osd/ceph-12) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-12: (21) Is a directory
[2020-10-06 20:14:12,813][ceph_volume.process][INFO  ] Running command: /sbin/udevadm info --query=property /var/lib/ceph/osd/ceph-12
[2020-10-06 20:14:12,818][ceph_volume.process][INFO  ] stderr Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
[2020-10-06 20:14:12,822][ceph_volume.process][INFO  ] Running command: /sbin/cryptsetup status tmpfs
[2020-10-06 20:14:12,826][ceph_volume.process][INFO  ] stdout /dev/mapper/tmpfs is inactive.
[2020-10-06 20:14:12,826][ceph_volume.util.encryption][WARNING] failed to detect device mapper information
[2020-10-06 20:14:12,826][ceph_volume.process][INFO  ] Running command: /sbin/blkid -p -o udev tmpfs
[2020-10-06 20:14:12,829][ceph_volume.process][INFO  ] stderr blkid: error: tmpfs: No such file or directory
[2020-10-06 20:14:12,830][ceph_volume.process][INFO  ] Running command: /bin/lsblk --nodeps -P -p -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL tmpfs
[2020-10-06 20:14:12,834][ceph_volume.process][INFO  ] stderr lsblk: tmpfs: not a block device
[2020-10-06 20:14:12,835][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 151, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/simple/main.py", line 33, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/simple/scan.py", line 378, in main
    device = Device(self.encryption_metadata['device'])
  File "/usr/lib/python3/dist-packages/ceph_volume/util/device.py", line 92, in __init__
    self._parse()
  File "/usr/lib/python3/dist-packages/ceph_volume/util/device.py", line 138, in _parse
    vgname, lvname = self.path.split('/')
ValueError: not enough values to unpack (expected 2, got 1)

Environment:

dsavineau commented 4 years ago

See https://github.com/ceph/ceph-ansible/issues/5819#issuecomment-697407360

nikosmeds commented 4 years ago

Ah, thanks @dsavineau - sorry I missed that!