ceph / ceph-ansible

Ansible playbooks to deploy Ceph, the distributed filesystem.
Apache License 2.0

adopt-cephadm.yml - error configparser.DuplicateSectionError: #7283

Closed: jeevadotnet closed this issue 2 years ago

jeevadotnet commented 2 years ago

What happened: Running adopt-cephadm.yml against my testbed, which is currently on Pacific with ceph-ansible stable-6.0.

I then get the following failure on a ceph-ansible task:

PLAY [adopt ceph mon daemons] **************************************************************************************************************************

TASK [adopt mon daemon] ********************************************************************************************************************************
Friday 05 August 2022  11:03:28 +0200 (0:00:00.446)       0:01:28.437 *********
fatal: [A-08-02-storage.maas]: FAILED! => changed=false
  msg: |-
    Traceback (most recent call last):
      File "/usr/sbin/cephadm", line 8971, in <module>
        main()
      File "/usr/sbin/cephadm", line 8959, in main
        r = ctx.func(ctx)
      File "/usr/sbin/cephadm", line 5320, in command_ls
        ls = list_daemons(ctx, detail=not ctx.no_detail,
      File "/usr/sbin/cephadm", line 5380, in list_daemons
        fsid = get_legacy_daemon_fsid(ctx,
      File "/usr/sbin/cephadm", line 2311, in get_legacy_daemon_fsid
        fsid = get_legacy_config_fsid(cluster, legacy_dir=legacy_dir)
      File "/usr/sbin/cephadm", line 2288, in get_legacy_config_fsid
        config = read_config(config_file)
      File "/usr/sbin/cephadm", line 1686, in read_config
        cp.read(fn)
      File "/usr/lib/python3.8/configparser.py", line 697, in read
        self._read(fp, filename)
      File "/usr/lib/python3.8/configparser.py", line 1067, in _read
        raise DuplicateSectionError(sectname, fpname,
    configparser.DuplicateSectionError: While reading from '//etc/ceph/ceph.conf' [line 20]: section 'client.rgw.A-08-02-storage.rgw0' already exists
  rc: 1

Inspecting /etc/ceph/ceph.conf shows that, as per the error, the section [client.rgw.A-08-02-storage.rgw0] appears twice:

[client.rgw.A-08-02-storage.rgw0]
host = A-08-02-storage
keyring = /var/lib/ceph/radosgw/ceph-rgw.A-08-02-storage.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-A-08-02-storage.rgw0.log
rgw frontends = beast endpoint=10.102.51.11:7480
rgw thread pool size = 512

[client.rgw.A-08-02-storage.rgw0]
rgw content length compat = true
rgw enable apis = s3, swift, swift_auth, admin
rgw enable usage log = true
rgw enforce swift acls = true
rgw keystone accepted admin roles = admin, ResellerAdmin
rgw keystone accepted roles = Member, _member_, admin, ResellerAdmin
etc

How does one work around this duplication that ceph-ansible created in the first place?
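
For reference, a minimal sketch of one way to inspect and collapse the duplicated file by hand, assuming Python 3's standard configparser (the same module the cephadm traceback above points at). The scratch output path and the idea of rewriting the file are illustrative only; this is not something ceph-ansible or cephadm does for you:

import configparser

CEPH_CONF = "/etc/ceph/ceph.conf"

# Strict parsing (the default, which is what the cephadm traceback hits)
# refuses the file: this is exactly the DuplicateSectionError shown above.
strict = configparser.ConfigParser(strict=True)
try:
    strict.read(CEPH_CONF)
except configparser.DuplicateSectionError as err:
    print(f"strict parse failed: {err}")

# A lenient parse merges duplicate sections instead; where the same key
# appears more than once, the value read last wins.
lenient = configparser.ConfigParser(strict=False)
lenient.optionxform = str  # keep option names as-is instead of lower-casing
lenient.read(CEPH_CONF)

# Writing the merged result to a scratch file gives a ceph.conf in which
# every section appears exactly once; review it before replacing the original.
with open("/tmp/ceph.conf.merged", "w") as fh:
    lenient.write(fh)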

What you expected to happen: To run the playbook as intended.

How to reproduce it (minimal and precise):

Inventory Group_vars

Environment:

guits commented 2 years ago

@jeevadotnet can you try to rerun the playbook with the following ceph_conf_overrides instead?

ceph_conf_overrides:
  global:
    osd_pool_default_size: 4
    osd_pool_default_min_size: 3
    osd_pool_default_pg_num: 32
    osd_pool_default_pgp_num: 32
  client.glance:
    rbd default data pool: images_data
  client.nova:
    rbd default data pool: vms_data
  client.cinder:
    rbd default data pool: volumes_data
  client.cinder-backup:
    rbd default data pool: backups_data
  mds:
    mds_cache_memory_limit: 95899345920
    mds_session_blacklist_on_timeout: false
  osd:
    osd_scrub_priority: 4
    osd_memory_target: 6442450944
    osd_max_scrubs: 1
    osd_scrub_load_threshold: 10
    osd_scrub_thread_suicide_timeout: 300
    osd_scrub_max_interval: 2419200
    osd_scrub_min_interval: 1209600
    osd_deep_scrub_interval: 3024000
    osd_deep_scrub_randomize_ratio: 0.01
    osd_scrub_interval_randomize_ratio: 0.5
    bluestore_warn_on_bluefs_spillover: false
    osd_deep_scrub_stride: 524288
  mon:
    osd_max_scrubs: 1
    osd_scrub_load_threshold: 10 
    auth_allow_insecure_global_id_reclaim: True
  mgr:
    osd_scrub_max_interval: 2419200
    osd_scrub_min_interval: 1209600
    osd_deep_scrub_interval: 3024000
    osd_deep_scrub_randomize_ratio: 0.01
    osd_scrub_interval_randomize_ratio: 0.5
    mon_max_pg_per_osd: 400
    mon_pg_warn_max_object_skew: 30
    osd_deep_scrub_stride: 524288
  client.rgw.A-08-02-storage.rgw0:
    "rgw keystone api version": "3"
    "rgw keystone url": "http://10.102.73.10:35357" 
    "rgw keystone accepted admin roles": "admin, ResellerAdmin" 
    "rgw keystone accepted roles": "Member, _member_, admin, ResellerAdmin"
    "rgw keystone implicit tenants": "true" 
    "rgw keystone admin user": "ceph_rgw" 
    "rgw keystone admin password": "PASSWORD" 
    "rgw keystone admin project": "service"
    "rgw keystone admin domain": "default"
    "rgw keystone verify ssl": "false"
    "rgw content length compat": "true"
    "rgw enable apis": "s3, swift, swift_auth, admin"
    "rgw s3 auth use keystone": "true"
    "rgw enforce swift acls": "true"
    "rgw swift account in url": "true"
    "rgw swift versioning enabled": "true"
    "rgw verify ssl": "false"
    "rgw enable usage log": "true" # logging
    "rgw usage log tick interval": "30" # logging
    "rgw usage log flush threshold": "1024" # logging
jeevadotnet commented 2 years ago

@guits shouldn't it be client.rgw.{{ hostvars[inventory_hostname]['ansible_facts']['hostname'] }}.rgw0, since I have 3 rgw clients as per the inventory? If I only define client.rgw.A-08-02-storage.rgw0, it will apply that section to the other two servers as well.

guits commented 2 years ago
ceph_conf_overrides:
  global:
    osd_pool_default_size: 4
...
  client.glance:
    rbd default data pool: images_data
  client.nova:
    rbd default data pool: vms_data
  client.cinder:
    rbd default data pool: volumes_data
  client.cinder-backup:
    rbd default data pool: backups_data
  mds:
    mds_cache_memory_limit: 95899345920
....
  osd:
    osd_scrub_priority: 4
....
  mon:
    osd_max_scrubs: 1
....
  mgr:
    osd_scrub_max_interval: 2419200
....
  client.rgw.A-08-02-storage.rgw0:
    "rgw keystone api version": "3"
....
  client.rgw.A-08-08-storage.rgw0:
    "rgw keystone api version": "3"
....
  client.rgw.A-09-02-storage.rgw0:
    "rgw keystone api version": "3"
....
jeevadotnet commented 2 years ago

@guits I've done as instructed, but now have a new duplicate issue.

If I don't define any rgw parameters, it creates a duplicate entry under my first client.rgw.A-08-02-storage.rgw0 section:

rgw frontends = beast endpoint=10.102.51.13:8080
rgw frontends = beast endpoint=10.102.51.11:8080

When I define it as in the group_vars below, it creates:

rgw frontends = beast endpoint=10.102.51.13:7480
rgw frontends = beast endpoint=10.102.51.11:7480

I've tried a variety of combinations, setting the parameter or leaving it unset; each one resulted in a duplicate entry for rgw frontends.

ansible ceph.conf generated: link All.yml rgws.yml

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

prometheanfire commented 2 years ago

I'm having the same issue: I try to set an override (to enable Swift) and end up with duplicate sections, one containing just the overrides and one generated from https://github.com/ceph/ceph-ansible/blob/b40e4bfe60cb14a8eac225086f60d5b170636b6d/roles/ceph-config/templates/ceph.conf.j2#L89-L119

prometheanfire commented 2 years ago

A further update: it seems that having the same section defined multiple times is OK; I suspect any keys redefined in later sections override the earlier ones.

guits commented 2 years ago

"I suspect any keys redefined in later sections override the earlier sections."

yes, that's correct.

I'm sorry I don't have a lot of time for this at the moment, but I'll try to take a look at this as soon as possible...
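
For what it's worth, here is a minimal sketch of that last-value-wins behaviour, using Python's configparser in lenient mode on the duplicated rgw frontends lines reported earlier in the thread. The assumption, confirmed above for Ceph's own config handling, is that a repeated key resolves to the value that appears last:

import configparser

# Two 'rgw frontends' lines end up in the same section, as in the
# generated ceph.conf reported above.
conf_text = """
[client.rgw.A-08-02-storage.rgw0]
rgw frontends = beast endpoint=10.102.51.13:8080
rgw frontends = beast endpoint=10.102.51.11:8080
"""

parser = configparser.ConfigParser(strict=False)  # tolerate the repeated key
parser.read_string(conf_text)

# Only the last occurrence survives the parse.
print(parser["client.rgw.A-08-02-storage.rgw0"]["rgw frontends"])
# -> beast endpoint=10.102.51.11:8080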

github-actions[bot] commented 2 years ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.