canonical / charm-microceph

Charm to deploy/manage microceph
Apache License 2.0

Setting default-pool-size to 1 failed #74

Closed: hemanthnakkina closed this issue 1 month ago

hemanthnakkina commented 1 month ago

Version:

App        Version  Status  Scale  Charm      Channel    Rev  Exposed  Message
microceph           error   2      microceph  reef/edge  47   no       hook failed: "peers-relation-changed"

Reproducer:

1. Deploy single-node microceph using charm-microceph.
2. Add a new unit of charm-microceph.
3. Change the default-pool-size configuration to 1.
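In Juju CLI terms, the steps above are roughly the following; the reef/edge channel is taken from the status output, and the exact deploy arguments are an assumption:

juju deploy microceph --channel reef/edge    # step 1: single-node deployment
juju add-unit microceph                      # step 2: add a second unit
juju config microceph default-pool-size=1    # step 3: triggers the failure below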

The following error is observed:

unit-microceph-1: 04:15:14 ERROR unit.microceph/1.juju-log peers:1: Failed executing cmd: ['sudo', 'microceph', 'pool', 'set-rf', '--size', '1', ''], error: Error: failed setting replication factor: failed to set pool size default: Failed to run: ceph config set global osd_pool_default_size 1: exit status 1 (Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)'))

unit-microceph-1: 04:15:14 ERROR unit.microceph/1.juju-log peers:1: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-microceph-1/charm/./src/charm.py", line 346, in <module>
    main(MicroCephCharm)
  File "/var/lib/juju/agents/unit-microceph-1/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-microceph-1/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-microceph-1/charm/venv/ops/main.py", line 506, in _emit
    self.framework.reemit()
  File "/var/lib/juju/agents/unit-microceph-1/charm/venv/ops/framework.py", line 859, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-microceph-1/charm/venv/ops/framework.py", line 939, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-microceph-1/charm/./src/charm.py", line 116, in _on_config_changed
    self.configure_charm(event)
  File "/var/lib/juju/agents/unit-microceph-1/charm/./src/charm.py", line 113, in configure_charm
    self.configure_ceph(event)
  File "/var/lib/juju/agents/unit-microceph-1/charm/./src/charm.py", line 342, in configure_ceph
    raise e
  File "/var/lib/juju/agents/unit-microceph-1/charm/./src/charm.py", line 330, in configure_ceph
    microceph.set_pool_size("", str(default_rf))
  File "/var/lib/juju/agents/unit-microceph-1/charm/src/microceph.py", line 251, in set_pool_size
    _run_cmd(cmd)
  File "/var/lib/juju/agents/unit-microceph-1/charm/src/microceph.py", line 44, in _run_cmd
    raise e
  File "/var/lib/juju/agents/unit-microceph-1/charm/src/microceph.py", line 39, in _run_cmd
    process = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=180)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['sudo', 'microceph', 'pool', 'set-rf', '--size', '1', '']' returned non-zero exit status 1.

The default pool size is set to 1 properly by the leader unit, but the above error is observed on the non-leader unit. It seems the ceph commands are run on all units, which is not required; running the ceph config command on the leader unit alone should be sufficient.
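A minimal sketch of that suggestion, assuming the configure_ceph flow from the traceback; apart from Unit.is_leader(), which is standard ops framework API, the names here are taken from the trace rather than verified against the repo:

import ops

import microceph  # src/microceph.py in the traceback above


class MicroCephCharm(ops.CharmBase):
    def configure_ceph(self, event):
        # default-pool-size is the charm config option from this issue.
        default_rf = self.config.get("default-pool-size")
        # osd_pool_default_size is a cluster-wide setting, so applying it
        # once from the leader suffices; non-leader units skip the call and
        # avoid the RADOS error seen above on nodes that have not joined yet.
        if self.unit.is_leader():
            microceph.set_pool_size("", str(default_rf))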

hemanthnakkina commented 1 month ago

This does not appear to be a problem when the non-leader node has joined the cluster properly. The above error is seen because the non-leader node did not complete joining the cluster, so I am closing this issue.
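For anyone hitting the same traceback: a quick way to confirm that a unit has actually joined is to check cluster membership on the affected node (assuming the current microceph CLI):

sudo microceph cluster list    # every unit should appear as a cluster member
sudo microceph status          # shows per-node services once the join completes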