canonical / charmed-openstack-upgrader

Automatic upgrade tool for Charmed Openstack
https://canonical-charmed-openstack-upgrader.readthedocs-hosted.com/en/stable/
Apache License 2.0

[cinder / cinder-ceph] upgrading `cinder` will cause its subordinate `cinder-ceph` to fail on relation changed #466

Open chanchiwai-ray opened 1 week ago

chanchiwai-ray commented 1 week ago
Wait for up to 300s for app 'cinder' to reach the idle state ✖
2024-06-24 04:45:08 [ERROR] Timed out waiting for model:
  cinder/0 [executing] maintenance: Running openstack upgrade
2024-06-24 04:45:08 [ERROR] See the known issues at https://canonical-charmed-openstack-upgrader.readthedocs-hosted.com/en/stable/reference/known-issues/
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed Traceback (most recent call last):
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed   File "/var/lib/juju/agents/unit-cinder-ceph-0/charm/charmhelpers/core/strutils.py", line 94, in __init__
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed     self.index = self._list.index(item)
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed ValueError: tuple.index(x): x not in tuple
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed During handling of the above exception, another exception occurred:
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed Traceback (most recent call last):
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed   File "/var/lib/juju/agents/unit-cinder-ceph-0/charm/hooks/storage-backend-relation-changed", line 469, in <module>
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed     hooks.execute(sys.argv)
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed   File "/var/lib/juju/agents/unit-cinder-ceph-0/charm/charmhelpers/core/hookenv.py", line 963, in execute
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed     self._hooks[hook_name]()
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed   File "/var/lib/juju/agents/unit-cinder-ceph-0/charm/hooks/storage-backend-relation-changed", line 348, in storage_backend_changed
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed     storage_backend()
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed   File "/var/lib/juju/agents/unit-cinder-ceph-0/charm/hooks/storage-backend-relation-changed", line 320, in storage_backend
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed     subordinate_config = CephSubordinateContext()()
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed   File "/var/lib/juju/agents/unit-cinder-ceph-0/charm/hooks/cinder_contexts.py", line 71, in __call__
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed     if CompareOpenStackReleases(os_codename) >= "icehouse":
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed   File "/var/lib/juju/agents/unit-cinder-ceph-0/charm/charmhelpers/core/strutils.py", line 96, in __init__
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed     raise KeyError("Item '{}' is not in list '{}'"
unit-cinder-ceph-0: 05:24:20 WARNING unit.cinder-ceph/0.storage-backend-relation-changed KeyError: "Item 'zed' is not in list '('diablo', 'essex', 'folsom', 'grizzly', 'havana', 'icehouse', 'juno', 'kilo', 'liberty', 'mitaka', 'newton', 'ocata', 'pike', 'queens', 'rocky', 'stein', 'train', 'ussuri', 'victoria', 'wallaby', 'xena', 'yoga')'"
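
For context, a minimal sketch of the comparator pattern behind this traceback (modelled on charmhelpers' BasicStringComparator / CompareOpenStackReleases, not the exact upstream code): the comparator only knows the releases baked into the charm's vendored charmhelpers, so a yoga-era cinder-ceph cannot parse the 'zed' codename that the already-upgraded cinder publishes over the relation.

class BasicStringComparator:
    """Compare strings by their position in a known, ordered tuple."""

    _list = None  # subclasses provide the ordered tuple of known items

    def __init__(self, item):
        try:
            self.index = self._list.index(item)
        except ValueError:
            raise KeyError("Item '{}' is not in list '{}'"
                           .format(item, self._list))

    def __ge__(self, other):
        return self.index >= self._list.index(other)

class CompareOpenStackReleases(BasicStringComparator):
    # The yoga-era charmhelpers vendored into cinder-ceph ends at 'yoga',
    # so 'zed' is unknown to it.
    _list = ('diablo', 'essex', 'folsom', 'grizzly', 'havana', 'icehouse',
             'juno', 'kilo', 'liberty', 'mitaka', 'newton', 'ocata', 'pike',
             'queens', 'rocky', 'stein', 'train', 'ussuri', 'victoria',
             'wallaby', 'xena', 'yoga')

try:
    # what cinder_contexts.py effectively does once cinder reports 'zed'
    CompareOpenStackReleases('zed') >= 'icehouse'
except KeyError as err:
    print(err)  # Item 'zed' is not in list '(..., 'yoga')'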

Environment: OpenStack deployed by stsstack-bundle: ./generate-bundle.sh --name cou -r yoga -s jammy --ceph --run

Steps to reproduce: run `cou upgrade` on the environment mentioned above, or do it manually following the cou plan:

juju refresh cinder --channel zed/stable
juju config cinder openstack-origin=cloud:jammy-zed
# and wait until this error happens

If we ignore the error in cinder-ceph and force the upgrade of cinder-ceph:

juju refresh cinder-ceph --channel zed/stable

The error in cinder-ceph will go away.

Relevant information:

Note: This happens on an OpenStack cloud deployed by stsstack-bundle, and it happens when upgrading from jammy/yoga or later.

jneo8 commented 6 days ago

I don't think this is an issue we can handle in the charm.

The options we have are either:

  1. Handle the two applications' upgrades together.
  2. Ignore the status of cinder-ceph until we run its upgrade.

For me, option 2 is too tricky. We should run the upgrade for these two applications together.

@gabrielcocenza Is there a clean way to combine those two apps' upgrades?

The only approach I see is adding a workaround in `_generate_control_plane_plan`.

jneo8 commented 5 days ago

Possible solution 1: Make sure the upgrade happens in the right order

We make sure the upgrade order is followed: for example, cinder-ceph is always upgraded right after cinder, and we remove the check of cinder-ceph's status during the cinder upgrade. The result is a single series of steps that upgrades both cinder and cinder-ceph, as sketched below.
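
A minimal sketch of what such a grouped plan could look like. The names here (UpgradeStep, skip_apps_in_status_check, cinder_group_plan) are hypothetical illustrations, not COU's actual API:

from dataclasses import dataclass

@dataclass
class UpgradeStep:
    description: str
    # apps expected to be in error at this point, excluded from the check
    skip_apps_in_status_check: tuple = ()

def cinder_group_plan():
    """Upgrade cinder and cinder-ceph as one ordered series of steps,
    ignoring cinder-ceph's status until its own upgrade has run."""
    return [
        UpgradeStep("refresh cinder to zed/stable"),
        UpgradeStep("wait for cinder to settle",
                    skip_apps_in_status_check=("cinder-ceph",)),
        UpgradeStep("refresh cinder-ceph to zed/stable"),
        UpgradeStep("wait for cinder-ceph to settle"),
    ]

for step in cinder_group_plan():
    print(step.description)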

Possible solution 2:

We decouple the step generation from the application logic in OpenStackApplication. A class that generates the upgrade steps shouldn't belong to a single application. This would require a lot of refactoring, and much of the cou plan logic would need to be re-created. It would also affect the hypervisor planner.

gabrielcocenza commented 5 days ago

We should take care not to try to cover every issue with workarounds in COU. Some issues are in the charms themselves, and this looks like one of them. If you look at the latest version of charmhelpers, zed is in the list, but in the yoga/stable branch the list is not updated, so it seems a simple charmhelpers update will fix that.

jneo8 commented 5 days ago

The message from @ajkavanagh:

In general, subordinates probably should be switched to the next OpenStack track before the principal charm, as the 'next' track always supports the current track. However, charms come from different eras, and the latest subordinate charms tried very hard not to 'upset' the principal. Older charms could be a bit hit and miss. In general, I'd still say that subordinates should have their track switched first, followed by principal charms.

The easiest way to resolve the issue may be to reverse the current upgrade order: upgrade the subordinate first, then the principal.
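
For the reproduction above, the reversed order would look like this (assuming the same channels and origin as before):

juju refresh cinder-ceph --channel zed/stable
juju refresh cinder --channel zed/stable
juju config cinder openstack-origin=cloud:jammy-zed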

jneo8 commented 2 days ago

The issue tracking the documentation update: OpenStack upgrade advice around subordinates and principle charms needs to be updated