[User Experience] If the hypervisor upgrade fails and the user restarts COU, COU won't recognize that the user is in the middle of a nova-compute upgrade.

jneo8 commented 1 month ago

In my case, the hypervisor upgrade failed due to a known issue, #494. If I manually fix the issue on the machine and restart COU, I get an error message like this:

```
Analyzing cloud... ---- Logging error ---
Traceback (most recent call last):
  File "/home/jneo8/.rye/py/cpython@3.10.13/install/lib/python3.10/logging/__init__.py", line 1100, in emit
    msg = self.format(record)
  File "/home/jneo8/.rye/py/cpython@3.10.13/install/lib/python3.10/logging/__init__.py", line 943, in format
    return fmt.format(record)
  File "/home/jneo8/.rye/py/cpython@3.10.13/install/lib/python3.10/logging/__init__.py", line 678, in format
    record.message = record.getMessage()
  File "/home/jneo8/.rye/py/cpython@3.10.13/install/lib/python3.10/logging/__init__.py", line 368, in getMessage
    msg = msg % self.args
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/apps/base.py", line 119, in __str__
    "units": {
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/apps/base.py", line 124, in <dictcomp>
    "o7k_version": str(self.get_latest_o7k_version(unit)),
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/apps/base.py", line 327, in get_latest_o7k_version
    compatible_o7k_versions = OpenStackCodenameLookup.find_compatible_versions(
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/utils/openstack.py", line 431, in find_compatible_versions
    if version in version_range:
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/utils/openstack.py", line 351, in __contains__
    service_version = Version(version)
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/.venv/lib/python3.10/site-packages/packaging/version.py", line 202, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: 'zed'
Call stack:
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/.venv/bin/cou", line 33, in <module>
    sys.exit(load_entry_point('charmed-openstack-upgrader', 'console_scripts', 'cou')())
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/__main__.py", line 21, in main
    entrypoint()
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/cli.py", line 255, in entrypoint
    loop.run_until_complete(_run_command(args))
  File "/home/jneo8/.rye/py/cpython@3.10.13/install/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/home/jneo8/.rye/py/cpython@3.10.13/install/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/home/jneo8/.rye/py/cpython@3.10.13/install/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/home/jneo8/.rye/py/cpython@3.10.13/install/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/cli.py", line 242, in _run_command
    await run_upgrade_subcommand(args)
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/cli.py", line 228, in run_upgrade_subcommand
    cloud_upgrade_plan = await analyze_and_generate_plan(model, args)
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/cli.py", line 139, in analyze_and_generate_plan
    analysis_result = await Analysis.create(model, skip_apps=args.skip_apps)
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/steps/analyze.py", line 150, in create
    apps = await Analysis._populate(model)
  File "/home/jneo8/CanonicalProjects/charmed-openstack-upgrader/cou/steps/analyze.py", line 172, in _populate
    logger.info("Found %s application:\n%s", name, o7k_app)
Message: 'Found %s application:\n%s'
Arguments: ('nova-compute', nova-compute)
2024-09-30 14:59:32 [ERROR] Unexpected error occurred.
2024-09-30 14:59:32 [ERROR] Invalid version: 'zed'
```

This is due to the way COU verifies the cloud and generates the upgrade plan. In this case, I, as a user, need to manually run all the upgrade steps on every machine, which is not user-friendly.
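Reading the traceback, the workload version reaches `packaging.version.Version` (via `find_compatible_versions` and the version range's `__contains__`), which only accepts version strings such as '29.0.1', so the codename 'zed' raises `InvalidVersion`. As an illustration only, here is a minimal stdlib sketch of a lookup that tolerates a bare codename instead of crashing. The `resolve_release` helper and the nova-major-version-to-codename table are hypothetical and are not part of COU.

```python
import re
from typing import Optional

# Hypothetical, illustrative table (not COU's own data):
# nova major version -> OpenStack release codename.
NOVA_MAJOR_TO_RELEASE = {
    25: "yoga",
    26: "zed",
    27: "antelope",
    28: "bobcat",
    29: "caracal",
}
KNOWN_CODENAMES = set(NOVA_MAJOR_TO_RELEASE.values())

# Matches purely numeric dotted versions such as "29.0.1".
NUMERIC_VERSION = re.compile(r"^(\d+)(?:\.\d+)*$")


def resolve_release(workload_version: str) -> Optional[str]:
    """Map a nova workload version to a release codename without raising.

    Accepts a numeric version ("29.0.1") or a bare codename ("zed", as in
    the traceback above). Returns None when the value can't be resolved,
    so a caller could report the unit as "unknown" instead of aborting.
    """
    value = workload_version.strip().lower()
    if value in KNOWN_CODENAMES:
        return value
    match = NUMERIC_VERSION.match(value)
    if match is None:
        return None
    return NOVA_MAJOR_TO_RELEASE.get(int(match.group(1)))


if __name__ == "__main__":
    print(resolve_release("29.0.1"))  # caracal
    print(resolve_release("zed"))     # zed
    print(resolve_release("???"))     # None
```

Returning `None` rather than raising keeps the analysis step alive, so the caller can decide how to present a unit whose version it doesn't understand.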

samuelallan72 commented 1 month ago

@jneo8 do you have any record of the juju debug-log or the juju status output? That would be helpful here. It seems that nova-compute somehow reported "zed" as its workload version, instead of something like '29.0.1', which would be normal. Did the workload version change during the upgrade? I'm confused, because this looks more like a bug in the charm than in COU.

jneo8 commented 1 month ago

No logs, sorry. This failed because the nova-compute upgrade failed during the openstack-upgrade action, so the charm is on the new channel but the workload is still on the old version.
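The half-upgraded state described here (charm on the new channel, workload still on the old release) can in principle be detected by comparing the release encoded in the charm channel with the release the workload reports. A minimal sketch, assuming channel tracks of the form 'zed/stable' or '2023.1/stable' and a hypothetical, illustrative track-to-codename table; `charm_release` and `workload_lags_charm` are not COU functions.

```python
from typing import Optional

# Hypothetical, illustrative data (not COU's own lookup tables).
TRACK_TO_RELEASE = {
    "yoga": "yoga",
    "zed": "zed",
    "2023.1": "antelope",
    "2023.2": "bobcat",
    "2024.1": "caracal",
}
RELEASE_ORDER = ["yoga", "zed", "antelope", "bobcat", "caracal"]


def charm_release(channel: str) -> Optional[str]:
    """Return the release encoded in a channel's track, e.g. 'zed/stable' -> 'zed'."""
    track = channel.split("/", 1)[0]
    return TRACK_TO_RELEASE.get(track)


def workload_lags_charm(channel: str, workload_release: str) -> bool:
    """True when the charm's channel is ahead of the workload's release.

    Returns False when either side is unknown, rather than guessing.
    """
    charm = charm_release(channel)
    if charm not in RELEASE_ORDER or workload_release not in RELEASE_ORDER:
        return False
    return RELEASE_ORDER.index(charm) > RELEASE_ORDER.index(workload_release)


if __name__ == "__main__":
    print(workload_lags_charm("2023.1/stable", "zed"))  # True
    print(workload_lags_charm("zed/stable", "zed"))     # False
```

Comparing positions in an ordered release list, rather than testing for simple inequality, distinguishes "workload behind charm" from other mismatches.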

samuelallan72 commented 1 month ago

> so the charm is in new channel but the workload is still the old one.

This makes sense, but I don't understand how that relates to the error from COU. :thinking: I guess we need to reproduce the error and go from there. :slightly_smiling_face:

jneo8 commented 1 month ago

It's about user experience. If you encounter an error in one of the sub-steps, COU can't be restarted, because it isn't able to detect the current state. So the user has to finish the upgrade manually before continuing.
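To illustrate what "detecting the current state" could mean on restart, here is a hedged sketch that takes a per-unit snapshot, skips units already on the target release, and re-runs only the missing step for units whose charm was refreshed but whose workload was not. `UnitState`, `remaining_steps`, and the step descriptions are hypothetical; COU does not expose this API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnitState:
    """Hypothetical snapshot of one nova-compute unit."""
    name: str
    charm_release: str     # release of the channel the charm is on
    workload_release: str  # release the workload actually reports


def remaining_steps(units: List[UnitState], target: str) -> List[str]:
    """Derive the steps still needed to reach `target` from observed state."""
    steps = []
    for unit in units:
        if unit.workload_release == target:
            continue  # already upgraded: nothing to redo on restart
        if unit.charm_release == target:
            # Charm refreshed, but the workload upgrade didn't finish.
            steps.append(f"run openstack-upgrade on {unit.name}")
        else:
            steps.append(f"upgrade charm and workload on {unit.name}")
    return steps


if __name__ == "__main__":
    units = [
        UnitState("nova-compute/0", "antelope", "antelope"),
        UnitState("nova-compute/1", "antelope", "zed"),
        UnitState("nova-compute/2", "zed", "zed"),
    ]
    for step in remaining_steps(units, "antelope"):
        print(step)
```

The sketch derives the plan from observed state rather than from a record of previously executed steps, which is the property a restart would need in order to pick up where a failed run stopped.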

Pjack commented 2 weeks ago

Fixing this would require a significant architectural change. Since we won't be implementing it, I've decided to close this ticket.

canonical / charmed-openstack-upgrader

[User Experience] If hypervisor upgrade failed and the user restarts COU, COU won't recognize that the user is in the middle of nova-compute upgrade. #561