canonical / charmed-openstack-upgrader

Automatic upgrade tool for Charmed Openstack
Apache License 2.0

[nova-compute / ovn-chassis] upgrade nova-compute failed during zed -> 2023.1 #494

Open chanchiwai-ray opened 1 month ago

chanchiwai-ray commented 1 month ago

When COU refreshes nova-compute from 'zed/stable' to '2023.1/stable' during the Jammy/Zed -> Jammy/2023.1 upgrade, the following substep of 'Upgrade plan for units: nova-compute/0' failed:

Upgrade plan for unit 'nova-compute/0': ActionFailed('Run of action \'openstack-upgrade\' with parameters \'<not-set>\' on \'nova-compute/0\' failed with \'upgrade callback resulted in an unexpected error\' …)
unit-nova-compute-1: 03:50:34 WARNING unit.nova-compute/1.openstack-upgrade E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
unit-nova-compute-1: 03:50:34 INFO unit.nova-compute/1.juju-log Couldn't acquire DPKG lock. Will retry in 10 seconds
unit-nova-compute-1: 03:50:44 DEBUG unit.nova-compute/1.openstack-upgrade Reading package lists...
unit-nova-compute-1: 03:50:44 DEBUG unit.nova-compute/1.openstack-upgrade Building dependency tree...
unit-nova-compute-1: 03:50:44 DEBUG unit.nova-compute/1.openstack-upgrade Reading state information...
unit-nova-compute-1: 03:50:44 DEBUG unit.nova-compute/1.openstack-upgrade You might want to run 'apt --fix-broken install' to correct these.
unit-nova-compute-1: 03:50:44 DEBUG unit.nova-compute/1.openstack-upgrade The following packages have unmet dependencies:
unit-nova-compute-1: 03:50:44 DEBUG unit.nova-compute/1.openstack-upgrade  openvswitch-switch : Depends: openvswitch-common (= 3.0.3-0ubuntu0.22.10.3~cloud3) but 3.1.3-0ubuntu0.23.04.1~cloud0 is installed
unit-nova-compute-1: 03:50:44 WARNING unit.nova-compute/1.openstack-upgrade E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

Environment: deployed with stsstack-bundle using ./generate-bundle.sh --name cou -r yoga -s jammy --ceph --run

Failed: Upgrade the unit: 'nova-compute/0' substep failed

samuelallan72 commented 1 month ago

This appears related to LP: #2068109 - same broken packages for apt, same workaround, and similarly the broken packages relate to a colocated charm.

Pjack commented 4 weeks ago

The workaround solution may break the configuration of nova-compute. We need to find some other solution.

jneo8 commented 3 weeks ago

Problem Description

Initially, I deployed an OpenStack Zed cloud with ovn-chassis channel 22.09/stable and nova-compute zed/stable. The upgrade path was: ovn-chassis → ovn-central → nova-compute. The ovn-chassis should be upgraded before ovn-central, and the OpenStack control plane should be upgraded before the hypervisor.

Upgrade Process

  1. ovn-chassis Upgrade:

    • Upgraded from 22.09/stable to 23.03/stable.
    • However, the workload version remained at Zed because /etc/apt/sources.list.d/cloud-archive.list still pointed to deb http://ubuntu-cloud.archive.canonical.com/ubuntu jammy-updates/zed main, keeping openvswitch-switch and ovn-common packages at 22.09.
  2. ovn-central Upgrade:

    • Upgraded on another node without issues.
  3. nova-compute Upgrade:

    • With action-managed-upgrade=True, the openstack-upgrade action changed /etc/apt/sources.list.d/cloud-archive.list to deb http://ubuntu-cloud.archive.canonical.com/ubuntu jammy-updates/antelope main. This upgraded ovn-common to 23.04, breaking the dependency on openvswitch-switch which was still at 22.09.
$ sudo apt-get --option Dpkg:Options::=--force-confnew --option Dpkg::Options::=--force-confdef dist-upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 openvswitch-switch : Depends: openvswitch-common (= 3.0.3-0ubuntu0.22.10.3~cloud3) but 3.1.3-0ubuntu0.23.04.1~cloud0 is installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
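
The version mismatch in the apt output above could be detected programmatically, for example to distinguish this failure from other upgrade errors. The following is only a minimal sketch: `UnmetDependency` and `parse_unmet_dependency` are hypothetical names, not part of COU or charm-helpers, and the regular expression assumes the exact `Depends: <pkg> (= <version>) but <version> is installed` wording shown in this thread.

```python
import re
from typing import NamedTuple, Optional


class UnmetDependency(NamedTuple):
    """One 'unmet dependencies' entry reported by apt (hypothetical helper type)."""
    package: str
    dependency: str
    required: str
    installed: str


# Matches apt lines of the form seen in this issue:
#  <package> : Depends: <dependency> (= <required>) but <installed> is installed
_UNMET = re.compile(
    r"(?P<package>\S+) : Depends: (?P<dependency>\S+) "
    r"\(= (?P<required>[^)]+)\) but (?P<installed>\S+) is installed"
)


def parse_unmet_dependency(line: str) -> Optional[UnmetDependency]:
    """Return the parsed entry, or None if the line has a different shape."""
    match = _UNMET.search(line)
    if match is None:
        return None
    return UnmetDependency(**match.groupdict())


line = (
    " openvswitch-switch : Depends: openvswitch-common "
    "(= 3.0.3-0ubuntu0.22.10.3~cloud3) but 3.1.3-0ubuntu0.23.04.1~cloud0 is installed"
)
dep = parse_unmet_dependency(line)
```

Because the pattern is applied with `search`, it also works on the juju debug-log lines from the original report, where the apt text is preceded by a unit and timestamp prefix.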

Conclusion

This presents a chicken-and-egg problem: the ovn-chassis must be upgraded before nova-compute, but the necessary OVN version isn't available until after the nova-compute upgrade.

A workaround is to run sudo DEBIAN_FRONTEND=noninteractive apt --fix-broken install -y -o Dpkg::Options::="--force-confold" -o Dpkg::Options::="--force-confdef" after the upgrade action fails for the first time; this updates the ovn-chassis workload to 23.03.
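
Since the mismatch comes from the cloud-archive pocket switching from zed to antelope, it may help to confirm which pocket a unit points at before and after the workaround. This is a rough sketch under stated assumptions: `uca_pocket` is a hypothetical helper, it expects the plain `deb <uri> <suite>/<release> <component>` layout quoted in this thread, and it does not handle bracketed options such as `[arch=amd64]`.

```python
from typing import Optional


def uca_pocket(sources_list_text: str) -> Optional[str]:
    """Return the release named in the first active 'deb' line, if any.

    Assumes the layout 'deb <uri> <series>-updates/<release> <component>';
    lines with bracketed options are not supported by this sketch.
    """
    for raw in sources_list_text.splitlines():
        line = raw.strip()
        if not line.startswith("deb "):
            continue  # skip comments, blank lines and deb-src entries
        fields = line.split()
        if len(fields) >= 3 and "/" in fields[2]:
            # 'jammy-updates/zed' -> 'zed'
            return fields[2].split("/", 1)[1]
    return None


before = "deb http://ubuntu-cloud.archive.canonical.com/ubuntu jammy-updates/zed main"
after = "deb http://ubuntu-cloud.archive.canonical.com/ubuntu jammy-updates/antelope main"
```

On a unit, the input would be the contents of /etc/apt/sources.list.d/cloud-archive.list.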

fnordahl commented 3 weeks ago

Initially, I deployed an OpenStack Zed cloud with ovn-chassis channel 22.09/stable and nova-compute zed/stable. The upgrade path was: ovn-chassis → ovn-central → nova-compute. The ovn-chassis should be upgraded before ovn-central, and the OpenStack control plane should be upgraded before the hypervisor.

So far I'm with you.

This presents a chicken-and-egg problem: the ovn-chassis must be upgraded before nova-compute, but the necessary OVN version isn't available until after the nova-compute upgrade.

Why exactly does ovn-chassis need to be upgraded before nova-compute?

What prevents you from upgrading ovn-chassis and nova-compute together, leaving the ovn-central upgrade to the end when all chassis have been upgraded?

jneo8 commented 3 weeks ago

Hi @fnordahl, thanks for your response.

Why exactly does ovn-chassis need to be upgraded before nova-compute?

So these are the reasons we have this order: ovn-chassis -> ovn-central -> nova-compute

What prevents you from upgrading ovn-chassis and nova-compute together, leaving the ovn-central upgrade to the end when all chassis have been upgraded?

The main cause of this issue is that, no matter the upgrade order on the hypervisor node, a problem occurs:

fnordahl commented 3 weeks ago

It is possible to upgrade the charm without upgrading its payload. Would that not solve the RuntimeError problem?

samuelallan72 commented 3 weeks ago

What prevents you from upgrading ovn-chassis and nova-compute together,

@fnordahl could you explain what this looks like practically - ie. if you were to do this manually? I'm curious because I'm not aware of any charm mechanism that synchronises updates between charms.

jneo8 commented 3 weeks ago

It is possible to upgrade the charm without upgrading its payload. Would that not solve the RuntimeError problem?

I'm almost sure that won't resolve the issue, because the RuntimeError comes from charm-helpers not knowing the OPENSTACK_CODENAME (for example, zed is missing from charm-helpers in the yoga release). If I understand correctly how charms work and are packaged, it isn't possible to change that without a refresh.

(Or perhaps I misunderstand what you are trying to do here.)

fnordahl commented 3 weeks ago

As laid out in the documentation, it is expected to first upgrade charms and then upgrade the payload. There is nothing manual or special about this.

So if you upgrade the charms first and then proceed with the payload upgrade, is this an issue?

jneo8 commented 3 weeks ago

There is also an issue with the workaround:

# run work-around command on nova-compute unit

sudo DEBIAN_FRONTEND=noninteractive apt --fix-broken install -y -o Dpkg::Options::="--force-confold" -o Dpkg::Options::="--force-confdef"

# on juju client

# This succeeds
juju run nova-compute/0 openstack-upgrade

# This fails
juju run nova-compute/0 resume

# This succeeds
juju run nova-compute/0 enable

This is the error message from the resume action:

Running operation 73 with 1 task
  - task 74 on unit-nova-compute-0

Waiting for task 74...

failed
inactive
inactive
Filesystem                                             1K-blocks   Used Available Use% Mounted on
/dev/mapper/crypt-143335ca-8be8-4124-a149-05156760c184  52386824 398292  51988532   1% /var/lib/nova/instances
Filesystem                                             1K-blocks   Used Available Use% Mounted on
/dev/mapper/crypt-143335ca-8be8-4124-a149-05156760c184  52386824 398292  51988532   1% /var/lib/nova/instances
Filesystem                                             1K-blocks   Used Available Use% Mounted on
/dev/mapper/crypt-143335ca-8be8-4124-a149-05156760c184  52386824 398292  51988532   1% /var/lib/nova/instances
Filesystem                                             1K-blocks   Used Available Use% Mounted on
/dev/mapper/crypt-143335ca-8be8-4124-a149-05156760c184  52386824 398292  51988532   1% /var/lib/nova/instances
active
active
active
Removed /etc/systemd/system/nova-compute.service.
Synchronizing state of nova-compute.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable nova-compute
Created symlink /etc/systemd/system/multi-user.target.wants/nova-compute.service → /lib/systemd/system/nova-compute.service.
Removed /etc/systemd/system/nova-api-metadata.service.
Synchronizing state of nova-api-metadata.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable nova-api-metadata
Created symlink /etc/systemd/system/multi-user.target.wants/nova-api-metadata.service → /lib/systemd/system/nova-api-metadata.service.
Removed /etc/systemd/system/qemu-kvm.service.
Created symlink /etc/systemd/system/multi-user.target.wants/qemu-kvm.service → /lib/systemd/system/qemu-kvm.service.
ERROR no relation id specified

jneo8 commented 3 weeks ago

As laid out in the documentation, it is expected to first upgrade charms and then upgrade the payload. There is nothing manual or special about this.

So if you upgrade the charms first and then proceed with the payload upgrade, is this an issue?

Hi @fnordahl

Do you mean, for example, that we first upgrade all the charms without changing any source or openstack-origin juju application configuration, and then change the configuration or run the upgrade action later?

fnordahl commented 3 weeks ago

Do you mean, for example, that we first upgrade all the charms without changing any source or openstack-origin juju application configuration, and then change the configuration or run the upgrade action later?

Yes.

fnordahl commented 3 weeks ago

@fnordahl could you explain what this looks like practically - ie. if you were to do this manually? I'm curious because I'm not aware of any charm mechanism that synchronises updates between charms.

On this topic there is precedent in the charms for subordinates making their principal aware of their packages: https://review.opendev.org/q/topic:%22bug/1806111%22

I'm not sure it is required here, though; it might be that simply doing the charm upgrade before the payload upgrade already makes the nova-compute payload upgrade work.

javacruft commented 3 weeks ago

This is a packaging upgrade bug:

Preparing to unpack .../0-openvswitch-switch_3.1.3-0ubuntu0.23.04.1~cloud0_amd64.deb ...
Unpacking openvswitch-switch (3.1.3-0ubuntu0.23.04.1~cloud0) over (3.0.3-0ubuntu0.22.10.3~cloud3) ...
dpkg: error processing archive /tmp/apt-dpkg-install-e4dbGT/0-openvswitch-switch_3.1.3-0ubuntu0.23.04.1~cloud0_amd64.deb (--unpack):
 trying to overwrite '/usr/share/openvswitch/local-config.ovsschema', which is also in package openvswitch-common 3.0.3-0ubuntu0.22.10.3~cloud3

openvswitch-switch needs a versioned Breaks/Replaces in the antelope UCA to deal with this file moving, if that was the intent.
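
A file-overwrite conflict like the one in the dpkg output above can be told apart from other apt failures by extracting the contested path and its current owner. The sketch below is illustrative only: `parse_overwrite_conflict` is a hypothetical helper, and the pattern assumes the dpkg wording quoted in this comment.

```python
import re
from typing import Optional, Tuple

# Matches the dpkg message quoted in this issue:
#  trying to overwrite '<path>', which is also in package <owner> <version>
_OVERWRITE = re.compile(
    r"trying to overwrite '(?P<path>[^']+)', "
    r"which is also in package (?P<owner>\S+) (?P<version>\S+)"
)


def parse_overwrite_conflict(output: str) -> Optional[Tuple[str, str, str]]:
    """Return (path, owning package, owning version), or None if absent."""
    match = _OVERWRITE.search(output)
    if match is None:
        return None
    return match.group("path"), match.group("owner"), match.group("version")


message = (
    " trying to overwrite '/usr/share/openvswitch/local-config.ovsschema', "
    "which is also in package openvswitch-common 3.0.3-0ubuntu0.22.10.3~cloud3"
)
conflict = parse_overwrite_conflict(message)
```

A match names the file and the package that a versioned Breaks/Replaces relationship would need to cover.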

javacruft commented 3 weeks ago

Due to the packaging history (merging with Debian) I think the openvswitch-switch binary package needs:

Breaks: openvswitch-common (<< 3.0.1-1-)
Replaces: openvswitch-common (<< 3.0.1-1-)

as that's when the local-config.ovsschema file was added to the -switch package.

@fnordahl are you OK to pick up a fix for this?

javacruft commented 3 weeks ago

Reproducer in a focal LXD container

root@literate-goshawk:~# history
    1  add-apt-repository cloud-archive:zed
    2  apt install openvswitch-switch
    3  add-apt-repository cloud-archive:antelope
    4  apt dist-upgrade --assume-yes
    5  history

samuelallan72 commented 3 weeks ago

As laid out in the documentation, it is expected to first upgrade charms and then upgrade the payload. There is nothing manual or special about this.

So if you upgrade the charms first and then proceed with the payload upgrade, is this an issue?

For "charms upgrade" though, it's not always clear what this refers to, as there are a couple of options:

  1. upgrade the charm to latest revision in the same channel - eg. juju refresh <charm>. Some parts of the documentation refer to the older method where all charms were on latest/stable, but for others it's the newer channel-based charms (eg. ussuri/stable channel), so this has different connotation for each.
  2. upgrade the charm to the next (target openstack) channel - eg. juju refresh <charm> --channel victoria/stable

Also how does this work for subordinate charms; it's not possible to split upgrading the charm and the payload like the principal charms, because they don't have a separate openstack-origin or source config.

fnordahl commented 3 weeks ago

Due to the packaging history (merging with Debian) I think the openvswitch-switch binary package needs:

Breaks: openvswitch-common (<< 3.0.1-1-)
Replaces: openvswitch-common (<< 3.0.1-1-)

as that's when the local-config.ovsschema file was added to the -switch package.

@fnordahl are you OK to pick up a fix for this?

@javacruft sure! Do you think this is a requirement all the way from tip of the packages, or is this a stable only package fix?

fnordahl commented 3 weeks ago

As laid out in the documentation, it is expected to first upgrade charms and then upgrade the payload. There is nothing manual or special about this. So if you upgrade the charms first and then proceed with the payload upgrade, is this an issue?

For "charms upgrade" though, it's not always clear what this refers to, as there are a couple of options:

1. upgrade the charm to latest revision in the same channel - eg. `juju refresh <charm>`.  Some parts of the documentation refer to the older method where all charms were on latest/stable, but for others it's the newer channel-based charms (eg. ussuri/stable channel), so this has different connotation for each.

2. upgrade the charm to the next (target openstack) channel - eg. `juju refresh <charm> --channel victoria/stable`

Regardless of which of these methods is used, the base reality is that the charm code needs to be upgraded prior to the payload for it to understand the new payload.

Also how does this work for subordinate charms; it's not possible to split upgrading the charm and the payload like the principal charms, because they don't have a separate openstack-origin or source config.

Indeed, subordinate charms typically do not deal with package upgrades for this reason. The chassis charm has an ovn-source configuration option solely to deal with the special Focal OVN 22.03 UCA pocket.

Some subordinate charms with package upgrade requirements exchange this information with their principal as mentioned in https://github.com/canonical/charmed-openstack-upgrader/issues/494#issuecomment-2273046410, but again I don't think explicit exchange is required for this relationship.

For ovn-chassis the openstack upgrade performed by the nova-compute should take care of the required package upgrades.
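
The ordering described here (charm code first, then the payload through the principal) might be sketched as a list of juju commands. This is an illustration under assumptions rather than COU's actual plan: `hypervisor_upgrade_commands` is a hypothetical function, the channels are the ones mentioned in this thread, and the `cloud:jammy-antelope` origin value is an assumption for a Jammy/2023.1 target.

```python
from typing import List


def hypervisor_upgrade_commands(unit: str = "nova-compute/0") -> List[str]:
    """Build an ordered command list: charm refreshes before the payload upgrade."""
    app = unit.split("/", 1)[0]
    return [
        # 1. Upgrade charm code first, without touching the payload.
        "juju refresh ovn-chassis --channel 23.03/stable",
        f"juju refresh {app} --channel 2023.1/stable",
        # 2. Then switch the principal to the new origin (assumed value) ...
        f"juju config {app} action-managed-upgrade=true",
        f"juju config {app} openstack-origin=cloud:jammy-antelope",
        # ... and let its upgrade action pull in the colocated OVS/OVN packages.
        f"juju run {unit} openstack-upgrade",
    ]


commands = hypervisor_upgrade_commands()
```

Printing `commands` gives a plan that can be reviewed before anything is run against a model.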

jneo8 commented 1 week ago

Issue on launchpad: https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/2077406