canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Upgrade LXD VM from jammy to noble stops during "do-release-upgrade" when updating the lxd-agent-loader #14033

Open toabctl opened 1 week ago

toabctl commented 1 week ago

ProblemType Bug

Date Tue Sep 3 16:14:22 2024

CurrentDesktop ubuntu:GNOME

ProcEnviron LANG=en_US.UTF-8 PATH=(custom, no user) SHELL=/usr/bin/zsh TERM=xterm-256color XDG_RUNTIME_DIR=

DistroRelease Ubuntu 24.10

Uname Linux 6.8.0-41-generic x86_64

Architecture amd64

Snap lxd 5.21.2-22f93f4 (5.21/stable)

SnapChanges no changes found

SnapConnections
Interface       Plug                  Slot             Notes
content         lxd:ceph-conf         -                -
content         lxd:ovn-certificates  -                -
content         lxd:ovn-chassis       -                -
lxd             multipass:lxd         lxd:lxd          -
lxd-support     lxd:lxd-support       :lxd-support     -
network         lxd:network           :network         -
network-bind    lxd:network-bind      :network-bind    -
system-observe  lxd:system-observe    :system-observe  -

SnapInfo.lxd
name: lxd
summary: LXD - container and VM manager
publisher: Canonical**
store-url: https://snapcraft.io/lxd
contact: https://github.com/canonical/lxd/issues
license: AGPL-3.0
description: |
LXD is a system container and virtual machine manager.

It offers a simple CLI and REST API to manage local or remote instances, uses an image based workflow and support for a variety of advanced features.

Images are available for all Ubuntu releases and architectures as well as for a wide number of other Linux distributions. Existing integrations with many deployment and operation tools, makes it work just like a public cloud, except everything is under your control.

LXD containers are lightweight, secure by default and a great alternative to virtual machines when running Linux on Linux.

LXD virtual machines are modern and secure, using UEFI and secure-boot by default and a great choice when a different kernel or operating system is needed.

With clustering, up to 50 LXD servers can be easily joined and managed together with the same tools and APIs and without needing any external dependencies.

Supported configuration options for the snap (snap set lxd [<key>=<value>...]):

- ceph.builtin: Use snap-specific Ceph configuration [default=false]
- ceph.external: Use the system's ceph tools (ignores ceph.builtin) [default=false]
- criu.enable: Enable experimental live-migration support [default=false]
- daemon.debug: Increase logging to debug level [default=false]
- daemon.group: Set group of users that have full control over LXD [default=lxd]
- daemon.user.group: Set group of users that have restricted LXD access [default=lxd]
- daemon.preseed: Pass a YAML configuration to `lxd init` on initial start
- daemon.syslog: Send LXD log events to syslog [default=false]
- daemon.verbose: Increase logging to verbose level [default=false]
- lvm.external: Use the system's LVM tools [default=false]
- lxcfs.pidfd: Start per-container process tracking [default=false]
- lxcfs.loadavg: Start tracking per-container load average [default=false]
- lxcfs.cfs: Consider CPU shares for CPU usage [default=false]
- lxcfs.debug: Increase logging to debug level [default=false]
- openvswitch.builtin: Run a snap-specific OVS daemon [default=false]
- openvswitch.external: Use the system's OVS tools (ignores openvswitch.builtin) [default=false]
- ovn.builtin: Use snap-specific OVN configuration [default=false]
- ui.enable: Enable the web interface [default=false]

For system-wide configuration of the CLI, place your configuration in /var/snap/lxd/common/global-conf/ (config.yml and servercerts)

commands:

SnapInfo.core22
name: core22
summary: Snap runtime environment
publisher: Canonical**
store-url: https://snapcraft.io/core22
license: unset
description: |
Base snaps are a specific type of snap that include libraries and dependencies common to many applications. They provide a consistent and reliable execution environment for the snap packages that use them.

The core22 base snap provides a runtime environment based on Ubuntu 22.04 LTS (Jammy Jellyfish).

Other Ubuntu environment base snaps include:

SnapGitOwner canonical

SnapGitName lxd

CrashDB snap-github

NonfreeKernelModules zfs

InstallationMedia Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)

Tags oracular wayland-session

InstallationDate Installed on 2024-07-18 (47 days ago)

UpgradeStatus Upgraded to oracular on 2024-08-26 (8 days ago)

ProcVersionSignature Ubuntu 6.8.0-41.41-generic 6.8.12

ProcCpuinfoMinimal processor : 23 vendor_id : AuthenticAMD cpu family : 25 model : 97 model name : AMD Ryzen 9 7900 12-Core Processor stepping : 2 microcode : 0xa601206 cpu MHz : 2840.654 cache size : 1024 KB physical id : 0 siblings : 24 core id : 13 cpu cores : 12 apicid : 27 initial apicid : 27 fpu : yes fpu_exception : yes cpuid level : 16 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso bogomips : 7386.17 TLB size : 3584 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

ApportVersion 2.30.0-0ubuntu1

CasperMD5CheckResult pass

toabctl commented 1 week ago

steps to reproduce:

lxc launch ubuntu:jammy jammy-to-noble --vm
lxc shell jammy-to-noble
# now inside the VM
do-release-upgrade

[snipped]
Setting up libkeyutils1:amd64 (1.6.3-3build1) ...
Setting up lxd-agent-loader (0.7) ...
Error: websocket: close 1006 (abnormal closure): unexpected EOF

Now I'm out of the VM.

tomponline commented 1 week ago

@simondeziel where are we at with updating the lxd-installer to not restart on upgrade?

tomponline commented 1 week ago

@simondeziel also do you have an existing issue for this?

simondeziel commented 1 week ago

@toabctl I'm running your reproducer (thanks!) but it's not finished yet. IIRC, d-r-u spawns a screen or tmux session. If that's the case, can you re-attach to it and have it continue where it disconnected you?

simondeziel commented 1 week ago

After the lxd-agent gets restarted causing a disconnect, waiting a little makes lxc shell work again. At that point, the upgrade process can be picked up with screen -r ubuntu-release-upgrade-screen-window.
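As a rough sketch of that workaround, run from the host: the `wait_for` helper below is hypothetical (it is not part of LXD), and the retry count and delay in the usage comment are arbitrary assumptions. It just retries a command until it succeeds, which is one way to wait for the agent to answer again. The instance name comes from the reproducer above.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD): retry a command until it
# succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard output; only the exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example use on the host (commented out; assumes the instance name from
# the reproducer, and that 30 tries 2 seconds apart is long enough):
#
#   wait_for 30 2 lxc exec jammy-to-noble -- true && lxc shell jammy-to-noble
#
# Then, inside the VM, pick the upgrade back up:
#
#   screen -r ubuntu-release-upgrade-screen-window
```

The polling only papers over the disconnect; the upgrade itself keeps running in the screen session regardless.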

toabctl commented 1 week ago

> After the lxd-agent gets restarted causing a disconnect, waiting a little makes lxc shell work again. At that point, the upgrade process can be picked up with screen -r ubuntu-release-upgrade-screen-window.

Thanks for the answer. I reconnected and ran dpkg --configure -a afterwards, which kicked me out of the session again. Then I reconnected once more and it worked. It's good to have that screen session from do-release-upgrade, but the user experience is really bad. I know how to work around it, but that is very likely not the case for a lot of people, so IMO this needs fixing.

tomponline commented 1 week ago

@toabctl yes, it does need to be fixed so that lxd-agent doesn't restart when its package is upgraded.

@simondeziel is there a bug you're tracking for this? I can't see it at https://launchpad.net/ubuntu/+source/lxd-agent-loader

simondeziel commented 1 week ago

@toabctl indeed, reconnecting is a workaround at best. I'll open a bug and work on a fix ASAP, but due to SRU delays it will unfortunately take some time to land in Noble.

simondeziel commented 1 week ago

Here's the LP bug: https://bugs.launchpad.net/ubuntu/+source/lxd-agent-loader/+bug/2078936

tomponline commented 4 days ago

@simondeziel can we close this now?

simondeziel commented 4 days ago

@tomponline lxd-agent-loader version 0.7ubuntu0.1 has still not landed officially in Noble so I think that we should keep this one open for visibility.

tomponline commented 4 days ago

OK, please close when it lands in Noble. Thanks.