canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

fully support package installation in systemd #2662

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1576692

Launchpad details
affected_projects = ['cloud-init (Ubuntu)', 'init-system-helpers (Ubuntu)', 'cloud-init (Ubuntu Xenial)', 'init-system-helpers (Ubuntu Xenial)']
assignee = None
assignee_name = None
date_closed = 2016-09-12T20:51:27.011995+00:00
date_created = 2016-04-29T13:44:17.658472+00:00
date_fix_committed = 2016-09-12T20:51:27.011995+00:00
date_fix_released = 2016-09-12T20:51:27.011995+00:00
id = 1576692
importance = critical
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1576692
milestone = None
owner = smoser
owner_name = Scott Moser
private = False
status = fix_released
submitter = smoser
submitter_name = Scott Moser
tags = ['sts', 'verification-done']
duplicates = [1510549, 1611973, 1626313]

Launchpad user Scott Moser(smoser) wrote on 2016-04-29T13:44:17.658472+00:00

In cloud-init, users can install packages via cloud-config:

#cloud-config
packages: [apache2]

Due to some intricacies of systemd and service installation, that doesn't work all that well. Under bug 1575572 we fixed the issue for simple services that do not depend on other services, or at least don't check those dependencies well.

We'd like to have a way to fully support this in cloud-init.

Related bugs:
 * bug 1575572: apache2 fails to start if installed via cloud config (on Xenial)
 * bug 1611973: postgresql@9.5-main service not started if postgres installed via cloud-init
 * bug 1621336: snapd.boot-ok.service hangs eternally on cloud image upgrades (snapd packaging bug, but this cloud-init fix will workaround it)
 * bug 1620780: dev-sda2.device job running and times out
 * bug 1623570: Azure: cannot start walinux agent (Transaction order is cyclic.)
 * bug 1623868: cloud-final.service does not run due to dependency cycle
 * bug 1627436: [gce] Startup scripts do not run on 1604 images

SRU INFORMATION

FIX for init-system-helpers: https://anonscm.debian.org/cgit/collab-maint/init-system-helpers.git/commit/?id=1460d6a02

REGRESSION POTENTIAL for init-system-helpers: This changes invoke-rc.d and service, two very central pieces of packaging infrastructure. Errors in it will break installation/upgrades of packages or /etc/network/if-up.d/ hooks and the like. This changes the condition when systemd units get started without their dependencies, and the condition gets weakened. This means that behaviour in a booted system is unchanged, but during boot this could change the behaviour of if-up.d/ hooks (although they have never been defined well during boot anyway). However, I tested this change extensively in cloud images and desktop installations (particularly I recreated https://bugs.debian.org/777113 and confirmed that this approach also fixes it) and could not find any regression.

TEST CASE (for both packages): Run

   lxc launch ubuntu-daily:x --config=user.user-data="$(printf "#cloud-config\npackages: [postgresql, samba, postfix]")" x1

This will install all three packages, but "systemctl status postgresql@9.5-main" will show that the service is not running.

Now prepare a new image with the proposed cloud-init and init-system-helpers:

   lxc launch ubuntu-daily:x xprep
   lxc exec xprep bash
   # enable -proposed and dist-upgrade, then poweroff
   lxc publish xprep x-proposed

Now run the initial lxc launch again, but against that new x-proposed image instead of the standard daily:

  lxc launch x-proposed --config=user.user-data="$(printf "#cloud-config\npackages: [postgresql, samba, postfix]")" x1

You should now have "systemctl status postgresql@9.5-main" running. Directly after rebooting the instance, check that there are no hanging jobs (systemctl list-jobs), particularly networking.service, to ensure that https://bugs.debian.org/777113 did not come back.

Also test interactively installing a package that ships a service, like "apache2", and verify that it starts properly after installation.

Verify that journalctl shows no dependency cycles and that all cloud init services and the target are active:

$ systemctl list-units --no-legend --all 'cloud*'
cloud-config.service loaded active exited Apply the settings specified in cloud-config
cloud-final.service loaded active exited Execute cloud user/final scripts
cloud-init-local.service loaded active exited Initial cloud-init job (pre-networking)
cloud-init.service loaded active exited Initial cloud-init job (metadata service crawler)
cloud-config.target loaded active active Cloud-config availability
cloud-init.target loaded active active Cloud-init target
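The "all active" check can also be scripted. What follows is only a minimal sketch and not part of the original test case: `inactive_units` is a hypothetical helper name, and it assumes the plain `--no-legend` column layout shown in this report (unit, load, active, sub, description) with no leading status marker.

```shell
#!/bin/sh
# Hypothetical helper (not shipped by cloud-init): read the output of
# `systemctl list-units --no-legend --all` on stdin and print the name of
# every unit whose ACTIVE column (third field) is not "active".
inactive_units() {
    awk 'NF >= 3 && $3 != "active" { print $1 }'
}

# Possible use on a live system (prints nothing when everything is active):
#   systemctl list-units --no-legend --all 'cloud*' | inactive_units
```

An empty result corresponds to the expected state in the listing; any unit name printed would need a closer look with systemctl status.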

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-08-22T08:20:38.497647+00:00

Would it be possible to move package installation into a cloud-init-*.service that is "Type=idle", i.e. runs after booting is complete? This would avoid all these corner cases and breakage when trying to install/start things while booting is not complete yet.

ubuntu-server-builder commented 1 year ago

Launchpad user Neil Wilson(neil-aldur) wrote on 2016-08-22T09:33:41.165586+00:00

Martin,

Probably worth noting that this affects configuration systems as well. I'm using the PostgreSQL puppet configuration system, and that will sit in a loop waiting for PostgreSQL to come up before moving on to the next stage of the configuration.

So if you are using puppet within cloud-init, and cloud events delay the start event until the boot is complete, then the configurator that expects things to happen in sequence will break.

It looks to me that large chunks of cloud-init need to be moved so it runs after 'multi-user.target' has been reached, not just package installation.

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2016-09-01T15:53:38.652938+00:00

An update on this: I think the plan for the moment is to move many of the config modules that run in 'config_modules' to 'final_modules', and to make final_modules run as idle.

I don't love it, but it seems like the only real path to making package installation work.

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2016-09-08T17:38:36.783617+00:00

It's worth mentioning the scope of the package upgrade/install issue related to systemd.

For packages like apache2 which do not use dependent systemd service files, those service packages install and start properly.

For packages with dependent service files, like postgresql (which has both a postgresql.service (a dummy) and a systemd generator that creates a postgresql@- service), the services currently do not start automatically because they are installed while cloud-init's cloud-config.service unit is executing.

W.r.t package upgrades, the issue is scoped to service packages in the image which have an update that's not in the current image and also require an update to systemd services.

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2016-09-08T19:59:45.409623+00:00

I have a branch at https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bug/1576692 which:
 a.) moves cloud-init-final to be Type=idle
 b.) moves config modules such as package-update-upgrade-install to run in the final_modules
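To see which stage a given module is listed under on an image, one could scan cloud.cfg. This is a rough sketch only: `module_stage` is a hypothetical helper, it assumes the usual layout where each stage is a top-level `cloud_*_modules:` key followed by ` - name` items, and it is a line-based scan rather than a YAML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the stage key (cloud_init_modules,
# cloud_config_modules or cloud_final_modules) under which a module is listed.
#   $1 = module name, e.g. package-update-upgrade-install
#   $2 = config file (defaults to /etc/cloud/cloud.cfg)
module_stage() {
    awk -v mod="$1" '
        # Remember the stage whose list we are currently inside.
        /^cloud_(init|config|final)_modules:/ { stage = $1; sub(/:$/, "", stage); next }
        # Any other top-level key ends the current list.
        /^[^ #-]/ { stage = "" }
        # A plain list item "- name" inside a stage list.
        stage != "" && $1 == "-" && $2 == mod { print stage; exit }
    ' "${2:-/etc/cloud/cloud.cfg}"
}
```

For example, `module_stage package-update-upgrade-install` would be expected to print `cloud_final_modules` on an image carrying the change described here, and `cloud_config_modules` on an older one.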

I've patched a lxc container with that, and then launched several instances. My experience is that out of 5 containers 2 or 3 of them will have a running postgres at the end (per systemctl status postgresql@9.5-main).

The user-data I'm providing is just:

#cloud-config
packages: [postgresql]
runcmd:

Then you can just look at /run/my-status.

To start a patched image what you can do is:

   n=y1
   lxc init ubuntu-daily:yakkety $n "--config=user.user-data=$(cat my.user-data)"
   lxc-chroot $n -- sh -ec 'd=/tmp/my.deb; trap "rm -f $d" EXIT; cat > $d && dpkg -i $d' < "$deb"
   lxc start $n

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T07:20:53.176159+00:00

I cannot see failed containers in your cloud instance, nor reproduce the failure by starting new ones.

I have also created and run http://people.canonical.com/~pitti/tmp/psql-idle.sh on my laptop and your cloud instance for 40 iterations, and I couldn't reproduce a failure. This takes cloud-init out of the equation and just tests running apt install postgresql in a Type=idle unit (plus some glue around it to wait for booting and iterate). So I'm fairly sure that this approach works in principle -- but of course with more moving parts there's more that can go wrong.

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T07:24:55.213412+00:00

The bit that I have doubts about in https://git.launchpad.net/~smoser/cloud-init/commit/?h=bug/1576692&id=6a249689a179f is why "runcmd" still runs in cloud_config_modules -- it's arbitrary code which might (and often does) run package installs, so it should really live in cloud_final_modules as well?

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T11:16:22.510892+00:00

Scott and I debugged this further, and the best hint so far is bug 1620780. In Scott's local instances he gets "systemctl is-system-running" == "starting" with

JOB UNIT TYPE STATE
2 dev-sda2.device start running

postgresql-9.5.postinst calls "invoke-rc.d postgresql start", and since the system is not booted yet (according to is-system-running) it starts the postgresql.service wrapper job with --job-mode=ignore-dependencies, and thus it never starts the @9.5-main instance.

I suggest handling this bit (which is LXD specific) in bug 1620780, and keeping this bug for the cloud-init change.

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T12:07:26.487114+00:00

Type=idle not waiting for running *.device jobs is related, but not identical to bug 1620780. I filed that as bug 1621846. Those are both on the systemd side, and for the cases where bug 1620780 does not hit (and thus bug 1621846 does not happen), it has been demonstrated that moving these units after the boot process works in principle.

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T15:26:59.446899+00:00

Pre-weekend braindump: I've had success with modifying a xenial image like this:

/usr/sbin/invoke-rc.d:

    if ! systemctl --quiet is-active default.target; then
        sctl_args="--job-mode=ignore-dependencies"
    fi

Add multi-user.target to cloud-{config,final}.service

Then this works:

lxc launch ubuntu-xenial-mod --config=user.user-data="$(printf "#cloud-config\npackages: [postgresql, samba, postfix]")" x1

Services start:

systemctl list-units --no-legend postg* samb* postfix*

postfix.service loaded active running LSB: Postfix Mail Transport Agent
postgresql.service loaded active exited PostgreSQL RDBMS
postgresql@9.5-main.service loaded active running PostgreSQL Cluster 9.5-main
samba-ad-dc.service loaded active exited LSB: start Samba daemons for the AD DC

and reboot works fine.

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T15:30:08.378736+00:00

> Add multi-user.target to cloud-{config,final}.service

Sorry, I meant to add After=multi-user.target
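For anyone wanting to experiment with that ordering on a test image, a systemd drop-in is one way to add it without editing the packaged unit. This is a sketch under assumptions: `write_after_dropin` and the file name `after-multi-user.conf` are made up for illustration, and the thread does not show how the packaged units were actually changed. Note that bug 1623868 (a dependency cycle involving cloud-final.service) was later reported as fallout from the ordering change, so check journalctl for cycles after trying this.

```shell
#!/bin/sh
# Hypothetical helper: create a drop-in that orders a unit after
# multi-user.target.
#   $1 = unit name, e.g. cloud-final.service
#   $2 = unit directory (defaults to /etc/systemd/system)
write_after_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    mkdir -p "$root/$unit.d" &&
    printf '[Unit]\nAfter=multi-user.target\n' \
        > "$root/$unit.d/after-multi-user.conf"
}

# On a real system, run `systemctl daemon-reload` afterwards so that
# systemd picks up the new drop-in.
```

Passing a scratch directory as the second argument lets you inspect the generated file before touching /etc.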

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T16:36:03.409844+00:00

Change invoke-rc.d to check is-active multi-user.target instead of is-system-running, to match After=multi-user.target. Also, always use --no-block for reload as an additional line of defence for if-up.d/ scripts, as reload has never been synchronous.
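The decision described in this comment can be sketched as a small shell function. This is an illustration only and not the actual invoke-rc.d code: `choose_sctl_args` and the `BOOT_CHECK` override are hypothetical names, introduced so the logic can be exercised without systemd.

```shell
#!/bin/sh
# Sketch of the logic described above; NOT the real invoke-rc.d.
# BOOT_CHECK is a hypothetical override for testing. By default the
# function asks systemd whether multi-user.target is active.
#   $1 = action (start, stop, reload, ...)
choose_sctl_args() {
    action=$1
    args=""
    if ! ${BOOT_CHECK:-systemctl --quiet is-active multi-user.target}; then
        # Boot has not reached multi-user.target yet: start only the
        # requested unit, without pulling in its dependencies.
        args="--job-mode=ignore-dependencies"
    fi
    if [ "$action" = "reload" ]; then
        # Reload is never waited for, regardless of boot state.
        args="${args:+$args }--no-block"
    fi
    printf '%s\n' "$args"
}
```

The point of keying on multi-user.target is that units ordered After=multi-user.target then get an empty argument list, so package postinst scripts started from them bring up dependent units such as postgresql@9.5-main normally.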

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T16:38:49.082386+00:00

For invoke-rc.d: original commit https://anonscm.debian.org/cgit/collab-maint/sysvinit.git/commit/?id=38e2b9fca

Try to reproduce the hangs in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777113 when completely removing the hack, then ensure that they go away again with is-active m-u.target

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2016-09-09T21:19:32.196841+00:00

This bug was fixed in the package cloud-init - 0.7.7-28-g34a26f7-0ubuntu1


cloud-init (0.7.7-28-g34a26f7-0ubuntu1) yakkety; urgency=medium

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2016-09-09T23:40:23.265038+00:00

Just a comment / status here. cloud-init is now running the modules that do package installation after multi-user.target. The plan is to change init-system-helpers as pitti described in comment 10. Until that is done, the problem isn't really fixed.

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-10T11:06:44.511553+00:00

@Scott: Oh, does that actually work with adding the After= to just cloud-final.service? I thought the thing that actually does the package installs is cloud-config.service, and this needed that After= as well?

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-12T07:43:43.958155+00:00

https://anonscm.debian.org/cgit/collab-maint/init-system-helpers.git/commit/?id=1460d6a02

I also committed https://anonscm.debian.org/cgit/collab-maint/init-system-helpers.git/commit/?id=9cfb6dfed to further robustify the behaviour, but it is not required to fix this bug.

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-12T09:01:50.896259+00:00

i-s-h SRU uploaded.

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2016-09-12T13:13:49.245498+00:00

Martin, I moved all things that do package installation into final. The user could still manage to have some things run at 'config' point in boot that would install packages, but anything that does it in cloud-init directly is now part of final.

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2016-09-12T20:51:23.312650+00:00

fixed in 0.7.8.

ubuntu-server-builder commented 1 year ago

Launchpad user Dave Chiluk(chiluk) wrote on 2016-09-12T22:44:46.347059+00:00

@smoser

Did you commit your changes to the xenial cloud-init as well? I'm not sure where xenial images grab cloud init for themselves, but I assume out of the xenial archives. Am I missing something here?

ubuntu-server-builder commented 1 year ago

Launchpad user Patricia Gaughen(gaughen) wrote on 2016-09-12T23:24:07.875363+00:00

We use what's in the archive for what we include in cloud images. So once cloud-init lands, it will make its way into an image. I would expect that Scott is working his way through the SRU process.

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2016-09-13T07:35:34.047382+00:00

This bug was fixed in the package init-system-helpers - 1.44


init-system-helpers (1.44) unstable; urgency=medium

ubuntu-server-builder commented 1 year ago

Launchpad user Chris J Arges(arges) wrote on 2016-09-13T20:14:53.164874+00:00

Hello Scott, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.7-31-g65ace7b-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

ubuntu-server-builder commented 1 year ago

Launchpad user Neil Wilson(neil-aldur) wrote on 2016-09-14T08:52:32.677031+00:00

Have we backported the init-system-helpers changes to Xenial?

I'm only seeing 1.29ubuntu2 this morning.

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-14T09:11:45.665432+00:00

init-system-helpers is still sitting in the SRU queue and needs to be reviewed/accepted.

ubuntu-server-builder commented 1 year ago

Launchpad user Andy Whitcroft(apw) wrote on 2016-09-14T09:38:53.197301+00:00

Hello Scott, or anyone else affected,

Accepted init-system-helpers into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/init-system-helpers/1.29ubuntu3 in a few hours, and then in the -proposed repository.

ubuntu-server-builder commented 1 year ago

Launchpad user Neil Wilson(neil-aldur) wrote on 2016-09-14T14:37:00.717208+00:00

Added both cloud-init and init-system-helpers from proposed to the standard Xenial cloud image (com.ubuntu.cloud:released:download/com.ubuntu.cloud:server:16.04:amd64/20160907.1/disk1.img) on a suitably sized server.

Reset cloud-init with rm -rf /var/lib/cloud/instances/*, shut down the server and snapshotted the image.

Rebuilt a new server from the snapshotted image using the previously failing postgresql user data and all is well. The new packages correct my problem, bug 1611973.

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2016-09-14T18:09:39.783038+00:00

Thank you Neil!

I've been going through my testing here, and found:

That will require us to get that fix in and through proposed, or we will break Azure boot. It's fallout of the systemd ordering.

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-15T09:43:37.851755+00:00

I just filed bug 1623868 which is fallout from this change, so blocking this SRU for now.

ubuntu-server-builder commented 1 year ago

Launchpad user Martin Pitt(pitti) wrote on 2016-09-15T14:29:28.769685+00:00

Hello Scott, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.8-1-g3705bb5-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

ubuntu-server-builder commented 1 year ago

Launchpad user Scott Moser(smoser) wrote on 2016-09-17T02:14:50.773103+00:00

verified with:

   printf "#cloud-config\npackages: [postgresql, samba, postfix]\n" > user-data
   n=x1
   lxc launch ubuntu-daily:xenial $n
   sleep 10
   lxc exec $n -- sh -c '
     p=/etc/apt/sources.list.d/proposed.list
     echo deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc)-proposed main > "$p" &&
     apt-get update -q && apt-get -qy install cloud-init'

   lxc file push - $n/etc/cloud/cloud.cfg.d/update.cfg < user-data

   # clean it out so next is first boot.
   lxc exec $n -- sh -c '
     cd /var/lib/cloud &&
     for d in *; do [ "$d" = "seed" ] || rm -Rf "$d"; done
     rm -Rf /var/log/cloud-init*'

   lxc exec $n reboot
   lxc exec $n -- tail -f /var/log/cloud-init-output.log

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2016-09-22T17:34:08.065412+00:00

This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1~16.04.1


cloud-init (0.7.8-1-g3705bb5-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.2) xenial-proposed; urgency=medium

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

ubuntu-server-builder commented 1 year ago

Launchpad user Chris J Arges(arges) wrote on 2016-09-22T17:34:52.080664+00:00

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2016-10-10T07:26:36.422481+00:00

This bug was fixed in the package init-system-helpers - 1.29ubuntu3


init-system-helpers (1.29ubuntu3) xenial-proposed; urgency=medium