Launchpad user Martin Pitt(pitti) wrote on 2016-08-22T08:20:38.497647+00:00
Would it be possible to move package installation into a cloud-init-*.service that is "Type=idle", i.e. runs after booting is complete? This would avoid all these corner cases and breakage when trying to install/start things while booting is not complete yet.
Launchpad user Neil Wilson(neil-aldur) wrote on 2016-08-22T09:33:41.165586+00:00
Martin,
Probably worth noting that this impacts the configuration systems as well. I'm using the PostgreSQL puppet configuration system, and that will sit in a loop waiting for PostgreSQL to come up before moving on to the next stage of the configuration.
So if you are using puppet within cloud-init, and cloud events delay the start event until the boot is complete, then the configurator that expects things to happen in sequence will break.
It looks to me that large chunks of cloud-init need to be moved so it runs after 'multi-user.target' has been reached, not just package installation.
Launchpad user Scott Moser(smoser) wrote on 2016-09-01T15:53:38.652938+00:00
An update to this, I think for the moment the plan is to move many of the config modules that run in 'config_modules' to 'final_modules' and to move final_modules to run as idle.
I don't love it, but it seems like the only actual path to getting package installation to work.
Launchpad user Ryan Harper(raharper) wrote on 2016-09-08T17:38:36.783617+00:00
It's worth mentioning the scope of the package upgrade/install issue related to systemd.
For packages like apache2 which do not use dependent systemd service files, those service packages install and start properly.
For packages with dependent service files, like postgresql (it has both a postgresql.service (a dummy) and systemd generator service which creates a postgresql@
W.r.t package upgrades, the issue is scoped to service packages in the image which have an update that's not in the current image and also require an update to systemd services.
Launchpad user Scott Moser(smoser) wrote on 2016-09-08T19:59:45.409623+00:00
I have a branch at https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bug/1576692 which:
a.) moves cloud-init-final to be Type=idle
b.) moves config modules such as package-update-upgrade-install to run in the final_modules
I've patched a lxc container with that, and then launched several instances. My experience is that out of 5 containers 2 or 3 of them will have a running postgres at the end (per systemctl status postgresql@9.5-main).
The user-data I'm providing is just:
packages: [postgresql]
runcmd:
Then you can just look at /run/my-status.
To start a patched image what you can do is:
n=y1
lxc init ubuntu-daily:yakkety $n "--config=user.user-data=$(cat my.user-data)"
lxc-chroot $n -- sh -ec 'd=/tmp/my.deb; trap "rm -f $d" EXIT; cat > $d && dpkg -i $d' < "$deb"
lxc start $n
Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T07:20:53.176159+00:00
I cannot see failed containers in your cloud instance, nor reproduce the failure by starting new ones.
I have also created and run http://people.canonical.com/~pitti/tmp/psql-idle.sh on my laptop and your cloud instance for 40 iterations, and I couldn't reproduce a failure. This takes cloud-init out of the equation and just tests running apt install postgresql in a Type=idle unit (plus some glue around it to wait for booting and iterate). So I'm fairly sure that this approach works in principle -- but of course with more moving parts there's more that can go wrong.
Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T07:24:55.213412+00:00
The bit that I have doubts about in https://git.launchpad.net/~smoser/cloud-init/commit/?h=bug/1576692&id=6a249689a179f is why "runcmd" still runs in cloud_config_modules -- it's arbitrary code which might (and often does) run package installs, so it should really live in cloud_final_modules as well?
Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T11:16:22.510892+00:00
Scott and I debugged this further, and the best hint so far is bug 1620780. In Scott's local instances he gets "systemctl is-system-running" == "starting" with
JOB UNIT TYPE STATE
2 dev-sda2.device start running
postgresql-9.5.postinst calls "invoke-rc.d postgresql start", and since the system is not booted yet (according to is-system-running) it starts the postgresql.service wrapper job with --job-mode=ignore-dependencies, and thus it never starts the @9.5-main instance.
I suggest to handle this bit (which is LXD specific) in bug 1620780, and keep this bug for the cloud-init change.
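To spot the situation described above (is-system-running stuck at "starting" because a job such as dev-sda2.device never finishes), the job table can be filtered for entries that are still running. This is only a minimal sketch: the function name running_jobs is made up for illustration, and it assumes the four-column JOB/UNIT/TYPE/STATE layout shown in the listing above.

```shell
# Sketch: print the units whose jobs are still in the "running" state.
# Input: `systemctl list-jobs` output on stdin (JOB UNIT TYPE STATE columns).
# The function name running_jobs is hypothetical.
running_jobs() {
    # Only rows that start with a numeric job id and have "running" in the
    # fourth column are reported; the header and footer lines never match.
    awk '$1 ~ /^[0-9]+$/ && $4 == "running" { print $2 }'
}
```

On an affected instance it could be used as "systemctl list-jobs | running_jobs" alongside "systemctl is-system-running".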
Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T12:07:26.487114+00:00
Type=idle not waiting for running *.device jobs is related, but not identical to bug 1620780. I filed that as bug 1621846. Those are both on the systemd side, and for the cases where bug 1620780 does not hit (and thus bug 1621846 does not happen), it has been demonstrated that moving these units after the boot process works in principle.
Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T15:26:59.446899+00:00
Pre-weekend braindump: I've had success with modifying a xenial image like this:
/usr/sbin/invoke-rc.d:
if ! systemctl --quiet is-active default.target; then
sctl_args="--job-mode=ignore-dependencies"
fi
Add multi-user.target to cloud-{config,final}.service
Then this works:
lxc launch ubuntu-xenial-mod --config=user.user-data="$(printf "#cloud-config\npackages: [postgresql, samba, postfix]")" x1
Services start:
postfix.service loaded active running LSB: Postfix Mail Transport Agent
postgresql.service loaded active exited PostgreSQL RDBMS
postgresql@9.5-main.service loaded active running PostgreSQL Cluster 9.5-main
samba-ad-dc.service loaded active exited LSB: start Samba daemons for the AD DC
and reboot works fine.
Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T15:30:08.378736+00:00
"Add multi-user.target to cloud-{config,final}.service"
Sorry, I meant to add After=multi-user.target to those units.
Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T16:36:03.409844+00:00
Change invoke-rc.d to check is-active multi-user.target instead of is-system-running, to match After=multi-user.target. Also, always use --no-block for reload as an additional line of defence for if-up.d/ scripts, as reload has never been synchronous.
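The two planned changes can be sketched as a small shell function. This is an illustration under stated assumptions, not the actual invoke-rc.d code: the function name choose_sctl_args is hypothetical, and only the systemctl options already named in this thread (is-active, --job-mode=ignore-dependencies, --no-block) are used.

```shell
# Sketch only: pick extra systemctl arguments for a given action.
# choose_sctl_args is a hypothetical name, not part of init-system-helpers.
choose_sctl_args() {
    action="$1"
    args=""
    # Until multi-user.target is active the system is treated as still
    # booting, so unit dependencies are ignored.
    if ! systemctl --quiet is-active multi-user.target; then
        args="--job-mode=ignore-dependencies"
    fi
    # reload has never been synchronous, so it never blocks.
    if [ "$action" = "reload" ]; then
        args="${args:+$args }--no-block"
    fi
    printf '%s\n' "$args"
}
```

A caller would then run something like "systemctl $(choose_sctl_args start) start postgresql.service", leaving the expansion unquoted so that an empty result adds no argument.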
Launchpad user Martin Pitt(pitti) wrote on 2016-09-09T16:38:49.082386+00:00
For invoke-rc.d: original commit https://anonscm.debian.org/cgit/collab-maint/sysvinit.git/commit/?id=38e2b9fca
Try to reproduce the hangs in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777113 when completely removing the hack, then ensure that they go away again with is-active m-u.target
Launchpad user Launchpad Janitor(janitor) wrote on 2016-09-09T21:19:32.196841+00:00
This bug was fixed in the package cloud-init - 0.7.7-28-g34a26f7-0ubuntu1
cloud-init (0.7.7-28-g34a26f7-0ubuntu1) yakkety; urgency=medium
New upstream snapshot.
-- Scott Moser smoser@ubuntu.com Fri, 09 Sep 2016 16:01:13 -0400
Launchpad user Scott Moser(smoser) wrote on 2016-09-09T23:40:23.265038+00:00
Just a comment / status here. cloud-init is now running the modules that do package installation after multi-user.target. The plan is to change init-system-helpers as pitti described in comment 10. Until that is done, the problem isn't really fixed.
Launchpad user Martin Pitt(pitti) wrote on 2016-09-10T11:06:44.511553+00:00
@Scott: Oh, does that actually work with adding the After= to just cloud-final.service? I thought the thing that actually does the package installs is cloud-config.service, and this needed that After= as well?
Launchpad user Martin Pitt(pitti) wrote on 2016-09-12T07:43:43.958155+00:00
https://anonscm.debian.org/cgit/collab-maint/init-system-helpers.git/commit/?id=1460d6a02
I also committed https://anonscm.debian.org/cgit/collab-maint/init-system-helpers.git/commit/?id=9cfb6dfed to further robustify the behaviour, but it is not required to fix this bug.
Launchpad user Martin Pitt(pitti) wrote on 2016-09-12T09:01:50.896259+00:00
i-s-h SRU uploaded.
Launchpad user Scott Moser(smoser) wrote on 2016-09-12T13:13:49.245498+00:00
Martin, I moved all things that do package installation into final. The user could still manage to have some things run at 'config' point in boot that would install packages, but anything that does it in cloud-init directly is now part of final.
Launchpad user Scott Moser(smoser) wrote on 2016-09-12T20:51:23.312650+00:00
fixed in 0.7.8.
Launchpad user Dave Chiluk(chiluk) wrote on 2016-09-12T22:44:46.347059+00:00
@smoser
Did you commit your changes to the xenial cloud-init as well? I'm not sure where xenial images grab cloud init for themselves, but I assume out of the xenial archives. Am I missing something here?
Launchpad user Patricia Gaughen(gaughen) wrote on 2016-09-12T23:24:07.875363+00:00
We use what's in the archive for what we include in cloud images. So once cloud-init lands, it will make its way to an image. I would expect that Scott is working his way through the SRU process.
Launchpad user Launchpad Janitor(janitor) wrote on 2016-09-13T07:35:34.047382+00:00
This bug was fixed in the package init-system-helpers - 1.44
init-system-helpers (1.44) unstable; urgency=medium
invoke-rc.d, service: Check for multi-user.target instead of graphical.target. There is a curious bug which sometimes causes "systemctl is-active default.target" to say inactive until "show" or "status" gets called on the unit. This needs to be investigated. Until then, check for multi-user.target which by and large does the same job, but seems to work reliably.
-- Martin Pitt martin.pitt@ubuntu.com Mon, 12 Sep 2016 22:52:23 +0200
Launchpad user Chris J Arges(arges) wrote on 2016-09-13T20:14:53.164874+00:00
Hello Scott, or anyone else affected,
Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.7-31-g65ace7b-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.
Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.
Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
Launchpad user Neil Wilson(neil-aldur) wrote on 2016-09-14T08:52:32.677031+00:00
Have we backported the init-system-helpers changes to Xenial?
I'm only seeing 1.29ubuntu2 this morning.
Launchpad user Martin Pitt(pitti) wrote on 2016-09-14T09:11:45.665432+00:00
init-system-helpers is still sitting in the SRU queue and needs to be reviewed/accepted.
Launchpad user Andy Whitcroft(apw) wrote on 2016-09-14T09:38:53.197301+00:00
Hello Scott, or anyone else affected,
Accepted init-system-helpers into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/init-system-helpers/1.29ubuntu3 in a few hours, and then in the -proposed repository.
Launchpad user Neil Wilson(neil-aldur) wrote on 2016-09-14T14:37:00.717208+00:00
Added both cloud-init and init-system-helpers from proposed to the standard Xenial cloud image (com.ubuntu.cloud:released:download/com.ubuntu.cloud:server:16.04:amd64/20160907.1/disk1.img) on a suitably sized server.
Reset cloud-init with rm -rf /var/lib/cloud/instances/*, shut down the server and snapshotted the image.
Rebuilt a new server from the snapshotted image using the previously failing postgresql user data and all is well. The new packages correct my problem - bug 1611973
Launchpad user Scott Moser(smoser) wrote on 2016-09-14T18:09:39.783038+00:00
Thank you Neil!
I've been going through my testing here, and found:
That will require us to get that fix in and through proposed or we will break Azure boot. It's fallout from the systemd ordering.
Launchpad user Martin Pitt(pitti) wrote on 2016-09-15T09:43:37.851755+00:00
I just filed bug 1623868 which is fallout from this change, so blocking this SRU for now.
Launchpad user Martin Pitt(pitti) wrote on 2016-09-15T14:29:28.769685+00:00
Hello Scott, or anyone else affected,
Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.8-1-g3705bb5-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.
Launchpad user Scott Moser(smoser) wrote on 2016-09-17T02:14:50.773103+00:00
verified with:
printf "#cloud-config\npackages: [postgresql, samba, postfix]\n" > user-data
n=x1
lxc launch ubuntu-daily:xenial $n
sleep 10
lxc exec $n -- sh -c '
    p=/etc/apt/sources.list.d/proposed.list
    echo deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc)-proposed main > "$p" &&
    apt-get update -q && apt-get -qy install cloud-init'
lxc file push - $n/etc/cloud/cloud.cfg.d/update.cfg < user-data
lxc exec $n -- sh -c '
    cd /var/lib/cloud &&
    for d in *; do [ "$d" = "seed" ] || rm -Rf "$d"; done
    rm -Rf /var/log/cloud-init'
lxc exec $n reboot
lxc exec $n -- tail -f /var/log/cloud-init-output.log
Launchpad user Launchpad Janitor(janitor) wrote on 2016-09-22T17:34:08.065412+00:00
This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1~16.04.1
cloud-init (0.7.8-1-g3705bb5-0ubuntu1~16.04.1) xenial-proposed; urgency=medium
cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.2) xenial-proposed; urgency=medium
cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.1) xenial-proposed; urgency=medium
New upstream snapshot.
-- Scott Moser smoser@ubuntu.com Thu, 15 Sep 2016 09:57:27 -0400
Launchpad user Chris J Arges(arges) wrote on 2016-09-22T17:34:52.080664+00:00
The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Launchpad user Launchpad Janitor(janitor) wrote on 2016-10-10T07:26:36.422481+00:00
This bug was fixed in the package init-system-helpers - 1.29ubuntu3
init-system-helpers (1.29ubuntu3) xenial-proposed; urgency=medium
invoke-rc.d, service: Only ignore systemd unit dependencies before multi-user.target. "systemctl is-system-running" might still be false in case of running jobs for device/mount/hotplug/dynamic actions units. But in those cases we already do want to respect unit dependencies, as the system is booted up sufficiently to avoid dependency loops. Thus weaken the condition to "multi-user.target is active".
This does not change the behaviour for single-user: is-system-running has always been false there, so dependencies continue to be ignored.
Fixes installation of packages like PostgreSQL under cloud-init or when manually installing packages right after booting.
LP: #1576692
-- Martin Pitt martin.pitt@ubuntu.com Mon, 12 Sep 2016 10:57:57 +0200
This bug was originally filed in Launchpad as LP: #1576692
Launchpad user Scott Moser(smoser) wrote on 2016-04-29T13:44:17.658472+00:00
In cloud-init, users can install packages via cloud-config:
#cloud-config
packages: [apache2]
Due to some intricacies of systemd and service installation, that doesn't work all that well. Under bug 1575572 we fixed the issue for simple services that do not have any dependencies on other services, or at least don't check those dependencies well.
We'd like to have a way to fully support this in cloud-init.
Related bugs:
* bug 1575572: apache2 fails to start if installed via cloud config (on Xenial)
* bug 1611973: postgresql@9.5-main service not started if postgres installed via cloud-init
* bug 1621336: snapd.boot-ok.service hangs eternally on cloud image upgrades (snapd packaging bug, but this cloud-init fix will workaround it)
* bug 1620780: dev-sda2.device job running and times out
* bug 1623570: Azure: cannot start walinux agent (Transaction order is cyclic.)
* bug 1623868: cloud-final.service does not run due to dependency cycle
* bug 1627436: [gce] Startup scripts do not run on 1604 images
SRU INFORMATION
FIX for init-system-helpers: https://anonscm.debian.org/cgit/collab-maint/init-system-helpers.git/commit/?id=1460d6a02
REGRESSION POTENTIAL for init-system-helpers: This changes invoke-rc.d and service, two very central pieces of packaging infrastructure. Errors in it will break installation/upgrades of packages or /etc/network/if-up.d/ hooks and the like. This changes the condition when systemd units get started without their dependencies, and the condition gets weakened. This means that behaviour in a booted system is unchanged, but during boot this could change the behaviour of if-up.d/ hooks (although they have never been defined well during boot anyway). However, I tested this change extensively in cloud images and desktop installations (particularly I recreated https://bugs.debian.org/777113 and confirmed that this approach also fixes it) and could not find any regression.
TEST CASE (for both packages): Run
lxc launch ubuntu-daily:x --config=user.user-data="$(printf "#cloud-config\npackages: [postgresql, samba, postfix]")" x1
This will install all three packages, but "systemctl status postgresql@9.5-main" will show that the service is not running.
Now prepare a new image with the proposed cloud-init and init-system-helpers:
lxc launch ubuntu-daily:x xprep
lxc exec xprep bash
# enable -proposed and dist-upgrade, then poweroff
lxc publish xprep x-proposed
Now run the initial lxc launch again, but against that new x-proposed image instead of the standard daily:
lxc launch x-proposed --config=user.user-data="$(printf "#cloud-config\npackages: [postgresql, samba, postfix]")" x1
"systemctl status postgresql@9.5-main" should now show the service as running. Directly after rebooting the instance, check that there are no hanging jobs (systemctl list-jobs), particularly networking.service, to ensure that https://bugs.debian.org/777113 did not come back.
Also test interactively installing a package that ships a service, like "apache2", and verify that it starts properly after installation.
Verify that journalctl shows no dependency cycles and that all cloud init services and the target are active:
$ systemctl list-units --no-legend --all 'cloud*'
cloud-config.service loaded active exited Apply the settings specified in cloud-config
cloud-final.service loaded active exited Execute cloud user/final scripts
cloud-init-local.service loaded active exited Initial cloud-init job (pre-networking)
cloud-init.service loaded active exited Initial cloud-init job (metadata service crawler)
cloud-config.target loaded active active Cloud-config availability
cloud-init.target loaded active active Cloud-init target
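The last check can be scripted rather than read by eye. The helper below is a rough sketch only: check_cloud_units is a made-up name, and it assumes the column order shown in the listing above (unit, load state, active state, sub state, description) with no leading status bullet.

```shell
# Hypothetical helper: read `systemctl list-units --no-legend --all 'cloud*'`
# output on stdin and fail if any listed unit is not loaded and active.
check_cloud_units() {
    awk '
        # Skip blank lines; flag every unit that is not "loaded active".
        NF > 0 && ($2 != "loaded" || $3 != "active") {
            print "not active: " $1
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

It could be run as "systemctl list-units --no-legend --all 'cloud*' | check_cloud_units"; a non-zero exit status means at least one cloud-init unit or target needs a closer look.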