Closed ubuntu-server-builder closed 1 year ago
Launchpad user Steve Langasek(vorlon) wrote on 2017-05-24T21:30:51+00:00
On Wed, May 24, 2017 at 09:10:37PM -0000, Jim Browne wrote:
While it would be better to solve this in apt itself, I suggest that cloud-init be defensive when calling apt and implement some retry mechanism.
I would suggest instead that cloud-init should declare itself Before=apt-daily.service / apt-daily.timer, so that cloud-init takes precedence over apt-daily on first boot.
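A minimal sketch of what such an ordering declaration could look like as a systemd drop-in. The helper name write_ordering_dropin and the file name 10-before-apt.conf are hypothetical; the unit names are passed as arguments rather than fixed, since the thread has not yet settled which cloud-init unit should carry the ordering.

```shell
#!/bin/sh
# write_ordering_dropin DROPIN_ROOT UNIT OTHER_UNIT...
# Writes a drop-in that orders UNIT before every OTHER_UNIT and prints
# the path of the file it created.
# Assumption: "10-before-apt.conf" is an illustrative file name only.
write_ordering_dropin() {
    dropin_root=$1   # e.g. /etc/systemd/system on a real host
    unit=$2          # the unit that should start first
    shift 2
    dir="$dropin_root/$unit.d"
    mkdir -p "$dir" || return 1
    # "$*" joins the remaining arguments with spaces, the list form
    # that Before= accepts.
    printf '[Unit]\nBefore=%s\n' "$*" > "$dir/10-before-apt.conf"
    printf '%s\n' "$dir/10-before-apt.conf"
}
```

On a real host this might be called as `write_ordering_dropin /etc/systemd/system cloud-final.service apt-daily.service apt-daily.timer`, followed by `systemctl daemon-reload`. Pointing DROPIN_ROOT at a scratch directory allows inspecting the result without touching the system.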
Launchpad user Jim Browne(jbrowne) wrote on 2017-05-24T21:42:51.947736+00:00
My concern is another apt dependent task being added somewhere else in systemd that winds up triggering during boot. IMO it's better to be generically defensive about the use of apt, but others certainly have more context and information than I do.
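For illustration, being "generically defensive" could be as simple as a retry wrapper around each apt call. This is only a sketch of the idea, not what cloud-init does; the function name retry and its argument order are hypothetical.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, sleeping
# DELAY_SECONDS between attempts. Returns 0 on success, 1 on giving up.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

A call such as `retry 5 10 apt-get update` would try up to five times, ten seconds apart. Note that this retries on any failure, not only on lock contention, which is one reason a fix in apt itself is preferable.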
Launchpad user Scott Moser(smoser) wrote on 2017-05-25T18:38:43.913791+00:00
I suspect that Steve's suggestion should fix this mostly for cloud-init. Apt does of course have a general locking problem that really does need addressing.
We've all seen workarounds and retries at all sorts of levels to address these problems:
a.) you basically have to run 'apt-get update' before you run 'apt-get install' (bug 1429285), which results in over-use of that fairly heavy resource.
b.) if another process is running 'apt-get install' or 'apt-get remove' when you make your attempt, you will fail on lock contention.
These things should be solved in apt, not worked around in yet another process that uses it.
Launchpad user Chris White(cwprogram) wrote on 2017-06-01T22:17:19.159801+00:00
Some research on this indicates:
/etc/init.d/rc is set to run services in parallel via startpar. If cloud-init were added to the non-concurrent part of the file, would this prevent the issue?
aptdcon along with aptd appears to allow various apt-get operations in a queue-like system. Unfortunately I can't tell what happens when a standard apt-get package install happens while aptd is doing its thing. Not only that, but it would increase dependencies on cloud images.
Launchpad user Julian Andres Klode(juliank) wrote on 2017-06-13T00:20:52.840976+00:00
We eventually want wait locking in apt, but I don't think it really solves all issues, especially in scripts with multiple apt invocations. Which is why apt-daily got an additional flock lock for the upcoming SRUs. (see artful).
Feel free to wait on the same lock and probably add some ordering against apt-daily and apt-daily-upgrade services.
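A rough sketch of waiting on a shared flock lock before running a command, using flock(1) from util-linux. The helper name with_lock is hypothetical, and the lock file path is left as a parameter because the comment above does not name the file apt-daily uses.

```shell
#!/bin/sh
# with_lock LOCKFILE TIMEOUT_SECONDS COMMAND [ARGS...]
# Waits up to TIMEOUT_SECONDS for an exclusive lock on LOCKFILE, then
# runs COMMAND while holding it. flock exits non-zero if the wait
# times out; otherwise the exit status is that of COMMAND.
with_lock() {
    lockfile=$1
    timeout=$2
    shift 2
    flock -w "$timeout" "$lockfile" "$@"
}
```

Usage would look like `with_lock /path/to/shared.lock 3600 apt-get install -y hello`, where the path is a placeholder for whatever lock file apt-daily actually takes on the target release. Holding one lock around several apt invocations is what distinguishes this from per-command waiting.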
Launchpad user Launchpad Janitor(janitor) wrote on 2017-06-27T21:56:55.030674+00:00
This bug was fixed in the package cloud-init - 0.7.9-197-gebc9ecbc-0ubuntu1
cloud-init (0.7.9-197-gebc9ecbc-0ubuntu1) artful; urgency=medium
New upstream snapshot.
-- Scott Moser smoser@ubuntu.com Tue, 27 Jun 2017 17:18:24 -0400
Launchpad user Steve Langasek(vorlon) wrote on 2017-06-29T04:45:32.490088+00:00
Hello Jim, or anyone else affected,
Accepted cloud-init into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~17.04.2 in a few hours, and then in the -proposed repository.
Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.
Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
Launchpad user Steve Langasek(vorlon) wrote on 2017-06-29T04:53:17.173086+00:00
Hello Jim, or anyone else affected,
Accepted cloud-init into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~16.10.2 in a few hours, and then in the -proposed repository.
Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-yakkety to verification-done-yakkety. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-yakkety. In either case, details of your testing will help us make a better decision.
Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
Launchpad user Steve Langasek(vorlon) wrote on 2017-06-29T04:55:56.401797+00:00
Hello Jim, or anyone else affected,
Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.9-153-g16a7302f-0ubuntu1~16.04.2 in a few hours, and then in the -proposed repository.
Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.
Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
Launchpad user Scott Moser(smoser) wrote on 2017-06-29T19:22:00.003970+00:00
$ for rel in xenial yakkety zesty; do lxc-proposed-snapshot --proposed $rel proposed-$rel --publish || break; done
$ for rel in xenial yakkety zesty; do lxc launch proposed-$rel "--config=user.user-data=$(cat config.yaml)" test-$rel || break; done
$ sleep 2m
$ for rel in xenial yakkety zesty; do mkdir $rel && ( cd $rel && lxc exec test-$rel -- journalctl -o short-precise > journal.log && lxc exec test-$rel -- dpkg-query --show cloud-init > cloud-init-dpkg.txt && lxc file pull test-$rel/var/log/cloud-init.log cloud-init.log && lxc file pull test-$rel/var/log/cloud-init-output.log cloud-init-output.log ) || break; done
$ for rel in xenial yakkety zesty; do tar -czf /tmp/1693361-$rel.tar.gz $rel; done
Launchpad attachments: xenial results
Launchpad user Scott Moser(smoser) wrote on 2017-06-29T19:22:19.634597+00:00
Launchpad attachments: yakkety results
Launchpad user Scott Moser(smoser) wrote on 2017-06-29T19:22:45.775471+00:00
Launchpad attachments: zesty results
Launchpad user Launchpad Janitor(janitor) wrote on 2017-06-29T21:48:59.868249+00:00
This bug was fixed in the package cloud-init - 0.7.9-153-g16a7302f-0ubuntu1~16.04.2
cloud-init (0.7.9-153-g16a7302f-0ubuntu1~16.04.2) xenial-proposed; urgency=medium
cherry-pick 11121fe4: systemd: make cloud-final.service run before apt daily (LP: #1693361)
-- Scott Moser smoser@ubuntu.com Wed, 28 Jun 2017 17:17:18 -0400
Launchpad user Steve Langasek(vorlon) wrote on 2017-06-29T21:49:11.119691+00:00
The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Launchpad user Launchpad Janitor(janitor) wrote on 2017-07-26T18:27:42.552325+00:00
This bug was fixed in the package cloud-init - 0.7.9-153-g16a7302f-0ubuntu1~17.04.2
cloud-init (0.7.9-153-g16a7302f-0ubuntu1~17.04.2) zesty-proposed; urgency=medium
cherry-pick 11121fe4: systemd: make cloud-final.service run before apt daily (LP: #1693361)
-- Scott Moser smoser@ubuntu.com Wed, 28 Jun 2017 17:20:51 -0400
Launchpad user Scott Moser(smoser) wrote on 2017-09-23T02:33:29.402324+00:00
This bug is believed to be fixed in cloud-init in 17.1. If this is still a problem for you, please make a comment and set the state back to New
Thank you.
Launchpad user Julian Andres Klode(juliank) wrote on 2018-08-22T10:54:05.607823+00:00
Nothing actionable here for apt, so I'll close this. We should consider making frontend locking more flexible for scripts using apt, though, so scripts can hold the lock all the time and drive apt.
Launchpad user David Reis(dryd) wrote on 2021-11-12T12:29:16.114889+00:00
This is not fixed. It just affected me on Ubuntu 20.04.3 LTS, resulting in the subsequent server configuration failing completely because awscli and jq were missing.
Output:
Cloud-init v. 21.3-1-g6803368d-0ubuntu1~20.04.4 running 'modules:config' at Fri, 12 Nov 2021 11:05:29 +0000. Up 18.13 seconds.
Get:1 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu focal InRelease [265 kB]
[... more Gets]
E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 2764 (unattended-upgr)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
Cloud-init v. 21.3-1-g6803368d-0ubuntu1~20.04.4 running 'modules:final' at Fri, 12 Nov 2021 11:05:30 +0000. Up 19.15 seconds.
2021-11-12 11:05:38,955 - util.py[WARNING]: Package upgrade failed
E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 2764 (unattended-upgr)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
2021-11-12 11:05:38,999 - util.py[WARNING]: Failed to install packages: ['awscli', 'nmap', 'tcpdump', 'bind9utils', 'curl', 'wget', 'vim', 'jq', 'htop', 'tmux', 'git', 'iotop', 'iftop', 'fail2ban']
2021-11-12 11:05:38,999 - cc_package_update_upgrade_install.py[WARNING]: 2 failed with exceptions, re-raising the last one
2021-11-12 11:05:39,000 - util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_package_update_upgrade_install.py'>) failed
Note: Before=apt-daily.service is only set on cloud-final.service.
Launchpad user Julian Andres Klode(juliank) wrote on 2021-11-12T12:49:23.152055+00:00
Arguably it should run before apt-daily-upgrade too. apt-daily-upgrade is the one locking dpkg; apt-daily locks apt lists (and cache) directory.
Launchpad user David Reis(dryd) wrote on 2021-11-12T13:06:21.416123+00:00
Ah, thanks, I wasn't aware they're distinct. So would simply adding apt-daily-upgrade.service to the Before via cloud-init's bootcmd and then issuing a daemon-reload be a suitable workaround? There's a 30s window until the upgrade process starts if apt's history.log is to be trusted. That is probably enough to be somewhat reliable.
Launchpad user Koen Serneels(eskubu) wrote on 2021-12-06T13:49:34.819971+00:00
From cloud-init's point of view, the solution now implemented makes sense: run it before apt-daily-upgrade. However, I wanted to add that there are other use cases as well, such as SSM documents being executed on instances. These can be executed in batch at any time and may also require installation of packages, and thus interfere with these unattended upgrades.
The execution of documents is not linked directly to cloud-init and may be run after the instance has booted, so this falls into the other category of needing some kind of queuing system, or at least a centralized way to obtain a lock in order to use apt. At the moment there are dozens of different ways to get a mutex for running apt, but we couldn't find a bulletproof one that works every time.
So maybe this does not really fit into this ticket, but I wanted to point out that this is only a partial fix to a bigger problem.
Launchpad user James Falcon(falcojr) wrote on 2021-12-06T15:59:22.390185+00:00
Not sure if this helps, but we recently added behavior to wait for an apt lock when doing apt commands. This will be included in our next release: https://github.com/canonical/cloud-init/pull/1034
If there are still remaining issues, please open a new bug rather than commenting here. This bug won't be re-opened.
Launchpad user Julian Andres Klode(juliank) wrote on 2021-12-06T16:56:33.360031+00:00
Since 20.04, apt can wait for a lock.
The apt(8) command automatically waits for a lock for 120 seconds (non-interactive) or infinitely.
The apt-get(8) command can be configured to wait as well by passing the -o DPkg::Lock::Timeout=
This avoids any races you'd get by doing the lock yourself and then invoking apt.
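As a sketch, a script could wrap apt-get so that every call carries the lock timeout option named above. The wrapper name apt_get_waiting and the APT_GET override variable are hypothetical. The timeout value is left to the caller; 120 in the usage line below simply mirrors the non-interactive apt(8) default mentioned in this comment.

```shell
#!/bin/sh
# apt_get_waiting TIMEOUT_SECONDS APT_GET_ARGS...
# Runs apt-get with DPkg::Lock::Timeout set, so apt itself waits for
# the lock instead of failing immediately.
# Assumption: APT_GET is an illustrative override for dry runs, e.g.
# APT_GET=echo prints the arguments instead of running apt-get.
apt_get_waiting() {
    timeout=$1
    shift
    "${APT_GET:-apt-get}" -o "DPkg::Lock::Timeout=$timeout" "$@"
}
```

For example, `apt_get_waiting 120 install -y jq` would wait up to two minutes for the lock before giving up. This needs an apt new enough to support the option, which per the comment above means 20.04 or later.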
Launchpad user Jesús Gómez(jgomo3) wrote on 2022-04-13T17:38:06.501520+00:00
In 2022 this still happens on AWS Ubuntu 20.04.
But in my case it happens 100% of the time, not just sometimes.
This user-data:
#cloud-config
package_update: true
package_upgrade: true
packages:
- awscli
- jq
sudo cloud-init status
status: error
Logs collected and attached.
Launchpad attachments: cloud-init.tar.gz
This bug was originally filed in Launchpad as LP: #1693361
Launchpad details
Launchpad user Jim Browne(jbrowne) wrote on 2017-05-24T21:10:37.007863+00:00
=== Begin SRU Template ===
[Impact]
A cloud-config that contains packages to install (see below) or 'package_upgrade' will run 'apt-get update'. That can sometimes fail as a result of contention with the apt-daily.service that updates that information.
Cloud-config showing the problem is just like:
  $ cat my.yaml
  #cloud-config
  packages: ['hello']
[Test Case]
lxc-proposed-snapshot is https://git.launchpad.net/~smoser/cloud-init/+git/sru-info/tree/bin/lxc-proposed-snapshot
It publishes an image to lxd with proposed enabled and cloud-init upgraded.
a.) launch an instance with proposed version of cloud-init and some user-data. This is platform independent. The test case demonstrates lxd.
  $ printf "%s\n%s\n%s\n" "#cloud-config" "packages: ['hello']" \
      "package_upgrade: true" > config.yaml
  $ release=xenial
  $ ref=proposed-$release
  $ ./lxc-proposed-snapshot --proposed --publish $release $ref
b.) start the instance
  $ name=$release-1693361
  $ lxc launch $ref $name "--config=user.user-data=$(cat config.yaml)"
  $ sleep 1
  $ lxc exec $name -- tail -f /var/log/cloud-init.log /var/log/cloud-init-output.log  # watch this boot.
c.) Look for evidence of systemd failure
  $ journalctl -o short-precise | grep -i break
  $ journalctl -o short-precise | grep -i order
[Regression Potential]
Regression chance here is low. It's possible that ordering loops could occur. When that does happen, journalctl will mention it. Unfortunately in such cases systemd somewhat randomly picks a service to kill, so behavior is somewhat undefined.
[Other Info] Upstream commit at https://git.launchpad.net/cloud-init/commit/?id=11121fe4
=== End SRU Template ===
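The journal check in step c.) of the test case can be folded into a single pass over a saved journal, which is convenient when collecting logs from several releases. This is a sketch only; the function name has_ordering_trouble is hypothetical, and the two patterns are the ones used in the test case above.

```shell
#!/bin/sh
# has_ordering_trouble JOURNAL_FILE
# Prints every line of a saved journal that mentions "break" or "order"
# (case-insensitive). Returns 0 if any line matched, non-zero otherwise.
has_ordering_trouble() {
    grep -i -E 'break|order' "$1"
}
```

Typical use would be `journalctl -o short-precise > journal.log` inside the instance, then `has_ordering_trouble journal.log && echo "possible ordering loop"` on the collected file. The patterns are broad, so any matches still need to be read by a person.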
apt-daily is now a systemd service rather than being invoked by cron.daily. If one builds a custom AMI it is possible that the apt-daily.timer will fire during boot. This can fire at the same time cloud-init is running and if cloud-init loses the race the invocation of apt (e.g. use of "packages:" in the config) will fail.
There is a lot of discussion online about this change to apt-daily (e.g. unattended upgrades happening during business hours, delaying boot, etc.) and discussion of potential systemd changes regarding timers firing during boot (c.f. https://github.com/systemd/systemd/issues/5659).
While it would be better to solve this in apt itself, I suggest that cloud-init be defensive when calling apt and implement some retry mechanism.
Various instances of people running into this issue:
https://github.com/chef/bento/issues/609
https://clusterhq.atlassian.net/browse/FLOC-4486
https://github.com/boxcutter/ubuntu/issues/73
https://unix.stackexchange.com/questions/315502/how-to-disable-apt-daily-service-on-ubuntu-cloud-vm-image