Open carldjohnston opened 4 years ago
Update on this; my proposed solution doesn't appear to work reliably.
I believe that the walinuxagent service needs to start after cloud-final to be more robust. I'm achieving that with the following in cloud-init:
bootcmd:
- mkdir -p /etc/systemd/system/walinuxagent.service.d
- echo "[Unit]\nAfter=cloud-final.service" > /etc/systemd/system/walinuxagent.service.d/override.conf
- sed "s/After=multi-user.target//g" /lib/systemd/system/cloud-final.service > /etc/systemd/system/cloud-final.service
- systemctl daemon-reload
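One fragile spot in the workaround above is the echo "[Unit]\nAfter=..." line: whether \n turns into a real newline depends on the shell that runs bootcmd (dash's echo expands it, bash's builtin echo does not by default). Below is a minimal sketch of the same drop-in written with printf, which behaves the same in both. The helper name write_dropin and its ROOT argument are hypothetical, added only so the sketch can be run against a scratch directory; it does not reload systemd.

```shell
#!/bin/sh
# Hypothetical helper: write the walinuxagent ordering drop-in under ROOT.
# ROOT defaults to "" (the real filesystem); pass a scratch directory to try it safely.
write_dropin() {
    root="${1:-}"
    dir="$root/etc/systemd/system/walinuxagent.service.d"
    mkdir -p "$dir"
    # printf expands \n in its format string regardless of which shell runs it.
    printf '[Unit]\nAfter=cloud-final.service\n' > "$dir/override.conf"
}
```

On a real machine this would be followed by systemctl daemon-reload, as in the bootcmd list above.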
Thanks for opening this issue! We'll take a look. Cc // @anhvoms
Good workaround, but this should be fixed, or extension sequencing should be allowed for the customScript extension.
Still no progress? What a shame
I have this issue as well. Is there a plan to fix this? According to the README this used to work, so it feels like a regression at some point (bold text for emphasis of previous behaviour):
Provisioning.UseCloudInit Type: Boolean Default: n
This option enables / disables support for provisioning by means of cloud-init. When true ("y"), the agent will wait for cloud-init to complete before installing extensions and processing the latest goal state. Provisioning.Enabled must be disabled ("n") for this option to have an effect. Setting Provisioning.Enabled to true ("y") overrides this option and runs the built-in agent provisioning code.
Note: This configuration option has been removed and has no effect. waagent now auto-detects cloud-init as a provisioning agent (with an option to override with Provisioning.Agent).
"waagent now auto-detects cloud-init as a provisioning agent": does that mean the two will (or can) work together?
I'm still having this issue though.
Update on this; my proposed solution doesn't appear to work reliably.
I believe that the walinuxagent service needs to start after cloud-final to be more robust. I'm achieving that with the following in cloud-init:
bootcmd:
- mkdir -p /etc/systemd/system/walinuxagent.service.d
- echo "[Unit]\nAfter=cloud-final.service" > /etc/systemd/system/walinuxagent.service.d/override.conf
- sed "s/After=multi-user.target//g" /lib/systemd/system/cloud-final.service > /etc/systemd/system/cloud-final.service
- systemctl daemon-reload
This is what I got when trying it:
Cloud-init v. 20.2 running 'modules:config' at Sun, 19 Jun 2022 17:07:16 +0000. Up 12.87 seconds.
Reading package lists...
E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
E: Unable to lock directory /var/lib/apt/lists/
Cloud-init v. 20.2 running 'modules:final' at Sun, 19 Jun 2022 17:07:18 +0000. Up 14.18 seconds.
2022-06-19 17:07:18,376 - util.py[WARNING]: Package update failed
Just to confirm that I've done it properly:
$ cat /etc/systemd/system/walinuxagent.service.d/override.conf
[Unit]
After=cloud-final.service
$ cat /etc/systemd/system/cloud-final.service
[Unit]
Description=Execute cloud user/final scripts
After=network-online.target cloud-config.service rc-local.service
Before=apt-daily.service
Wants=network-online.target cloud-config.service
[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init modules --mode=final
RemainAfterExit=yes
TimeoutSec=0
KillMode=process
TasksMax=infinity
# Output needs to appear in instance console output
StandardOutput=journal+console
[Install]
WantedBy=cloud-init.target
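The lock error in the log above (Could not get lock /var/lib/apt/lists/lock) is a transient failure: another process held the lock at that moment. As a stopgap, and not a fix for the underlying ordering problem, a command that needs the lock can be retried a few times. A minimal sketch follows; the function name retry and its argument order are illustrative choices, not part of cloud-init or waagent.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most ATTEMPTS times, sleeping DELAY seconds between tries.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while :; do
        # Success on any attempt ends the loop.
        "$@" && return 0
        # Give up once the attempt budget is used.
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}
```

Hypothetical usage: retry 10 6 apt-get update, which would try for up to roughly a minute before giving up.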
+1
It seems utterly bizarre to leave this broken for two and a half years: cloud-init has been around for over a decade and is absolutely industry standard. To have an external agent that wades in without any reference to cloud-init renders it unusable. There is a provided mechanism to check if cloud-init is still running: https://cloudinit.readthedocs.io/en/latest/reference/faq.html
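The mechanism referred to here is the cloud-init status command, whose --wait flag blocks until cloud-init has finished. A script run by an extension could call it before touching apt. This is a hedged sketch, not what waagent itself does: the function name wait_for_cloud_init and the CLOUD_INIT_BIN override are assumptions introduced for illustration.

```shell
#!/bin/sh
# CLOUD_INIT_BIN is a hypothetical override so the sketch can be exercised
# on machines without cloud-init; it defaults to the real command name.
CLOUD_INIT_BIN="${CLOUD_INIT_BIN:-cloud-init}"

wait_for_cloud_init() {
    # Nothing to wait for if cloud-init is not installed.
    if ! command -v "$CLOUD_INIT_BIN" >/dev/null 2>&1; then
        echo "cloud-init not found, continuing"
        return 0
    fi
    # 'status --wait' blocks until cloud-init is done. A non-zero exit status
    # means cloud-init reported a problem; the exact codes vary by version.
    "$CLOUD_INIT_BIN" status --wait >/dev/null 2>&1
}
```

A script would call wait_for_cloud_init first and only then run its apt-get commands, deciding for itself whether a non-zero return should abort the script.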
I had this same problem when trying to create a scale-set for my Azure DevOps agent pipeline.
This is the command I'm using to create the scale-set.
azps group create `
--location westus3 `
--name vsagent
azps vmss create `
--name agentpool `
--resource-group vsagent `
--image 'Debian:debian-11-daily:11-gen2:latest' `
--priority Spot `
--vm-sku Standard_D2s_v3 `
--storage-sku StandardSSD_LRS `
--orchestration-mode Uniform `
--instance-count 0 `
--eviction-policy Delete `
--upgrade-policy-mode Manual `
--single-placement-group false `
--platform-fault-domain-count 1 `
--lb-sku Basic `
--load-balancer '' `
--vnet-name agentnet `
--os-disk-caching readonly `
--authentication-type SSH `
--generate-ssh-keys `
--ssh-key-values ~/.ssh/id_rsa.pub `
--admin-username vsagent
azps vmss extension set `
--vmss-name agentpool `
--resource-group vsagent `
--name CustomScript `
--version 2.0 `
--publisher Microsoft.Azure.Extensions `
--settings '{ "commandToExecute": " apt-get -o DPkg::Lock::Timeout=60 update && apt-get install git && apt-get autoclean " }'
It conflicts with the Microsoft.Azure.DevOps.Pipelines.Agent extension:
{
"isPipelinesAgent": true,
"agentFolder": "/agent",
"agentDownloadUrl": "https://vstsagentpackage.azureedge.net/agent/3.224.1/vsts-agent-linux-x64-3.224.1.tar.gz",
"enableScriptDownloadUrl": "https://vstsagenttools.blob.core.windows.net/tools/ElasticPools/Linux/15/enableagent.sh"
}
This is still an issue.
Having this issue as well. It's crashed three VMs, all Debian and Ubuntu. Going to try this out.
+1
It seems utterly bizarre to leave this broken for two and a half years: cloud-init has been around for over a decade and is absolutely industry standard. To have an external agent that wades in without any reference to cloud-init renders it unusable. There is a provided mechanism to check if cloud-init is still running: https://cloudinit.readthedocs.io/en/latest/reference/faq.html
Indeed, the solution to that problem is basically to use cloud-init and do everything through it. Don't use WAAgent. I was trying to use WAAgent to upgrade the packages, and it runs at the wrong time.
I'm using this instead: https://learn.microsoft.com/en-us/azure/virtual-machines/user-data
azps vmss create `
--name agentpool `
--resource-group vsagent `
--image 'Debian:debian-11-daily:11-gen2:latest' `
--priority Spot `
--vm-sku Standard_D2s_v3 `
--storage-sku StandardSSD_LRS `
--orchestration-mode Uniform `
--instance-count 0 `
--eviction-policy Delete `
--upgrade-policy-mode Manual `
--single-placement-group false `
--platform-fault-domain-count 1 `
--lb-sku Basic `
--load-balancer '' `
--vnet-name agentnet `
--os-disk-caching readonly `
--authentication-type SSH `
--generate-ssh-keys `
--ssh-key-values ~/.ssh/id_rsa.pub `
--admin-username vsagent `
--user-data @"
#cloud-config
# docs: https://cloudinit.readthedocs.io/en/latest/reference/examples.html#additional-apt-configuration-and-repositories
package_update: true
packages: ['git']
"@
Still a problem...
Currently (on Canonical UbuntuServer 18.04-LTS) the WALinuxAgent service and cloud-init run in parallel on start-up, causing contention over the package repository.
I believe that WALinuxAgent should start after the cloud-config service to ensure that:
Distro and WALinuxAgent details:
Additional context
My current situation when starting an Ubuntu 18.04 VM, with the Log Analytics extension is that:
My current solution is to modify the service behaviour by adding this override to /etc/systemd/system/walinuxagent.service.d/override.conf from within bootcmd using cloud-init: