Azure / azure-linux-extensions

Linux Virtual Machine Extensions for Azure
Apache License 2.0
307 stars 254 forks source link

AADSSHLoginForLinux extension fails to install #1834

Open marchesir opened 11 months ago

marchesir commented 11 months ago

AADSSHLoginForLinux version: 1.0.2385.1 OS: Ubuntu 20.04LTS

hi,

we have a vmss managed by terraform and it install 3 extensions: AADSSHLoginForLinux CustomScript ApplicationHealthLinux

on provisioning of VMs AADSSHLoginForLinux fails with below error, this only started since Friday.

Multiple VM extensions failed to be provisioned on the VM. Please see the VM extension instance view for other failures. The first extension failed due to the error: VM 'xxxx_4' has not reported status for VM agent or extensions. Verify that the OS is up and healthy, the VM has a running VM agent, and that it can establish outbound connections to Azure storage. Please refer to https://aka.ms/vmextensionlinuxtroubleshoot for additional VM agent troubleshooting information.

1) why is this trying to reach out to storage? 2) the storage error has been there for long time and never stopped install until now: 2023-11-01T09:47:11.292363Z INFO ExtHandler ExtHandler Downloading artifacts profile blob 2023-11-01T09:47:42.420549Z WARNING ExtHandler ExtHandler Fetch failed: [HttpError] [HTTP Failed] GET https://md-hdd-d12hgqdtfwl4.z34.blob.storage.azure.net/$system/ghec-dev-runners-vmsstests_2.7bb722e2-74be-462b-ab64-dd83a1e36506.vmSettings -- IOError EOF occurred in violation of protocol (_ssl.c:1131) -- 6 attempts made 202 3) is there a way to pin a specific version of the extension or can an offline install in the base image be done? 4) if an offline install is possible how do we force the vm to not install latest on startup?

regards

yanchoyanev commented 10 months ago

The failure is caused by the fact that the installation starts at the time when HTTP stack is not quite ready. This is happening only on VMSS, not regular VMs. The VMSS and waagent teams are actively looking for the root cause and a possible fix. As a workaround, change the extension order if you have multiple extensions, or insert a script extension that just waits for a couple of seconds. Storage access - this is not coming from the extension; probably from the agent. It seems the same HTTP problem. There is no point to install older extension version because the problem is not in the extension itself but rather the time it is installed.

marchesir commented 10 months ago

great thanks for update. I actually fixed the issue as you suggested forcing the aadssh extension to run last.