Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent
http://azure.microsoft.com/
Apache License 2.0

[BUG] OpenBSD Deployment Failed #2990

Open renygma opened 8 months ago

renygma commented 8 months ago

Current OpenBSD versions (7.3/7.4) cannot be deployed properly on Azure with WALinuxAgent. The deployment times out with the following message:

{
"status":"Failed",
"error":{
  "code":"DeploymentFailed",
  "target":"/subscriptions/XXXXXXXXXX/resourceGroups/rg1/providers/Microsoft.Resources/deployments/vm",
  "message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.",
  "details":[{
    "code":"ResourceDeploymentFailure",
    "target":"/subscriptions/XXXXXXXXXX/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachines/vm",
    "message":"The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'.",
    "details":[{
      "code":"OSProvisioningTimedOut",
      "message":"OS Provisioning for VM 'vm' did not finish in the allotted time. The VM may still finish provisioning successfully. Please check provisioning state later. Also, make sure the image has been properly prepared (generalized).\r\n * Instructions for Windows: https://azure.microsoft.com/documentation/articles/virtual-machines-windows-upload-image/ \r\n * Instructions for Linux: https://azure.microsoft.com/documentation/articles/virtual-machines-linux-capture-image/ \r\n * If you are deploying more than 20 Virtual Machines concurrently, consider moving your custom image to shared image gallery. Please refer to https://aka.ms/movetosig for the same."
      }]
    }]
  }
}

Judging by waagent.log, the agent keeps running in an endless loop, even after the Azure deployment has failed. Excerpt from waagent.log:

2023-11-30T12:51:14.333651Z INFO Daemon Daemon Send dhcp request
2023-11-30T12:51:24.427377Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:51:34.447304Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:51:54.467267Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:52:34.487275Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:53:44.507271Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:54:44.517273Z INFO Daemon Daemon Protocol endpoint not found: [ProtocolError] [DhcpError] Failed to receive dhcp response.
2023-11-30T12:54:44.526852Z INFO Daemon Daemon Retry detect protocol: retry=32
2023-11-30T12:54:54.537216Z INFO Daemon Daemon WireServer endpoint is not found. Rerun dhcp handler
2023-11-30T12:54:54.544796Z INFO Daemon Daemon Test for route to XXX.XXX.XXX.XXX
2023-11-30T12:54:54.550981Z ERROR Daemon Daemon Cannot read route table [[Errno 2] No such file or directory: '/proc/net/route']
2023-11-30T12:54:54.559909Z WARNING Daemon Daemon No route exists to XXX.XXX.XXX.XXX
2023-11-30T12:54:54.566745Z INFO Daemon Daemon Checking for dhcp lease cache
2023-11-30T12:54:54.597030Z INFO Daemon Daemon looking for leases in path [/var/db/dhclient.leases.hvn0]
2023-11-30T12:54:54.605167Z INFO Daemon Daemon cached endpoint not found
2023-11-30T12:54:54.610900Z INFO Daemon Daemon Cache exists [False]
2023-11-30T12:54:54.616501Z INFO Daemon Daemon Send dhcp request
2023-11-30T12:55:04.707423Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:55:14.727258Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:55:34.747239Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:56:14.777238Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
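
The retry pattern in the excerpt above can be made explicit with a short script. This is only a sketch for reading the log, not part of the agent: `dhcp_timeout_gaps` is a hypothetical helper, and it assumes every log line starts with a UTC timestamp in the format shown above. It measures the time between consecutive "Failed to send DHCP request" warnings.

```python
from datetime import datetime

# First DHCP attempt copied from the excerpt above.
SAMPLE_LOG = """\
2023-11-30T12:51:14.333651Z INFO Daemon Daemon Send dhcp request
2023-11-30T12:51:24.427377Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:51:34.447304Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:51:54.467267Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:52:34.487275Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
2023-11-30T12:53:44.507271Z WARNING Daemon Daemon Failed to send DHCP request: [DhcpError] timed out
"""


def dhcp_timeout_gaps(log_text):
    """Return the rounded seconds between consecutive DHCP timeout warnings.

    Hypothetical helper: assumes each line begins with a timestamp such as
    2023-11-30T12:51:24.427377Z followed by a space.
    """
    stamps = []
    for line in log_text.splitlines():
        if "Failed to send DHCP request" not in line:
            continue
        raw = line.split(" ", 1)[0]
        stamps.append(datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ"))
    return [round((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(dhcp_timeout_gaps(SAMPLE_LOG))  # prints [10, 20, 40, 70]
```

The growing gaps suggest the agent backs off between attempts and then restarts the whole detection cycle, which matches the repeated "Send dhcp request" lines.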

Although the log reports a DhcpError, the VM does have the correct IP address assigned and is reachable.

When I run the reboot or shutdown action via the Azure portal or CLI, the commands time out in the same way as the deployment, although the VM does still reboot or shut down.

Additionally, I noticed that the hostname is still "localhost" after the deployment.

I used this guide to create the OpenBSD image: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/create-upload-openbsd. I used python3 instead of Python 2, and the installation and service registration worked fine.

Tested with OpenBSD 7.3 and 7.4. Waagent version:

WALinuxAgent-2.9.1.1 running on openbsd 7.4
Python: 3.11.5
Goal state agent: 2.9.1.1

narrieta commented 8 months ago

@renygma - Apologies for the late reply.

We provide very limited support outside the distros endorsed by Azure: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/endorsed-distros

The instructions at the link you pointed to were apparently tested on OpenBSD 6.1.

The errors in the log you posted would need to be debugged manually; I do not see anything obvious in these messages.
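
As one possible starting point for that manual debugging, a minimal sketch like the following could be run on the VM to check whether the files the agent tries to read actually exist there. The two paths are taken from the posted log; `missing_paths` is a hypothetical helper, not something the agent provides, and a missing file is only a hint, not a confirmed root cause.

```python
import os

# Paths the agent attempted to read, copied from the posted waagent.log.
PATHS_FROM_LOG = [
    "/proc/net/route",
    "/var/db/dhclient.leases.hvn0",
]


def missing_paths(paths):
    """Return the subset of paths that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]


for path in missing_paths(PATHS_FROM_LOG):
    print("not found:", path)
```

Any path reported as not found points at a code path in the agent that expects a different system layout and may need an OpenBSD-specific alternative.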

I'll leave this issue open to keep track of it.